Series & Approximation · intermediate · 55 min read

Fourier Series & Orthogonal Expansions

From local Taylor approximation to global periodic decomposition. The trigonometric basis {1, cos nx, sin nx} is orthogonal — making Fourier coefficients computable by inner products and Fourier partial sums the best L² approximation. But convergence at discontinuities reveals the Gibbs phenomenon: a 9% overshoot that never goes away.

Abstract. A Fourier series represents a periodic function f as a trigonometric sum f(x) ~ a₀/2 + Σ(aₙcos(nx) + bₙsin(nx)), where the coefficients aₙ = (1/π)∫f(x)cos(nx)dx and bₙ = (1/π)∫f(x)sin(nx)dx are computed by integrating f against the basis functions. The trigonometric system {1, cos(nx), sin(nx)} is orthogonal under the L²[-π,π] inner product ⟨f,g⟩ = (1/π)∫f(x)g(x)dx — this orthogonality is what makes the coefficients computable as inner products and the Fourier partial sum Sₙ(x) the best L² approximation among all trigonometric polynomials of degree ≤ n. Bessel's inequality bounds the total energy of the coefficients by the energy of f; Parseval's identity holds with equality when the trigonometric system is complete. The Riemann-Lebesgue lemma guarantees aₙ,bₙ → 0, and the rate of decay is governed by the smoothness of f: C^k functions have coefficients decaying as O(1/n^k), while discontinuous functions decay only as O(1/n). Dirichlet's theorem establishes pointwise convergence to (f(x⁺) + f(x⁻))/2 for piecewise-smooth functions. At discontinuities, the Gibbs phenomenon produces an overshoot of approximately 9% that persists at every finite partial sum — a fundamental limitation of trigonometric approximation at jumps. Term-by-term integration of a Fourier series is always valid, but differentiation requires stronger regularity. In machine learning, Fourier analysis underlies positional encodings in transformers (sinusoidal basis functions for encoding sequence position), random Fourier features for kernel approximation, spectral normalization for training stability, and signal processing in convolutional architectures.

Where this leads → formalML

  • formalML Random Fourier Features approximate shift-invariant kernels k(x−y) using random trigonometric functions z(x) = √(2/D)cos(ωᵀx + b). The kernel equals the Fourier transform of its spectral measure — random features are Monte Carlo estimates of Fourier coefficients.
  • formalML The Fourier decomposition f = Σ⟨f,eₙ⟩eₙ is the prototype for the spectral theorem for compact self-adjoint operators. The trigonometric system is the eigenbasis of the differentiation operator on the circle, and Fourier coefficients are eigencoordinates.
  • formalML Score functions in exponential families decompose into orthogonal components via the Fisher information inner product. The orthogonal projection structure of Fourier analysis carries over to the geometry of statistical manifolds.
  • formalML Fourier analysis on Boolean functions {−1,1}ⁿ uses Fourier coefficients f̂(S) = E[f(x)χ_S(x)] — the discrete analog of the continuous Fourier coefficients developed here. The KKL inequality and noise sensitivity bounds are proved via Fourier-analytic methods.

1. Overview & Motivation — From Local to Global Approximation

Power Series & Taylor Series showed that Taylor series f(n)(c)/n!(xc)n\sum f^{(n)}(c)/n! \cdot (x-c)^n approximate a function near a center cc using the monomial basis {1,x,x2,}\{1, x, x^2, \ldots\}. But Taylor series are intrinsically local — the radius of convergence RR limits how far from cc the approximation reaches. And if ff has a discontinuity, the Taylor series at any interior point has R=0R = 0 at the jump — Taylor theory cannot even see the discontinuity.

What if we want to approximate a function on an entire interval? And what if the function has discontinuities — which automatically destroy Taylor convergence?

Fourier series replace the monomial basis with the trigonometric basis {1,cosx,sinx,cos2x,sin2x,}\{1, \cos x, \sin x, \cos 2x, \sin 2x, \ldots\}. Instead of matching derivatives at a single point (Taylor), a Fourier series matches the function globally by decomposing it into periodic components at different frequencies. The coefficients are computed not from derivatives at a center, but from integrals over the entire period — an inherently global operation.

Why this matters in ML. Transformer architectures encode sequence position using sinusoidal functions: PE(pos,2i)=sin(pos/100002i/dmodel)\text{PE}(\text{pos}, 2i) = \sin(\text{pos}/10000^{2i/d_{\text{model}}}). This is a Fourier basis — each dimension of the positional encoding is a trigonometric function at a different frequency. The orthogonality of the Fourier basis means each frequency carries independent information, and the smooth structure allows the model to generalize to sequence lengths unseen during training.

This topic sits at the intersection of four prerequisites. The Riemann Integral & FTC provides the integration machinery for computing Fourier coefficients. Uniform Convergence provides the framework for analyzing when Fourier series converge uniformly. Series Convergence & Tests provides the convergence tests for bounding coefficient decay rates. Power Series & Taylor Series provides the comparison — local Taylor vs. global Fourier — that motivates the entire discussion.

Fourier coefficients for three standard waveforms — square wave (odd harmonics at O(1/n)), sawtooth (all harmonics at O(1/n)), and triangle wave (odd harmonics at O(1/n²))

2. Trigonometric Polynomials & Fourier Coefficients

📐 Definition 1 (Trigonometric Polynomial)

A trigonometric polynomial of degree nn is a function of the form

Tn(x)=a02+k=1n(akcoskx+bksinkx)T_n(x) = \frac{a_0}{2} + \sum_{k=1}^{n}(a_k \cos kx + b_k \sin kx)

where a0,a1,,an,b1,,bna_0, a_1, \ldots, a_n, b_1, \ldots, b_n are real constants. The factor 12\frac{1}{2} on a0a_0 is a normalization convention that simplifies the coefficient formulas below.

📐 Definition 2 (Fourier Coefficients)

For a function ff integrable on [π,π][-\pi, \pi], the Fourier coefficients of ff are

an=1πππf(x)cos(nx)dxfor n0a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos(nx)\,dx \quad \text{for } n \geq 0

bn=1πππf(x)sin(nx)dxfor n1b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin(nx)\,dx \quad \text{for } n \geq 1

📐 Definition 3 (Fourier Series)

The Fourier series of ff is the formal trigonometric series

f(x)a02+n=1(ancosnx+bnsinnx)f(x) \sim \frac{a_0}{2} + \sum_{n=1}^{\infty}(a_n \cos nx + b_n \sin nx)

The symbol \sim (rather than ==) indicates that convergence has not yet been established — the series may or may not converge, and if it converges, it may or may not converge to ff.

💡 Remark 1 (From local to global)

Taylor coefficients f(n)(c)/n!f^{(n)}(c)/n! are computed from derivatives at a single point — a local operation. Fourier coefficients an,bna_n, b_n are computed by integrating over the entire period [π,π][-\pi, \pi] — a global operation. This is why the Fourier series can represent discontinuous functions that the Taylor series cannot: the coefficients “see” the whole function, not just the behavior at one point.

📝 Example 1 (The square wave)

Let f(x)=sgn(x)f(x) = \text{sgn}(x) for x(π,π)x \in (-\pi, \pi) — the square wave that takes value +1+1 for x>0x > 0 and 1-1 for x<0x < 0. Since ff is odd, an=0a_n = 0 for all nn. Computing bnb_n:

bn=1πππsgn(x)sin(nx)dx=2π0πsin(nx)dx=2nπ(1cosnπ)=2nπ(1(1)n)b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} \text{sgn}(x)\sin(nx)\,dx = \frac{2}{\pi}\int_0^{\pi}\sin(nx)\,dx = \frac{2}{n\pi}(1-\cos n\pi) = \frac{2}{n\pi}(1-(-1)^n)

So bn=4nπb_n = \frac{4}{n\pi} for nn odd and 00 for nn even. The Fourier series is

sgn(x)4πk=0sin((2k+1)x)2k+1=4π(sinx+sin3x3+sin5x5+)\text{sgn}(x) \sim \frac{4}{\pi}\sum_{k=0}^{\infty}\frac{\sin((2k+1)x)}{2k+1} = \frac{4}{\pi}\left(\sin x + \frac{\sin 3x}{3} + \frac{\sin 5x}{5} + \cdots\right)

📝 Example 2 (The sawtooth wave)

Let f(x)=xf(x) = x for x(π,π)x \in (-\pi, \pi). Again ff is odd, so an=0a_n = 0. Integration by parts gives

bn=2π0πxsin(nx)dx=2(1)n+1nb_n = \frac{2}{\pi}\int_0^{\pi} x\sin(nx)\,dx = \frac{2(-1)^{n+1}}{n}

The Fourier series is x2n=1(1)n+1nsin(nx)=2(sinxsin2x2+sin3x3)x \sim 2\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\sin(nx) = 2\left(\sin x - \frac{\sin 2x}{2} + \frac{\sin 3x}{3} - \cdots\right).

📝 Example 3 (The triangle wave)

Let f(x)=xf(x) = |x| for x(π,π)x \in (-\pi, \pi). This is even, so bn=0b_n = 0. We get a0=πa_0 = \pi and for n1n \geq 1:

an=2π0πxcos(nx)dx=2n2π((1)n1)a_n = \frac{2}{\pi}\int_0^{\pi} x\cos(nx)\,dx = \frac{2}{n^2\pi}((-1)^n - 1)

So an=4n2πa_n = -\frac{4}{n^2\pi} for nn odd and 00 for nn even. The series is

xπ24πk=0cos((2k+1)x)(2k+1)2|x| \sim \frac{\pi}{2} - \frac{4}{\pi}\sum_{k=0}^{\infty}\frac{\cos((2k+1)x)}{(2k+1)^2}

💡 Remark 2 (Why trigonometric functions?)

The trigonometric basis {cosnx,sinnx}\{\cos nx, \sin nx\} is natural for periodic functions because: (i) they are the only bounded solutions of y+n2y=0y'' + n^2 y = 0 — eigenfunctions of d2/dx2-d^2/dx^2 with eigenvalue n2n^2; (ii) they are periodic with period 2π/n2\pi/n, so sums of such functions are 2π2\pi-periodic; and (iii) they are orthogonal under the natural inner product — as the next section demonstrates.

3. The Inner Product & Orthogonality

The reason Fourier coefficients take the simple form an=f,cosnxa_n = \langle f, \cos nx \rangle is that the trigonometric system is orthogonal. This section introduces the inner product that makes this work — a genuinely new piece of mathematical machinery that does not appear in power series theory.

📐 Definition 4 (Inner Product on L²[-π, π])

For functions f,gf, g integrable on [π,π][-\pi, \pi], define the inner product

f,g=1πππf(x)g(x)dx\langle f, g \rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\, g(x)\,dx

and the corresponding norm f=f,f=1πππf(x)2dx\|f\| = \sqrt{\langle f, f \rangle} = \sqrt{\frac{1}{\pi}\int_{-\pi}^{\pi} f(x)^2\,dx}.

The factor 1/π1/\pi is a normalization that makes cosnx=1\|\cos nx\| = 1 and sinnx=1\|\sin nx\| = 1 for n1n \geq 1, while 1/2=1\|1/\sqrt{2}\| = 1.

🔷 Theorem 1 (Orthogonality of the Trigonometric System)

The functions {1/2,cosx,cos2x,,sinx,sin2x,}\{1/\sqrt{2}, \cos x, \cos 2x, \ldots, \sin x, \sin 2x, \ldots\} form an orthonormal system under ,\langle \cdot, \cdot \rangle:

cosmx,cosnx=δmn,sinmx,sinnx=δmn,cosmx,sinnx=0\langle \cos mx, \cos nx \rangle = \delta_{mn}, \quad \langle \sin mx, \sin nx \rangle = \delta_{mn}, \quad \langle \cos mx, \sin nx \rangle = 0

for all m,n1m, n \geq 1, and 1,cosnx=0\langle 1, \cos nx \rangle = 0, 1,sinnx=0\langle 1, \sin nx \rangle = 0 for n1n \geq 1, where δmn\delta_{mn} is the Kronecker delta.

Proof.

By direct computation using product-to-sum formulas. For cosmx,cosnx\langle \cos mx, \cos nx \rangle with mnm \neq n:

1πππcos(mx)cos(nx)dx=12πππ[cos((mn)x)+cos((m+n)x)]dx=0\frac{1}{\pi}\int_{-\pi}^{\pi}\cos(mx)\cos(nx)\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi}[\cos((m-n)x) + \cos((m+n)x)]\,dx = 0

since both cos((mn)x)\cos((m-n)x) and cos((m+n)x)\cos((m+n)x) integrate to zero over a full period (as m±n0m \pm n \neq 0). For m=nm = n:

1πππcos2(nx)dx=12πππ[1+cos(2nx)]dx=1\frac{1}{\pi}\int_{-\pi}^{\pi}\cos^2(nx)\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi}[1 + \cos(2nx)]\,dx = 1

The sinmx,sinnx\langle \sin mx, \sin nx \rangle and cosmx,sinnx\langle \cos mx, \sin nx \rangle cases follow similarly using the product-to-sum identities sinαsinβ=12[cos(αβ)cos(α+β)]\sin\alpha\sin\beta = \frac{1}{2}[\cos(\alpha-\beta) - \cos(\alpha+\beta)] and cosαsinβ=12[sin(α+β)sin(αβ)]\cos\alpha\sin\beta = \frac{1}{2}[\sin(\alpha+\beta) - \sin(\alpha-\beta)].

📝 Example 5 (Orthogonality by direct computation)

Verify: cos2x,sin3x=1πππcos(2x)sin(3x)dx=12πππ[sin(5x)+sin(x)]dx=0\langle \cos 2x, \sin 3x \rangle = \frac{1}{\pi}\int_{-\pi}^{\pi}\cos(2x)\sin(3x)\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi}[\sin(5x) + \sin(x)]\,dx = 0. This is a specific instance of the general orthogonality: cosines and sines of different (or same) frequencies are always orthogonal.

💡 Remark 3 (Fourier coefficients as projections)

Orthogonality gives a coordinate formula. If f=a02+(ancosnx+bnsinnx)f = \frac{a_0}{2} + \sum(a_n \cos nx + b_n \sin nx), then taking the inner product with cosmx\cos mx and using orthogonality:

f,cosmx=am\langle f, \cos mx \rangle = a_m

Similarly f,sinmx=bm\langle f, \sin mx \rangle = b_m. The Fourier coefficients are the orthogonal projections of ff onto each basis function — exactly as vector components are dot products with unit vectors in Rn\mathbb{R}^n. The coefficient formulas in Definition 2 are not ad hoc; they are forced by orthogonality.

Try the explorer below — select two basis functions and watch how their pointwise product integrates to exactly 0 when the frequencies differ (the positive and negative areas cancel), and to exactly 1 when the frequencies match (the product is always non-negative).

〈φ2, φ3〉 = 0
cos(2x) cos(3x) productpositive areanegative area

Six-panel orthogonality visualization — three orthogonal pairs showing cancellation of positive and negative areas, three diagonal pairs showing all-positive area

4. Fourier Partial Sums & Pointwise Convergence

📐 Definition 5 (Fourier Partial Sum)

The nnth Fourier partial sum of ff is

Sn(x)=a02+k=1n(akcoskx+bksinkx)S_n(x) = \frac{a_0}{2} + \sum_{k=1}^{n}(a_k \cos kx + b_k \sin kx)

This can be written as a convolution: Sn(x)=1πππf(t)Dn(xt)dtS_n(x) = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\, D_n(x - t)\,dt where

Dn(t)=12+k=1ncos(kt)=sin((n+12)t)2sin(t/2)D_n(t) = \frac{1}{2} + \sum_{k=1}^{n}\cos(kt) = \frac{\sin\left((n+\tfrac{1}{2})t\right)}{2\sin(t/2)}

is the Dirichlet kernel.

🔷 Theorem 3 (The Riemann-Lebesgue Lemma)

If ff is integrable on [π,π][-\pi, \pi], then an0a_n \to 0 and bn0b_n \to 0 as nn \to \infty.

Proof.

For a step function g=c1[a,b]g = c \cdot \mathbf{1}_{[a,b]}, direct computation gives abccos(nx)dx=c[sin(nb)sin(na)]/n0\int_a^b c\cos(nx)\,dx = c[\sin(nb) - \sin(na)]/n \to 0. Every integrable function can be approximated in the L1L^1 norm by step functions (from the definition of the Riemann integral). For general ff: given ε>0\varepsilon > 0, choose a step function gg with fg<ε\int|f - g| < \varepsilon. Then

an(f)an(fg)+an(g)1πfg+an(g)<επ+an(g)|a_n(f)| \leq |a_n(f - g)| + |a_n(g)| \leq \frac{1}{\pi}\int|f - g| + |a_n(g)| < \frac{\varepsilon}{\pi} + |a_n(g)|

Since an(g)0a_n(g) \to 0, we get lim supan(f)ε/π\limsup|a_n(f)| \leq \varepsilon/\pi. Since ε\varepsilon is arbitrary, an(f)0a_n(f) \to 0. The argument for bnb_n is identical.

🔷 Theorem 4 (Dirichlet's Convergence Theorem)

If ff is piecewise smooth on [π,π][-\pi, \pi] (meaning ff and ff' are piecewise continuous), then the Fourier series converges pointwise:

Sn(x)12[f(x+)+f(x)]for every xS_n(x) \to \frac{1}{2}[f(x^+) + f(x^-)] \quad \text{for every } x

At points of continuity, this gives Sn(x)f(x)S_n(x) \to f(x). At jump discontinuities, the partial sums converge to the average of the left and right limits.

📝 Example 4 (Square wave convergence)

At x=0x = 0 (discontinuity of sgn(x)\text{sgn}(x)), Dirichlet’s theorem gives Sn(0)12[f(0+)+f(0)]=12[1+(1)]=0S_n(0) \to \frac{1}{2}[f(0^+) + f(0^-)] = \frac{1}{2}[1 + (-1)] = 0. At x=π/2x = \pi/2 (point of continuity), Sn(π/2)f(π/2)=1S_n(\pi/2) \to f(\pi/2) = 1.

💡 Remark 4 (The Dirichlet kernel and summation)

The Dirichlet kernel Dn(t)=sin((n+12)t)/(2sin(t/2))D_n(t) = \sin((n+\frac{1}{2})t)/(2\sin(t/2)) is the analog of the “evaluation functional” for Fourier series. As nn \to \infty, DnD_n concentrates near t=0t = 0 while oscillating rapidly elsewhere — a nascent delta function. The increasingly wild oscillations of DnD_n are what cause convergence difficulties (including the Gibbs phenomenon in the next section). Compare with power series, where no such kernel appears because the monomial basis is not orthogonal under an integral inner product.

Drag the nn slider to watch the Fourier partial sums converge to each target function. Notice how convergence is fast for the smooth periodic function but painfully slow near the square wave’s discontinuities.

\u2501 f(x)\u254c\u254c S5(x)n = 5 · sgn(x) · decay O(1/n)max |error|smooth = 4.28e-1
Click the chart to probe a point

Fourier partial sums converging for three functions at n = 1, 5, 20

5. The Gibbs Phenomenon

This is the most visually striking result in Fourier theory. Near a jump discontinuity, Fourier partial sums exhibit an overshoot that does not vanish as nn \to \infty. The partial sums converge pointwise (Dirichlet’s theorem), but the maximum value near the jump overshoots by approximately 9%.

🔷 Proposition 2 (The Gibbs Phenomenon)

Let ff have a jump discontinuity at x0x_0 with jump height [f]=f(x0+)f(x0)[f] = f(x_0^+) - f(x_0^-). Then the maximum of Sn(x)S_n(x) near x0x_0 overshoots the target value f(x0+)f(x_0^+) by approximately

[f]2(2π0πsinttdt1)0.0895[f]\frac{[f]}{2}\left(\frac{2}{\pi}\int_0^{\pi}\frac{\sin t}{t}\,dt - 1\right) \approx 0.0895 \cdot [f]

The overshoot ratio 2π0πsinttdt1.1790\frac{2}{\pi}\int_0^{\pi}\frac{\sin t}{t}\,dt \approx 1.1790 is independent of nn — increasing nn compresses the overshoot into a narrower region near x0x_0 but does not reduce its height.

📝 Example 9 (Gibbs overshoot for the square wave)

The square wave f(x)=sgn(x)f(x) = \text{sgn}(x) has jump [f]=2[f] = 2 at x=0x = 0. The maximum of Sn(x)S_n(x) near x=0+x = 0^+ approaches approximately 1+2×0.0895=1.1791 + 2 \times 0.0895 = 1.179 as nn \to \infty, not 11. The location of the maximum moves toward x=0x = 0 as nn increases (at approximately x=π/nx = \pi/n), but its height remains 1.179\approx 1.179.

💡 Remark 5 (Gibbs vs. uniform convergence)

The Gibbs phenomenon means that SnfS_n \to f is not uniform near a jump discontinuity, even though it is pointwise. This is fundamentally different from power series, where convergence inside the radius of convergence is uniform on compact subsets (Power Series & Taylor Series, Theorem 4). For Fourier series, uniform convergence requires global smoothness (see Section 7).

Increase nn in the explorer below and watch: the overshoot region gets narrower but the peak height stays at 1.179\approx 1.179. The right panel confirms that the overshoot ratio converges to the Gibbs constant.

Target f(x) S20(x) Gibbs limit 1.179
n = 20, max overshoot = 1.1798, ratio = 1.1798, position π/20

Gibbs phenomenon — zoomed view of overshoot at the square wave discontinuity for n = 10, 50, 200, and the overshoot ratio converging to ≈ 1.179

6. Coefficient Decay & Smoothness

The central principle of Fourier analysis: smoothness of ff controls the decay rate of Fourier coefficients, and conversely.

🔷 Proposition 1 (Coefficient Decay and Smoothness)

(a) If ff is CkC^k (i.e., kk times continuously differentiable) and 2π2\pi-periodic, then an,bn=O(1/nk)a_n, b_n = O(1/n^k) as nn \to \infty. In particular:

  • Discontinuous \Rightarrow O(1/n)O(1/n)
  • C1O(1/n2)C^1 \Rightarrow O(1/n^2)
  • CC^\infty \Rightarrow faster than any polynomial (superpolynomial decay)

(b) The mechanism: integration by parts kk times transfers kk powers of nn from the coefficient to the function. Specifically, if ff is CkC^k and periodic, then the Fourier coefficients of f(k)f^{(k)} satisfy (f(k))(n)=(in)kf^(n)(f^{(k)})^\wedge(n) = (in)^k \hat{f}(n), so cn=cn(k)/nk|c_n| = |c_n^{(k)}|/n^k where cn(k)c_n^{(k)} are the (bounded) coefficients of f(k)f^{(k)}.

📝 Example 10 (Three decay regimes)

(a) The square wave has bn=O(1/n)b_n = O(1/n) — discontinuous function, slowest possible decay.

(b) The triangle wave has an=O(1/n2)a_n = O(1/n^2) — continuous but not differentiable (corner at x=0x = 0), one degree faster.

(c) The smooth periodic function f(x)=1/(2cosx)f(x) = 1/(2 - \cos x) has coefficients that decay exponentially: an=O(rn)a_n = O(r^n) with r=230.268r = 2 - \sqrt{3} \approx 0.268. This is the Fourier counterpart of the smooth-vs-analytic distinction from Power Series & Taylor Series.

Compare the three decay regimes in the explorer. On a log scale, O(1/n)O(1/n) decay is a straight line with slope 1-1, O(1/n2)O(1/n^2) has slope 2-2, and exponential decay drops off a cliff.

Smoothness: DiscontinuousPredicted decay: O(1/n)Observed: matches

Coefficient decay on log scale for three smoothness classes — discontinuous (1/n), C⁰ (1/n²), and analytic (exponential)

7. Uniform Convergence & C1C^1 Functions

🔷 Theorem 5 (Uniform Convergence under Smoothness)

If ff is C1C^1 (continuously differentiable) and 2π2\pi-periodic, then the Fourier series of ff converges uniformly to ff.

More generally, if ff is continuous, 2π2\pi-periodic, and the Fourier coefficients satisfy n=1(an+bn)<\sum_{n=1}^{\infty}(|a_n| + |b_n|) < \infty, then the Fourier series converges uniformly by the Weierstrass M-test from Uniform Convergence.

📝 Example 11 (Triangle wave — uniform but slow)

The triangle wave f(x)=xf(x) = |x| on (π,π)(-\pi, \pi) has an=O(1/n2)a_n = O(1/n^2). Since 1/n2<\sum 1/n^2 < \infty (pp-series with p=2p = 2 from Series Convergence & Tests), the Fourier series converges uniformly to ff by the Weierstrass M-test. There is no Gibbs phenomenon because ff is continuous (though not differentiable). But the convergence rate is O(1/n)O(1/n) in the sup-norm — much slower than exponential convergence for smooth functions.

8. Bessel’s Inequality & Parseval’s Identity

Energy conservation in the frequency domain. Bessel’s inequality says the energy in the Fourier coefficients cannot exceed the energy in ff. Parseval’s identity says equality holds — no energy is lost.

🔷 Theorem 2 (Bessel's Inequality)

For any integrable ff on [π,π][-\pi, \pi],

a022+n=1N(an2+bn2)1πππf(x)2dx=f2\frac{a_0^2}{2} + \sum_{n=1}^{N}(a_n^2 + b_n^2) \leq \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)^2\,dx = \|f\|^2

for every NN. Taking NN \to \infty: a022+n=1(an2+bn2)f2\frac{a_0^2}{2} + \sum_{n=1}^{\infty}(a_n^2 + b_n^2) \leq \|f\|^2.

Proof.

Expand fSN20\|f - S_N\|^2 \geq 0:

01πππ(f(x)SN(x))2dx=f22f,SN+SN20 \leq \frac{1}{\pi}\int_{-\pi}^{\pi}\bigl(f(x) - S_N(x)\bigr)^2\,dx = \|f\|^2 - 2\langle f, S_N \rangle + \|S_N\|^2

By orthogonality, f,SN=a022+n=1N(an2+bn2)\langle f, S_N \rangle = \frac{a_0^2}{2} + \sum_{n=1}^{N}(a_n^2 + b_n^2) and SN2=a022+n=1N(an2+bn2)\|S_N\|^2 = \frac{a_0^2}{2} + \sum_{n=1}^{N}(a_n^2 + b_n^2). Substituting:

0f2a022n=1N(an2+bn2)0 \leq \|f\|^2 - \frac{a_0^2}{2} - \sum_{n=1}^{N}(a_n^2 + b_n^2) \qquad \square

🔷 Theorem 6 (Parseval's Identity)

If ff is square-integrable on [π,π][-\pi, \pi], then

a022+n=1(an2+bn2)=f2=1πππf(x)2dx\frac{a_0^2}{2} + \sum_{n=1}^{\infty}(a_n^2 + b_n^2) = \|f\|^2 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)^2\,dx

Equality holds in Bessel’s inequality: no energy is lost in the Fourier decomposition.

📝 Example 6 (Bessel verification for the square wave)

The square wave sgn(x)\text{sgn}(x) has f2=1πππ1dx=2\|f\|^2 = \frac{1}{\pi}\int_{-\pi}^{\pi} 1\,dx = 2. With bn=4/(nπ)b_n = 4/(n\pi) for odd nn, Parseval gives

k=016(2k+1)2π2=2\sum_{k=0}^{\infty} \frac{16}{(2k+1)^2\pi^2} = 2

which rearranges to k=01(2k+1)2=π28\sum_{k=0}^{\infty}\frac{1}{(2k+1)^2} = \frac{\pi^2}{8} — a non-trivial identity derived purely from Parseval.

📝 Example 7 (Parseval for the sawtooth — the Basel problem)

The sawtooth f(x)=xf(x) = x has f2=1πππx2dx=2π23\|f\|^2 = \frac{1}{\pi}\int_{-\pi}^{\pi} x^2\,dx = \frac{2\pi^2}{3}. With bn=2(1)n+1/nb_n = 2(-1)^{n+1}/n, Parseval gives

n=14n2=2π23\sum_{n=1}^{\infty}\frac{4}{n^2} = \frac{2\pi^2}{3}

which yields n=11n2=π26\sum_{n=1}^{\infty}\frac{1}{n^2} = \frac{\pi^2}{6} — the Basel problem, one of the most celebrated results in analysis, solved here by Fourier theory.

Cumulative coefficient energy approaching the norm squared, and Parseval yielding π²/6 and π²/8

9. Term-by-Term Integration & Differentiation Caveats

🔷 Theorem 7 (Term-by-Term Integration of Fourier Series)

If ff is integrable on [π,π][-\pi, \pi] with Fourier series f(x)a02+(ancosnx+bnsinnx)f(x) \sim \frac{a_0}{2} + \sum(a_n \cos nx + b_n \sin nx), then for any x[π,π]x \in [-\pi, \pi]:

πxf(t)dt=a02(x+π)+n=1(ansinnxn+bn(1cosnx)n)\int_{-\pi}^{x} f(t)\,dt = \frac{a_0}{2}(x + \pi) + \sum_{n=1}^{\infty}\left(\frac{a_n \sin nx}{n} + \frac{b_n(1 - \cos nx)}{n}\right)

The integrated series converges uniformly, even if the original Fourier series does not.

Proof.

The integrated series has coefficients O(an/n)O(a_n/n) and O(bn/n)O(b_n/n). By Bessel’s inequality from Theorem 2, (an2+bn2)<\sum(a_n^2 + b_n^2) < \infty. By Cauchy-Schwarz,

n=1(ann+bnn)(an2+bn2)1n2<\sum_{n=1}^{\infty}\left(\frac{|a_n|}{n} + \frac{|b_n|}{n}\right) \leq \sqrt{\sum(a_n^2 + b_n^2)} \cdot \sqrt{\sum \frac{1}{n^2}} < \infty

The Weierstrass M-test from Uniform Convergence then gives uniform convergence.

📝 Example 8 (Integrating the square wave)

Integrating sgn(x)4πk=0sin((2k+1)x)2k+1\text{sgn}(x) \sim \frac{4}{\pi}\sum_{k=0}^{\infty}\frac{\sin((2k+1)x)}{2k+1} from π-\pi to xx gives

xπ4πk=01cos((2k+1)x)(2k+1)2|x| - \pi \sim -\frac{4}{\pi}\sum_{k=0}^{\infty}\frac{1 - \cos((2k+1)x)}{(2k+1)^2}

which converges uniformly — the integrated series of a non-uniformly convergent Fourier series is uniformly convergent. Integration improves convergence by one order.

💡 Remark 6 (Differentiation caveats)

Term-by-term differentiation of Fourier series is not generally valid. Differentiating the square wave’s Fourier series gives 4πk=0cos((2k+1)x)\frac{4}{\pi}\sum_{k=0}^{\infty}\cos((2k+1)x), which diverges everywhere. Differentiation multiplies the nnth coefficient by nn, making decay worse. For term-by-term differentiation to be valid, ff must be C1C^1 and periodic (so that ff' also has a convergent Fourier series). Compare with power series, where differentiation preserves the radius of convergence — Fourier series are more delicate.

Term-by-term integration: the integrated series converges uniformly even though the original does not

10. The Complex Exponential Form

💡 Remark 7 (Complex exponential form)

Using cosnx=(einx+einx)/2\cos nx = (e^{inx} + e^{-inx})/2 and sinnx=(einxeinx)/(2i)\sin nx = (e^{inx} - e^{-inx})/(2i), the Fourier series can be written

f(x)n=cneinxwherecn=12πππf(x)einxdxf(x) \sim \sum_{n=-\infty}^{\infty} c_n e^{inx} \qquad \text{where} \quad c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx

The complex form is more compact and connects directly to the Fourier transform (the non-periodic analog, extending this to all of R\mathbb{R}). In the complex form, Parseval’s identity becomes n=cn2=12πππf(x)2dx\sum_{n=-\infty}^{\infty}|c_n|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x)|^2\,dx.

11. Connections to ML — Positional Encodings, Spectral Methods, Signal Processing

11.1. Sinusoidal positional encodings in transformers

The Transformer architecture encodes sequence position using

PE(pos,2i)=sin(pos/100002i/dmodel),PE(pos,2i+1)=cos(pos/100002i/dmodel)\text{PE}(\text{pos}, 2i) = \sin(\text{pos}/10000^{2i/d_{\text{model}}}), \qquad \text{PE}(\text{pos}, 2i+1) = \cos(\text{pos}/10000^{2i/d_{\text{model}}})

This is a Fourier basis at geometrically spaced frequencies. The orthogonality of the trigonometric system means the encoding at each frequency carries independent positional information. The choice of sinusoidal functions (rather than polynomial functions of position) enables the model to attend to relative positions via linear transformations — because sin(a+b)\sin(a + b) and cos(a+b)\cos(a + b) are linear combinations of sina,cosa,sinb,cosb\sin a, \cos a, \sin b, \cos b.

→ formalML: PAC Learning — Fourier analysis on Boolean functions uses the same orthogonal decomposition structure.

11.2. Random Fourier Features

For shift-invariant kernels k(x,y)=k(xy)k(x, y) = k(x - y), Bochner’s theorem says k(z)=eiωTzdp(ω)k(z) = \int e^{i\omega^T z}\,dp(\omega) for some probability measure pp. Random Fourier Features approximate the kernel using random trigonometric functions:

k(x,y)z(x)Tz(y)wherez(x)=2/D[cos(ω1Tx+b1),,cos(ωDTx+bD)]Tk(x, y) \approx z(x)^T z(y) \quad \text{where} \quad z(x) = \sqrt{2/D}\,[\cos(\omega_1^T x + b_1), \ldots, \cos(\omega_D^T x + b_D)]^T

with ωjp\omega_j \sim p and bjUniform[0,2π]b_j \sim \text{Uniform}[0, 2\pi]. Each random feature samples one Fourier coefficient of the kernel.

→ formalML: Gradient Descent — Random Fourier Features enable scalable kernel methods in optimization.

11.3. Spectral normalization

In GANs and deep networks, spectral normalization constrains the Lipschitz constant of each layer by dividing the weight matrix by its largest singular value. The singular values are the “spectral” (frequency-domain) quantities of the linear map — a generalization of Fourier analysis from functions to linear operators. Understanding Parseval’s identity (energy conservation under the Fourier transform) motivates why spectral normalization preserves representational power while constraining capacity.

11.4. Frequency-domain signal processing

Convolutional neural networks process signals in the spatial domain, but the convolution theorem says fg^=f^g^\widehat{f * g} = \hat{f} \cdot \hat{g} — convolution in the spatial domain is pointwise multiplication in the Fourier domain. Fast implementations of large-kernel convolutions use FFT to compute convolutions in O(nlogn)O(n \log n) rather than O(n2)O(n^2). The Fourier basis makes this possible because multiplication of Fourier coefficients is equivalent to function convolution.

ML connections — positional encodings, random Fourier features, spectral normalization, FFT convolution

12. Computational Notes

Fourier coefficient computation. For explicit functions, compute ana_n and bnb_n analytically when possible; otherwise, use numerical quadrature. The trapezoidal rule is especially accurate for periodic functions — it converges exponentially for smooth periodic integrands, a fact that is itself a consequence of Fourier theory (the quadrature error depends on the Fourier coefficient decay of the integrand).

The FFT. The Fast Fourier Transform computes all nn discrete Fourier coefficients in O(nlogn)O(n \log n) operations, compared to O(n2)O(n^2) for direct computation. This is used in practice for signal processing, image analysis, and spectral methods in PDEs.

Gibbs mitigation. Cesàro (Fejér) summation replaces SnS_n with σn=(S0+S1++Sn)/(n+1)\sigma_n = (S_0 + S_1 + \cdots + S_n)/(n+1), which converges uniformly even at discontinuities — the Gibbs overshoot is smoothed out by averaging. The Lanczos sigma factor provides another smoothing approach. These techniques are used in spectral methods for PDEs where Gibbs oscillations corrupt numerical solutions.

Computational verification — FFT coefficients vs analytic, and convergence rates for different smoothness classes

13. Connections & Further Reading

Prerequisites (live links):

  • Power Series & Taylor Series — monomial basis {xn}\{x^n\} for local approximation; Fourier series use trigonometric basis {1,cosnx,sinnx}\{1, \cos nx, \sin nx\} for global periodic approximation.
  • The Riemann Integral & FTC — Fourier coefficients are Riemann integrals; the inner product f,g=(1/π)fg\langle f, g \rangle = (1/\pi)\int f\,g is defined via the integral.
  • Uniform Convergence — the Weierstrass M-test establishes uniform convergence of Fourier series; interchange theorems justify term-by-term integration.
  • Series Convergence & Tests — Bessel’s inequality is a series convergence statement; comparison tests bound coefficient decay rates.
  • Improper Integrals & Special Functions — the Riemann-Lebesgue lemma connects to improper integrals on R\mathbb{R}.

Downstream within formalCalculus:

  • Approximation Theory — Weierstrass approximation, Stone-Weierstrass, best approximation in L2L^2 and C[a,b]C[a,b].
  • Stability & Dynamical Systems — Fourier modes as eigenfunctions of the Laplacian, spectral gap, and convergence rates.
  • LpL^p Spaces (coming soon)L2L^2 completeness and Fourier series as orthogonal expansions in Hilbert space.
  • Inner Product & Hilbert Spaces — the inner product defined here on L2[π,π]L^2[-\pi, \pi] is the prototype for abstract Hilbert space theory.

Forward links to formalml.com:

References

  1. book Rudin (1976). Principles of Mathematical Analysis Chapter 8 — Fourier series, orthogonality, Parseval's theorem, and the Dirichlet kernel
  2. book Stein & Shakarchi (2003). Fourier Analysis: An Introduction Chapters 2-3 — the definitive modern treatment of Fourier series convergence, Gibbs phenomenon, and applications
  3. book Abbott (2015). Understanding Analysis Chapter 8 — Fourier series with emphasis on pointwise vs. uniform convergence and the role of the inner product
  4. book Folland (1999). Real Analysis: Modern Techniques and Their Applications Section 8.2 — Fourier series in the L² framework, Parseval's identity, and completeness of the trigonometric system
  5. book Spivak (2008). Calculus Chapter 15 — trigonometric functions and their orthogonality, with careful treatment of Fourier coefficient computation
  6. paper Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser & Polosukhin (2017). “Attention Is All You Need” Sinusoidal positional encodings PE(pos, 2i) = sin(pos/10000^(2i/d)) use the Fourier basis to encode sequence position — a direct application of the orthogonal trigonometric system to representation learning.
  7. paper Rahimi & Recht (2007). “Random Features for Large-Scale Kernel Machines” Random Fourier Features approximate shift-invariant kernels via Monte Carlo sampling of Fourier coefficients: z(x) = √(2/D)cos(ωᵀx + b) with ω drawn from the spectral distribution.