Functional Analysis · advanced · 50 min read

Calculus of Variations

Optimizing functionals — the Euler-Lagrange equation, Sobolev spaces, and the direct method

Abstract. The calculus of variations extends optimization from finite-dimensional spaces to spaces of functions. Given a functional — a map from a function space to the reals — we ask: which function minimizes it? The Euler-Lagrange equation provides the necessary condition, the second variation tests sufficiency, and the direct method of the calculus of variations proves existence. Sobolev spaces supply the natural domain, and the Lax-Milgram theorem connects variational problems to weak solutions of PDEs. These ideas underpin physics-informed neural networks, optimal transport, variational autoencoders, and diffusion models.

Where this leads → formalML

  • formalML The Euler-Lagrange equation is the infinite-dimensional prototype for first-order optimality conditions. Lagrangian duality generalizes constrained variational problems to convex optimization.
  • formalML Geodesics on statistical manifolds are calculus-of-variations problems with the Fisher information metric as the Lagrangian.
  • formalML The direct method provides the functional-analytic foundation for the existence of minimizers. Gradient descent in function spaces (gradient flow) is a continuous-time variational principle.
  • formalML Variational autoencoders minimize the ELBO — a variational objective over encoder-decoder function pairs. Diffusion models minimize score-matching losses with variational structure.

1. Overview and Motivation

In a physics-informed neural network, the loss function is a functional — it maps a function (the neural network) to a real number (the PDE residual). Minimizing it is not ordinary optimization over parameters; it is optimization over a function space. The loss is literally

J[u] = \int_\Omega |\mathcal{L}u - f|^2 \, dx,

where \mathcal{L} is a differential operator and u is the network output viewed as a function. This is the calculus of variations in its modern incarnation.

The arc of Track 8 has been a staircase of abstraction, each step adding one axiom and gaining strictly stronger conclusions:

  1. Metric spaces (Topic 29): distances → completeness → fixed-point theorems.
  2. Normed/Banach spaces (Topic 30): length → bounded operators → the big four theorems.
  3. Inner product/Hilbert spaces (Topic 31): angles → orthogonality → projection, Riesz representation, spectral decomposition.
  4. Calculus of variations (this topic): perturbation → optimization over function spaces → Euler-Lagrange equations, direct method, eigenvalue problems.

The central question of finite-dimensional optimization is: which point minimizes f? Set the derivative to zero and solve. The calculus of variations asks a harder question: which function minimizes J? The answer requires a new derivative (the first variation, a directional derivative in function space), a new existence theory (the direct method, using weak compactness in Hilbert spaces), and a new domain (the Sobolev spaces that are the natural home for variational problems).

This is the final topic in formalCalculus — the capstone of both Track 8 and the entire 32-topic journey from epsilon-delta definitions to functional analysis. Let’s begin.


2. Functionals and Their Domains

📐 Definition 1 (Functional)

A functional is a map J: \mathcal{A} \to \mathbb{R}, where \mathcal{A} is a subset of a function space. We write J[y] rather than J(y) to distinguish functionals (which eat functions) from ordinary functions (which eat numbers).

📝 Example 1 (Arc Length Functional)

The arc length of a curve y: [a,b] \to \mathbb{R} is

L[y] = \int_a^b \sqrt{1 + y'(x)^2} \, dx.

This maps each C^1 function y to a non-negative real number.

📝 Example 2 (Energy Functional)

The Dirichlet energy

E[y] = \int_a^b \frac{1}{2} y'(x)^2 \, dx

measures the total “kinetic energy” stored in the gradient of y. Minimizing E[y] subject to the boundary conditions y(a) = \alpha, y(b) = \beta yields the straight line y(x) = \alpha + (\beta - \alpha)\frac{x - a}{b - a}.
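This claim is easy to check numerically. A minimal sketch (the grid size and the sinusoidal perturbation are illustrative choices, not from the text): discretize the energy, then confirm that every admissible perturbation of the straight line raises it.

```python
import numpy as np

# Discretize E[y] = ∫ ½ y'² dx on [0, 1] with boundary values
# y(0) = 0, y(1) = 1, and compare the straight line against
# perturbations that vanish at the endpoints.
x = np.linspace(0.0, 1.0, 1001)
dx = np.diff(x)

def energy(y):
    """Midpoint-rule approximation of ∫ ½ y'² dx."""
    return np.sum(0.5 * (np.diff(y) / dx) ** 2 * dx)

line = x.copy()            # the Euler-Lagrange solution y(x) = x
E_line = energy(line)      # exactly 1/2 for this discretization

for eps in (0.1, 0.01):
    wiggle = line + eps * np.sin(np.pi * x)   # η(0) = η(1) = 0
    assert energy(wiggle) > E_line            # every perturbation raises E
```

For the sine perturbation the excess energy works out to \varepsilon^2\pi^2/4 in the continuum, so the gap shrinks quadratically as \varepsilon \to 0, exactly the behavior the first variation predicts at a minimum.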

📝 Example 3 (Action Functional (Lagrangian Mechanics))

In classical mechanics, the action is

S[q] = \int_{t_0}^{t_1} L(q, \dot{q}, t) \, dt,

where L = T - V is the Lagrangian (kinetic minus potential energy). Hamilton’s principle: the physical trajectory extremizes the action.

📝 Example 4 (ML Loss as a Functional)

A PINN loss is a functional on the space of neural network functions:

J[u_\theta] = \int_\Omega |\mathcal{L}u_\theta(x) - f(x)|^2 \, dx + \lambda \int_{\partial\Omega} |u_\theta(x) - g(x)|^2 \, dS.

The parameters \theta parameterize a submanifold of function space; optimization over \theta is a finite-dimensional proxy for the infinite-dimensional variational problem.

💡 Remark 1 (Hilbert-Space Vocabulary Refresh)

We will use the inner-product and Hilbert-space machinery from Topic 31 throughout this topic. Recall: a Hilbert space is a complete inner-product space. The projection theorem guarantees a unique closest point in any closed convex subset. The Riesz representation theorem identifies the dual space H^* with H itself. These are not just background — they are the active ingredients in the direct method (Section 8) and the Lax-Milgram theorem (Section 9).

Three canonical functionals: arc length, energy, and action, each mapping a function to a real number.

3. The First Variation

The first variation is the directional derivative of a functional — the rate of change of J[y] when we perturb y in the direction \eta.

📐 Definition 2 (First Variation (Gâteaux Derivative))

The first variation of J at y in the direction \eta is

\delta J[y; \eta] = \lim_{\varepsilon \to 0} \frac{J[y + \varepsilon\eta] - J[y]}{\varepsilon} = \frac{d}{d\varepsilon}\bigg|_{\varepsilon=0} J[y + \varepsilon\eta],

provided the limit exists for all admissible perturbations \eta (typically \eta \in C_0^\infty(a,b), meaning smooth functions vanishing at the boundary). If \delta J[y; \eta] = 0 for all \eta, we call y a critical point (or extremal) of J.

This is the function-space analogue of the gradient. In \mathbb{R}^n, a critical point of f satisfies \nabla f = 0. In function space, a critical point of J satisfies \delta J = 0 for all directions \eta. The Euler-Lagrange equation (next section) is what \delta J = 0 looks like when J has the integral form J[y] = \int L(x, y, y') \, dx.

🔷 Theorem 1 (Fundamental Lemma of the Calculus of Variations)

If f \in C([a,b]) and

\int_a^b f(x) \eta(x) \, dx = 0 \quad \text{for all } \eta \in C_0^\infty(a,b),

then f(x) = 0 for all x \in [a,b].

Proof. Suppose for contradiction that f(x_0) \neq 0 at some x_0 \in (a,b). Without loss of generality, assume f(x_0) > 0. By continuity of f, there exists \delta > 0 such that f(x) > f(x_0)/2 for all x \in (x_0 - \delta, x_0 + \delta) \subset (a,b).

Choose \eta to be a smooth bump function supported on (x_0 - \delta, x_0 + \delta) with \eta \geq 0 and \eta(x_0) > 0. For instance, take

\eta(x) = \begin{cases} \exp\!\left(-\frac{1}{1 - ((x - x_0)/\delta)^2}\right) & \text{if } |x - x_0| < \delta, \\ 0 & \text{otherwise.} \end{cases}

Then \eta \in C_0^\infty(a,b), and

\int_a^b f(x)\eta(x)\,dx = \int_{x_0-\delta}^{x_0+\delta} f(x)\eta(x)\,dx > \frac{f(x_0)}{2}\int_{x_0-\delta}^{x_0+\delta} \eta(x)\,dx > 0.

This contradicts the hypothesis that \int f\eta\,dx = 0 for all \eta. Therefore f \equiv 0 on (a,b), and by continuity on all of [a,b]. \square

📝 Example 5 (First Variation of the Energy Functional)

For E[y] = \int_0^1 \frac{1}{2}y'^2 \, dx:

\frac{d}{d\varepsilon}\bigg|_{\varepsilon=0} E[y + \varepsilon\eta] = \frac{d}{d\varepsilon}\bigg|_{\varepsilon=0} \int_0^1 \frac{1}{2}(y' + \varepsilon\eta')^2 \, dx = \int_0^1 y'\eta' \, dx.

Integrating by parts (with \eta(0) = \eta(1) = 0): \delta E[y;\eta] = -\int_0^1 y''\eta \, dx. Setting this to zero for all \eta and applying the fundamental lemma gives y'' = 0 — the extremal is a straight line.
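As a numerical sanity check (not from the text), the Gâteaux derivative can be approximated by a central difference in \varepsilon and compared with the closed form \delta E[y;\eta] = \int_0^1 y'\eta'\,dx; the candidate y = x^2 and perturbation \eta = \sin(\pi x) are arbitrary choices.

```python
import numpy as np

# Compare a central-difference approximation of the first variation of
# E[y] = ∫ ½ y'² dx with the closed form δE[y; η] = ∫ y' η' dx.
x = np.linspace(0.0, 1.0, 2001)
dx = np.diff(x)

def energy(y):
    return np.sum(0.5 * (np.diff(y) / dx) ** 2 * dx)

y = x**2                   # an arbitrary candidate curve
eta = np.sin(np.pi * x)    # admissible perturbation: η(0) = η(1) = 0

eps = 1e-5
numeric = (energy(y + eps * eta) - energy(y - eps * eta)) / (2 * eps)
closed_form = np.sum((np.diff(y) / dx) * (np.diff(eta) / dx) * dx)
assert abs(numeric - closed_form) < 1e-8
```

Because E is quadratic in \varepsilon along the line y + \varepsilon\eta, the central difference matches the closed form up to floating-point rounding.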

📝 Example 6 (First Variation of the Arc Length Functional)

For L[y] = \int_0^1 \sqrt{1+y'^2} \, dx, a similar computation gives

\delta L[y;\eta] = \int_0^1 \frac{y'}{\sqrt{1+y'^2}}\,\eta'\,dx.

After integration by parts, the Euler-Lagrange equation is \frac{d}{dx}\frac{y'}{\sqrt{1+y'^2}} = 0, which implies y' is constant — again, a straight line (which minimizes arc length between two points in the plane).

💡 Remark 2 (Connection to Taylor Expansion)

The first variation is a Taylor expansion in function space. Writing J[y + \varepsilon\eta] as a power series in \varepsilon (cf. Topic 6):

J[y + \varepsilon\eta] = J[y] + \varepsilon \, \delta J[y;\eta] + \frac{\varepsilon^2}{2} \delta^2 J[y;\eta] + O(\varepsilon^3).

The first variation is the linear term; the second variation (Section 6) is the quadratic term. A critical point has \delta J = 0; a minimum requires \delta^2 J \geq 0.

The first variation: curve y, perturbation y+εη, and J(ε) parabola with minimum at ε=0.

4. The Euler-Lagrange Equation

This is the central result of the classical calculus of variations: a necessary condition for a curve to be an extremal of a functional.

🔷 Theorem 2 (The Euler-Lagrange Equation)

Let L = L(x, y, y') be C^2 in all arguments, and let J[y] = \int_a^b L(x, y, y') \, dx. If y \in C^2([a,b]) is an extremal of J (i.e., \delta J[y;\eta] = 0 for all \eta \in C_0^\infty(a,b)), then y satisfies the Euler-Lagrange equation:

\frac{\partial L}{\partial y} - \frac{d}{dx}\frac{\partial L}{\partial y'} = 0.

Proof. We proceed in six steps.

Step 1: Expand the perturbed functional. Replace y by y + \varepsilon\eta in J:

J[y + \varepsilon\eta] = \int_a^b L(x,\, y+\varepsilon\eta,\, y'+\varepsilon\eta') \, dx.

Step 2: Differentiate under the integral. By the chain rule:

\frac{d}{d\varepsilon}\bigg|_{\varepsilon=0} J[y+\varepsilon\eta] = \int_a^b \left[\frac{\partial L}{\partial y}\eta + \frac{\partial L}{\partial y'}\eta'\right] dx.

Step 3: Integrate the second term by parts. Writing u = \frac{\partial L}{\partial y'} and dv = \eta' \, dx, we get v = \eta and du = \frac{d}{dx}\frac{\partial L}{\partial y'} \, dx:

\int_a^b \frac{\partial L}{\partial y'}\eta' \, dx = \left[\frac{\partial L}{\partial y'}\eta\right]_a^b - \int_a^b \frac{d}{dx}\frac{\partial L}{\partial y'}\,\eta \, dx.

Step 4: Apply the boundary conditions. Since \eta(a) = \eta(b) = 0, the boundary term vanishes:

\left[\frac{\partial L}{\partial y'}\eta\right]_a^b = 0.

Step 5: Combine and factor. Substituting back:

\delta J[y;\eta] = \int_a^b \left[\frac{\partial L}{\partial y} - \frac{d}{dx}\frac{\partial L}{\partial y'}\right]\eta \, dx = 0

for all \eta \in C_0^\infty(a,b).

Step 6: Apply the fundamental lemma. By Theorem 1, since the expression in brackets is continuous and its integral against every smooth compactly-supported \eta vanishes, we conclude

\frac{\partial L}{\partial y} - \frac{d}{dx}\frac{\partial L}{\partial y'} = 0

on (a,b). \square

The explorer above lets you drag control points on a candidate curve y(x)y(x) and watch the functional value J[y]J[y] update in real time. The green curve is the analytic Euler-Lagrange solution — it achieves the minimum. The JJ vs ε\varepsilon plot on the right confirms that the functional value has a minimum at ε=0\varepsilon = 0 when the candidate is the E-L solution.

📝 Example 7 (Shortest Path (Straight Line))

For L = \sqrt{1+y'^2}: \frac{\partial L}{\partial y} = 0 and \frac{\partial L}{\partial y'} = \frac{y'}{\sqrt{1+y'^2}}. The Euler-Lagrange equation gives \frac{d}{dx}\frac{y'}{\sqrt{1+y'^2}} = 0, so y' = \text{const}. The extremal is a straight line — as expected.

📝 Example 8 (Harmonic Oscillator from Variational Principle)

The action S[q] = \int_0^T \left(\frac{1}{2}m\dot{q}^2 - \frac{1}{2}kq^2\right)dt has Lagrangian L = \frac{1}{2}m\dot{q}^2 - \frac{1}{2}kq^2. The Euler-Lagrange equation yields m\ddot{q} + kq = 0 — Newton’s second law for a spring.
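This Euler-Lagrange computation can be reproduced symbolically. A short sketch using SymPy, applying the \partial L/\partial q - \frac{d}{dt}\,\partial L/\partial\dot{q} recipe from Theorem 2 by hand:

```python
import sympy as sp

# Derive the Euler-Lagrange equation for L = ½ m q̇² − ½ k q².
t = sp.symbols('t')
m, k = sp.symbols('m k', positive=True)
q = sp.Function('q')

L = sp.Rational(1, 2) * m * q(t).diff(t)**2 - sp.Rational(1, 2) * k * q(t)**2

# ∂L/∂q − d/dt (∂L/∂q̇); setting this to zero gives m q̈ + k q = 0.
el = sp.diff(L, q(t)) - sp.diff(sp.diff(L, q(t).diff(t)), t)
assert sp.simplify(el + m * q(t).diff(t, 2) + k * q(t)) == 0
```

The same three-line recipe derives the Euler-Lagrange equation for any Lagrangian you can write down symbolically.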

💡 Remark 3 (Natural vs. Essential Boundary Conditions)

The proof of the Euler-Lagrange equation assumed \eta(a) = \eta(b) = 0 (Dirichlet/essential boundary conditions). If we allow \eta(b) \neq 0, the boundary term \frac{\partial L}{\partial y'}\big|_{x=b} \cdot \eta(b) must vanish independently, giving the natural boundary condition \frac{\partial L}{\partial y'}\big|_{x=b} = 0. This is important in finite element methods.

The Euler-Lagrange derivation: integration by parts, boundary terms, and the fundamental lemma.

5. Classical Examples

These are the problems that launched the calculus of variations in the 17th and 18th centuries.

📝 Example 9 (The Brachistochrone Problem)

Problem: Find the curve of fastest descent under gravity from point A to point B, starting from rest.

The descent time is T[y] = \int_0^{x_1} \sqrt{\frac{1+y'^2}{2gy}}\,dx. The Euler-Lagrange equation (with Lagrangian L = \sqrt{(1+y'^2)/(2gy)}) yields 2yy'' + y'^2 + 1 = 0. The solution is a cycloid:

x = R(\theta - \sin\theta), \quad y = R(1 - \cos\theta),

where R is determined by the endpoint condition. The cycloid is about 19% faster than the straight-line path.
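The speedup can be checked numerically. A sketch, with the endpoint (x_1, y_1) = (1, 0.5) and g = 9.81 as illustrative choices (y measured downward): solve for the cycloid through the endpoint, then compare its exact descent time \theta_1\sqrt{R/g} with uniformly accelerated motion along the chord.

```python
import numpy as np

# Descent time: cycloid vs. straight chord from (0, 0) to (x1, y1).
g, x1, y1 = 9.81, 1.0, 0.5

# Solve (θ − sin θ)/(1 − cos θ) = x1/y1 for θ₁ by bisection
# (the ratio is increasing on (0, 2π)).
lo, hi = 1e-9, 2 * np.pi - 1e-9
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if (mid - np.sin(mid)) / (1 - np.cos(mid)) > x1 / y1:
        hi = mid
    else:
        lo = mid
theta1 = 0.5 * (lo + hi)
R = y1 / (1 - np.cos(theta1))

T_cycloid = theta1 * np.sqrt(R / g)
L_chord = np.hypot(x1, y1)
T_chord = np.sqrt(2 * L_chord**2 / (g * y1))   # from L = ½at², a = g·y1/L

assert T_cycloid < T_chord    # the cycloid beats the straight line
```

For this endpoint the improvement comes out near the 19–20% quoted above; the exact percentage depends on the endpoint's aspect ratio x_1/y_1.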

📝 Example 10 (Geodesics on Surfaces)

A geodesic minimizes the arc-length functional L[\gamma] = \int_a^b \sqrt{g_{ij}\dot{\gamma}^i\dot{\gamma}^j}\,dt (cf. Topic 20). On a sphere of radius R, the Euler-Lagrange equation yields great circles — the shortest paths between points on the sphere.

📝 Example 11 (The Catenary)

A flexible chain of uniform density hanging under gravity adopts the shape that minimizes the potential energy V[y] = \int_{-a}^a \rho g\, y\sqrt{1+y'^2}\,dx subject to the constraint of fixed length. The Euler-Lagrange equation with a Lagrange multiplier gives the catenary:

y(x) = c\cosh\left(\frac{x}{c}\right),

where c > 0 depends on the chain length and endpoint separation.

📝 Example 12 (The Isoperimetric Problem)

Problem: Among all closed curves of fixed perimeter P, which one encloses the maximum area?

This is a constrained variational problem. We maximize A[\gamma] = \frac{1}{2}\oint (x\,dy - y\,dx) subject to L[\gamma] = P. Using a Lagrange multiplier \lambda, the Euler-Lagrange equation for J = A - \lambda L yields a circle of radius R = P/(2\pi).

💡 Remark 4 (Constrained Variations and Lagrange Multipliers)

Constrained variational problems (like the isoperimetric problem) add a side condition K[y] = c to the functional J[y]. The method of Lagrange multipliers in function space: extremize J[y] - \lambda K[y] over all y and \lambda. This is the infinite-dimensional prototype for Lagrangian duality in optimization.

Brachistochrone: cycloid vs. straight line vs. parabola with descent times. Geodesics on a sphere, catenary curve, and the isoperimetric circle.

6. The Second Variation and Sufficient Conditions

The Euler-Lagrange equation is a necessary condition — it finds critical points. But is a critical point a minimum, a maximum, or a saddle point? The second variation answers this question, just as the second derivative does in single-variable calculus.

📐 Definition 3 (Second Variation)

The second variation of J at an extremal y in the direction \eta is

\delta^2 J[y;\eta] = \frac{d^2}{d\varepsilon^2}\bigg|_{\varepsilon=0} J[y + \varepsilon\eta].

For J[y] = \int_a^b L(x,y,y')\,dx, this becomes

\delta^2 J[y;\eta] = \int_a^b \left[L_{yy}\eta^2 + 2L_{yy'}\eta\eta' + L_{y'y'}\eta'^2\right]dx.

🔷 Theorem 3 (Legendre's Necessary Condition)

If y is a local minimizer of J, then

L_{y'y'}(x, y(x), y'(x)) \geq 0 \quad \text{for all } x \in [a,b].

Proof. If y is a minimizer, then \delta^2 J[y;\eta] \geq 0 for all admissible \eta. Take \eta to be a tall, narrow bump centered at x_0. In the limit, the \eta'^2 term dominates, and we need L_{y'y'}(x_0) \geq 0. Formally: choose \eta_n(x) = n\phi(n(x-x_0)) where \phi is a standard mollifier. Then \eta_n'(x) = n^2\phi'(n(x-x_0)), and

\delta^2 J \approx \int L_{y'y'}(x_0)\, \eta_n'^2 \, dx = n^3 L_{y'y'}(x_0)\int \phi'^2 + O(n^2).

For \delta^2 J \geq 0 to hold as n \to \infty, we need L_{y'y'}(x_0) \geq 0. \square

📐 Definition 4 (Conjugate Points and the Jacobi Equation)

A point c \in (a,b) is conjugate to a along the extremal y if there exists a non-trivial solution h of the Jacobi equation

\frac{d}{dx}\left[L_{y'y'} h'\right] - \left[L_{yy} - \frac{d}{dx}L_{yy'}\right]h = 0

satisfying h(a) = h(c) = 0. Conjugate points are where nearby extremals refocus.

🔷 Theorem 4 (Jacobi's Sufficient Condition)

If y is an extremal of J with L_{y'y'} > 0 on [a,b] (the strengthened Legendre condition), and there are no conjugate points in (a,b], then y is a (strict) local minimum.

Proof outline. The absence of conjugate points ensures that the Jacobi equation has no non-trivial solutions vanishing at both endpoints. This means the quadratic form \delta^2 J[y;\eta] is positive definite on the space of admissible variations, which implies y is a local minimum. The full proof uses the theory of fields of extremals. \square

📝 Example 13 (Second Variation of the Energy Functional)

For E[y] = \int_0^1 \frac{1}{2}y'^2 \, dx: L_{y'y'} = 1 > 0, L_{yy} = 0, L_{yy'} = 0. So \delta^2 E[y;\eta] = \int_0^1 \eta'^2 \, dx \geq 0, with equality only if \eta' \equiv 0, i.e., \eta \equiv 0 (since \eta(0)=\eta(1)=0). The Jacobi equation is h'' = 0 with h(0)=0, giving h(x) = cx. This has no zero in (0,1], so there are no conjugate points. The straight line is a strict minimum.

💡 Remark 5 (The Second Variation as a Function-Space Hessian)

The second variation is the function-space analogue of the Hessian matrix. In \mathbb{R}^n, a critical point is a minimum if the Hessian is positive definite. In function space, a critical point is a minimum if \delta^2 J is positive definite on the space of admissible perturbations. This connects the calculus of variations to the spectral theory of differential operators — the Jacobi equation defines a self-adjoint operator whose eigenvalues determine whether \delta^2 J is positive definite. This closes the obligation planted in Topic 29 about the variational characterization of extrema in infinite dimensions.

Positive definite second variation (minimum) vs. indefinite second variation (saddle point).

7. Sobolev Spaces

Classical solutions (smooth functions satisfying the Euler-Lagrange equation pointwise) do not always exist. The natural function spaces for variational problems are Sobolev spaces — spaces of functions with weak derivatives in L^2.

📐 Definition 5 (Weak Derivative)

A function u \in L^2(\Omega) has a weak derivative u' \in L^2(\Omega) if

\int_\Omega u\,\varphi'\,dx = -\int_\Omega u'\varphi\,dx \quad \text{for all } \varphi \in C_0^\infty(\Omega).

This is the integration-by-parts formula with no boundary term (since \varphi vanishes at the boundary). The weak derivative agrees with the classical derivative when u is classically differentiable, but extends to a much larger class of functions.
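The defining identity can be verified symbolically for a concrete non-smooth function. A sketch: take u(x) = \min(x, 1-x) on (0,1), whose weak derivative is +1 on (0, \tfrac12) and -1 on (\tfrac12, 1); the test function \varphi = x^2(1-x) is an illustrative polynomial vanishing at the boundary, standing in for a C_0^\infty bump.

```python
import sympy as sp

# Verify ∫ u φ' dx = −∫ u' φ dx for u = min(x, 1−x) on (0, 1).
x = sp.symbols('x')
phi = x**2 * (1 - x)           # vanishes at x = 0 and x = 1
half = sp.Rational(1, 2)

# Split each integral at the corner x = ½, where u switches branch.
lhs = (sp.integrate(x * phi.diff(x), (x, 0, half))
       + sp.integrate((1 - x) * phi.diff(x), (x, half, 1)))
rhs = -(sp.integrate(1 * phi, (x, 0, half))
        + sp.integrate(-1 * phi, (x, half, 1)))
assert sp.simplify(lhs - rhs) == 0
```

Both sides evaluate to the same rational number even though u has a corner, which is exactly the point: the weak derivative is defined through test functions, not pointwise limits.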

📐 Definition 6 (Sobolev Space H¹(Ω) and H¹₀(Ω))

The Sobolev space H^1(\Omega) consists of all L^2(\Omega) functions with weak derivatives in L^2(\Omega):

H^1(\Omega) = \{u \in L^2(\Omega) : u' \in L^2(\Omega)\},

equipped with the inner product \langle u, v \rangle_{H^1} = \int_\Omega (uv + u'v')\,dx.

The subspace H^1_0(\Omega) \subset H^1(\Omega) consists of functions that vanish on the boundary \partial\Omega (in the trace sense). This is the natural space for Dirichlet boundary conditions.

🔷 Theorem 5 (H¹₀(Ω) Is a Hilbert Space)

H^1_0(\Omega) with the H^1 inner product is a Hilbert space (a complete inner product space).

Proof. We need to show completeness. Let \{u_n\} be a Cauchy sequence in H^1_0(\Omega). Then \{u_n\} and \{u_n'\} are both Cauchy in L^2(\Omega). Since L^2 is complete, there exist u, v \in L^2(\Omega) with u_n \to u and u_n' \to v in L^2. We verify that v is the weak derivative of u: for any \varphi \in C_0^\infty,

\int u\varphi'\,dx = \lim_{n\to\infty}\int u_n\varphi'\,dx = -\lim_{n\to\infty}\int u_n'\varphi\,dx = -\int v\varphi\,dx,

so u' = v in the weak sense. Hence u \in H^1(\Omega). Since u_n \in H^1_0, the boundary values (traces) converge to zero, so u \in H^1_0. Thus u_n \to u in H^1_0. \square

🔷 Theorem 6 (Poincaré Inequality)

For bounded \Omega \subset \mathbb{R}^n, there exists C > 0 such that

\|u\|_{L^2(\Omega)} \leq C\|u'\|_{L^2(\Omega)} \quad \text{for all } u \in H^1_0(\Omega).

Proof (for \Omega = (0,L)). By the fundamental theorem of calculus and Cauchy-Schwarz:

|u(x)|^2 = \left|\int_0^x u'(t)\,dt\right|^2 \leq x\int_0^x |u'(t)|^2\,dt \leq L\int_0^L |u'|^2\,dt.

Integrating over x \in (0,L): \int_0^L |u|^2\,dx \leq L^2\int_0^L |u'|^2\,dx. So C = L works. \square

The Poincaré inequality says that on H^1_0, the H^1 seminorm \|u'\|_{L^2} is equivalent to the full H^1 norm. This is crucial for the coercivity arguments in the direct method.
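A quick numerical check of the inequality on \Omega = (0,1); the test functions \sin(n\pi x) are illustrative choices, for which the ratio \|u\|/\|u'\| equals 1/(n\pi), comfortably below the constant C = L = 1 from the proof.

```python
import numpy as np

# Check ‖u‖_{L²} ≤ L·‖u'‖_{L²} for u(x) = sin(nπx) ∈ H¹₀(0, 1).
x = np.linspace(0.0, 1.0, 100001)
dx = np.diff(x)

for n in (1, 2, 5):
    u = np.sin(n * np.pi * x)
    up = n * np.pi * np.cos(n * np.pi * x)
    norm_u = np.sqrt(np.sum(u[:-1] ** 2 * dx))
    norm_up = np.sqrt(np.sum(up[:-1] ** 2 * dx))
    assert norm_u <= 1.0 * norm_up                        # C = L = 1
    assert abs(norm_u / norm_up - 1 / (n * np.pi)) < 1e-3  # exact ratio
```

The ratio shrinking like 1/n also shows why the inequality cannot be reversed: oscillation inflates \|u'\| without bound while \|u\| stays fixed.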

🔷 Theorem 7 (Rellich-Kondrachov Compactness Theorem)

For bounded \Omega \subset \mathbb{R}^n with Lipschitz boundary, the embedding H^1(\Omega) \hookrightarrow L^2(\Omega) is compact: every bounded sequence in H^1(\Omega) has a subsequence converging in L^2(\Omega).

Proof deferred. The full proof requires approximation theory and the Arzelà-Ascoli theorem (cf. Topic 29). We state this result because it is essential for the direct method: it converts weak convergence in H^1 to strong convergence in L^2.

📝 Example 14 (Functions in H¹ That Are Not in C¹)

The tent function u(x) = \min(x, 1-x) on [0,1] has a corner at x = 1/2 and is not classically differentiable there. But it has a weak derivative: u'(x) = +1 on (0, 1/2) and u'(x) = -1 on (1/2, 1). Since both u and u' are in L^2, we have u \in H^1(0,1). Its squared H^1 norm is \|u\|_{H^1}^2 = \int_0^1 u^2 + (u')^2 \, dx = 1/12 + 1 = 13/12 \approx 1.083, so \|u\|_{H^1} \approx 1.04.

In contrast, u(x) = x^{-1/3} on (0,1] has u' = -\frac{1}{3}x^{-4/3}, and \int_0^1 (u')^2 \, dx = \frac{1}{9}\int_0^1 x^{-8/3}\,dx = \infty. So u \notin H^1.
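The arithmetic for the tent function is easy to verify numerically (the grid resolution is an arbitrary choice):

```python
import numpy as np

# Squared H¹ norm of the tent function u = min(x, 1−x) on (0, 1):
# ∫ u² dx = 1/12 and ∫ (u')² dx = 1, so ‖u‖²_{H¹} = 13/12.
x = np.linspace(0.0, 1.0, 200001)
dx = np.diff(x)
u = np.minimum(x, 1.0 - x)
up = np.where(x < 0.5, 1.0, -1.0)     # the weak derivative (±1)

h1_sq = np.sum((u[:-1] ** 2 + up[:-1] ** 2) * dx)
assert abs(h1_sq - 13 / 12) < 1e-4
assert abs(np.sqrt(h1_sq) - 1.04) < 0.01
```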

💡 Remark 6 (Sobolev Spaces as Hilbert Spaces)

The key point: H^1_0(\Omega) with the inner product \langle u, v\rangle_{H^1} = \int (uv + u'v')\,dx is a Hilbert space (Theorem 5). This means we can apply the entire Hilbert-space toolkit from Topic 31 — projection theorem, Riesz representation, spectral theory — to variational problems posed on Sobolev spaces. This closes the obligation from Topic 31 and Topic 23 to explain why Sobolev spaces are the right function spaces for PDEs.

Smooth function, H¹ function with a corner, and a function not in H¹.

8. The Direct Method

The direct method is the crown jewel of the modern calculus of variations. It answers the existence question: does a minimizer exist? Where the Euler-Lagrange equation finds candidates, the direct method proves that a minimizer is actually achieved.

📐 Definition 7 (Coercivity and Weak Lower Semicontinuity)

A functional J: H \to \mathbb{R} on a Hilbert space H is:

  • Coercive if J[u] \to +\infty as \|u\|_H \to \infty.
  • Weakly lower semicontinuous (w.l.s.c.) if u_n \rightharpoonup u (weak convergence) implies J[u] \leq \liminf_{n\to\infty} J[u_n].

🔷 Theorem 8 (The Direct Method of the Calculus of Variations)

Let H be a reflexive Banach space (e.g., a Hilbert space) and let J: H \to \mathbb{R} \cup \{+\infty\} be coercive and weakly lower semicontinuous. Then J attains its infimum: there exists u^* \in H with J[u^*] = \inf_{u \in H} J[u].

Proof. We proceed in four steps.

Step 1: Bounded minimizing sequence. Let m = \inf J and choose a minimizing sequence \{u_n\} with J[u_n] \to m. By coercivity, \|u_n\| is bounded: if \|u_n\| \to \infty, then J[u_n] \to \infty, contradicting J[u_n] \to m < \infty.

Step 2: Weak compactness. Since H is reflexive and \{u_n\} is bounded, by the Eberlein-Šmulian theorem (the sequential version of the Banach-Alaoglu theorem for reflexive spaces), there exists a subsequence \{u_{n_k}\} and u^* \in H with u_{n_k} \rightharpoonup u^* (weak convergence).

Step 3: Lower semicontinuity. Since J is weakly lower semicontinuous:

J[u^*] \leq \liminf_{k\to\infty} J[u_{n_k}] = \lim_{k\to\infty} J[u_{n_k}] = m.

Step 4: Conclusion. By definition, m = \inf J \leq J[u^*]. Combined with Step 3, J[u^*] = m. The infimum is attained. \square

The explorer shows the direct method in action: a minimizing sequence \{y_n\} converges to the minimizer y^*, with J[y_n] descending to \inf J.

📝 Example 15 (Existence of Minimizer for the Dirichlet Energy)

The Dirichlet energy E[u] = \int_\Omega \frac{1}{2}|\nabla u|^2\,dx on H^1_0(\Omega) satisfies:

  • Coercivity: by the Poincaré inequality, E[u] \geq c\|u\|_{H^1_0}^2.
  • Weak lower semicontinuity: the norm is w.l.s.c., and E[u] = \frac{1}{2}\|u\|^2 (using the equivalent norm from Poincaré).

By the direct method, a minimizer exists. Minimizing over an affine space u_0 + H^1_0(\Omega) of fixed boundary data gives the variational existence proof for Laplace's equation -\Delta u = 0 with Dirichlet boundary conditions.
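The minimization in Example 15 can be watched happening in one dimension. A sketch (the grid size, step count, and wiggly initial guess are arbitrary choices): gradient descent on the discretized Dirichlet energy is the explicit heat flow u_t = u'', and it settles onto the straight line, the continuum minimizer for these boundary values.

```python
import numpy as np

# Gradient flow for E[u] = ∫ ½(u')² dx on (0, 1) with u(0)=0, u(1)=1.
N = 64
x = np.linspace(0.0, 1.0, N + 1)
u = x + np.sin(3 * np.pi * x)           # initial guess with correct BCs

for _ in range(20000):
    lap = u[:-2] - 2 * u[1:-1] + u[2:]  # h²·u'' at interior nodes
    u[1:-1] += 0.2 * lap                # stable explicit step: 0.2 < ½

assert np.max(np.abs(u - x)) < 1e-3     # converged to the E-L solution
```

Each iterate has strictly smaller discrete energy than the last, so the iterates form a concrete minimizing sequence of the kind Step 1 of the direct method posits.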

📝 Example 16 (Best Approximation as a Variational Problem)

Given f \in L^2(\Omega) and a closed subspace V \subset L^2(\Omega), the best approximation problem is: minimize J[v] = \|f - v\|_{L^2}^2 over v \in V. This is a quadratic functional on a Hilbert space — coercive and w.l.s.c. The direct method guarantees existence, and the minimizer is the orthogonal projection P_V f (cf. Topic 31 and Topic 15).

💡 Remark 7 (The Direct Method as the Infinite-Dimensional Extreme Value Theorem)

In finite dimensions, the extreme value theorem says: a continuous function on a compact set attains its extrema. The direct method is the infinite-dimensional version. Coercivity provides the “compact set” (bounded sublevel sets), weak lower semicontinuity provides “continuity” (in the weak topology), and reflexivity provides “compactness” (Eberlein-Šmulian). This closes the staircase from the Bolzano-Weierstrass theorem in Topic 29 to the direct method here.

Minimizing sequence converging: curves y_n, functional values J[y_n] descending, weak limit as minimizer.

9. Weak Solutions and the Lax-Milgram Theorem

The Lax-Milgram theorem is the bridge between variational problems and PDEs: it converts a variational problem into a well-posed operator equation.

📐 Definition 8 (Weak Solution of a PDE)

A function u \in H^1_0(\Omega) is a weak solution of the boundary value problem -\Delta u = f (with u|_{\partial\Omega} = 0) if

\int_\Omega \nabla u \cdot \nabla v \, dx = \int_\Omega fv \, dx \quad \text{for all } v \in H^1_0(\Omega).

This is obtained by multiplying the PDE by a test function v, integrating by parts, and dropping the boundary term (since v \in H^1_0).

📐 Definition 9 (Bilinear Form — Coercivity and Boundedness)

A bilinear form a: H \times H \to \mathbb{R} on a Hilbert space H is:

  • Bounded (continuous) if |a(u,v)| \leq M\|u\|\|v\| for some M > 0.
  • Coercive if a(u,u) \geq \alpha\|u\|^2 for some \alpha > 0.

🔷 Theorem 9 (Lax-Milgram Theorem)

Let H be a Hilbert space, a: H \times H \to \mathbb{R} a bounded, coercive bilinear form, and F \in H^* a bounded linear functional. Then there exists a unique u \in H such that

a(u, v) = F(v) \quad \text{for all } v \in H.

Proof. The key is to use the Riesz representation theorem from Topic 31.

Step 1: Define the operator A. For each fixed u \in H, the map v \mapsto a(u,v) is a bounded linear functional on H. By the Riesz representation theorem, there exists a unique Au \in H such that a(u,v) = \langle Au, v\rangle for all v. The map u \mapsto Au is linear and bounded with \|A\| \leq M.

Step 2: Show A is injective with closed range. Coercivity gives \langle Au, u\rangle = a(u,u) \geq \alpha\|u\|^2, so:

\|Au\|\|u\| \geq \langle Au, u\rangle \geq \alpha\|u\|^2 \implies \|Au\| \geq \alpha\|u\|.

This means A is injective and has closed range.

Step 3: Show the range is dense. If w \perp \text{Range}(A), then \langle Aw, w\rangle = 0, but coercivity gives \alpha\|w\|^2 \leq \langle Aw, w\rangle = 0, so w = 0. Hence \text{Range}(A)^\perp = \{0\}, meaning \text{Range}(A) is dense in H.

Step 4: Conclude. Since \text{Range}(A) is both closed and dense, A is surjective. Hence A is bijective with bounded inverse (\|A^{-1}\| \leq 1/\alpha).

Now apply Riesz again: F \in H^* corresponds to some f \in H with F(v) = \langle f, v\rangle. Set u = A^{-1}f. Then a(u,v) = \langle Au, v\rangle = \langle f, v\rangle = F(v) for all v. \square

📝 Example 17 (Weak Solution of Poisson's Equation)

For -\Delta u = f on \Omega with u|_{\partial\Omega} = 0: take H = H^1_0(\Omega), a(u,v) = \int \nabla u \cdot \nabla v\, dx, and F(v) = \int fv\,dx. By Poincaré, a is coercive: a(u,u) = \int |\nabla u|^2 \geq (1 + C_P^2)^{-1}\|u\|_{H^1}^2. By Cauchy-Schwarz, a is bounded. Lax-Milgram gives a unique weak solution u \in H^1_0.

📝 Example 18 (Weak Solution of the Sturm-Liouville Problem)

For -(pu')' + qu = f on (a,b) with u(a)=u(b)=0: take a(u,v) = \int_a^b (pu'v' + quv)\,dx. If p \geq p_0 > 0 and q \geq 0, then a is coercive and bounded. Lax-Milgram applies.

💡 Remark 8 (From Lax-Milgram to Finite Elements)

The finite element method discretizes the Lax-Milgram problem: replace H by a finite-dimensional subspace V_h (e.g., piecewise-linear functions on a mesh), and solve a(u_h, v_h) = F(v_h) for all v_h \in V_h. This is a finite linear system K\mathbf{u} = \mathbf{f}, where K_{ij} = a(\phi_j, \phi_i) is the stiffness matrix. The Lax-Milgram theorem guarantees that the stiffness matrix is invertible (because a is coercive on V_h too).
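A minimal P1 finite element sketch for -u'' = 1 on (0,1) with u(0) = u(1) = 0 (the mesh size is an arbitrary choice). The exact solution is u(x) = x(1-x)/2, and for this particular problem the piecewise-linear nodal values happen to coincide with it exactly.

```python
import numpy as np

# Assemble and solve K u = f for −u'' = 1 with piecewise-linear elements.
N = 10                                   # number of elements
h = 1.0 / N
x = np.linspace(0.0, 1.0, N + 1)

# Interior stiffness K_ij = ∫ φ_j' φ_i' dx: tridiagonal (−1, 2, −1)/h.
K = (2 * np.eye(N - 1) - np.eye(N - 1, k=1) - np.eye(N - 1, k=-1)) / h
f = h * np.ones(N - 1)                   # load vector ∫ 1·φ_i dx = h

u = np.zeros(N + 1)
u[1:-1] = np.linalg.solve(K, f)          # K invertible: a is coercive

exact = x * (1.0 - x) / 2.0
assert np.max(np.abs(u - exact)) < 1e-12
```

The `np.linalg.solve` call is exactly where the Lax-Milgram guarantee bites: coercivity of a on V_h makes K symmetric positive definite, so the system always has a unique solution.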

Bilinear form geometry and Riesz representation converting to operator equation.

10. Eigenvalue Problems and the Rayleigh Quotient

The variational characterization of eigenvalues connects the spectral theory from Topic 31 to the calculus of variations.

📐 Definition 10 (Sturm-Liouville Eigenvalue Problem)

The Sturm-Liouville eigenvalue problem is: find non-trivial uH01(0,π)u \in H^1_0(0,\pi) and λR\lambda \in \mathbb{R} such that

u=λuon (0,π),u(0)=u(π)=0.-u'' = \lambda u \quad \text{on } (0,\pi), \quad u(0) = u(\pi) = 0.

The eigenvalues are λn=n2\lambda_n = n^2 (n=1,2,3,n = 1, 2, 3, \ldots) with eigenfunctions un(x)=sin(nx)u_n(x) = \sin(nx).

🔷 Theorem 10 (Variational Characterization of the First Eigenvalue)

The first eigenvalue of u=λu-u'' = \lambda u on (0,π)(0,\pi) with Dirichlet conditions is

λ1=minuH01{0}R[u],where R[u]=0πu2dx0πu2dx\lambda_1 = \min_{u \in H^1_0 \setminus \{0\}} R[u], \quad \text{where } R[u] = \frac{\int_0^\pi u'^2\,dx}{\int_0^\pi u^2\,dx}

is the Rayleigh quotient. The minimum is achieved by u1=sin(x)u_1 = \sin(x) with λ1=1\lambda_1 = 1.

Proof. We use the weak formulation. Multiply u=λu-u'' = \lambda u by vH01v \in H^1_0 and integrate by parts:

0πuvdx=λ0πuvdx.\int_0^\pi u'v'\,dx = \lambda\int_0^\pi uv\,dx.

Setting v=uv = u: u2=λu2\int u'^2 = \lambda\int u^2, so λ=R[u]\lambda = R[u] for any eigenfunction.

Now we show λ1\lambda_1 is the minimum of RR. Expand uu in the eigenfunction basis: u=cnsin(nx)u = \sum c_n\sin(nx). By Parseval’s identity:

0πu2=π2cn2,0πu2=π2n2cn2.\int_0^\pi u^2 = \frac{\pi}{2}\sum c_n^2, \quad \int_0^\pi u'^2 = \frac{\pi}{2}\sum n^2 c_n^2.

Therefore R[u]=n2cn2cn212cn2cn2=1=λ1R[u] = \frac{\sum n^2 c_n^2}{\sum c_n^2} \geq \frac{1^2 \sum c_n^2}{\sum c_n^2} = 1 = \lambda_1, with equality when cn=0c_n = 0 for n2n \geq 2, i.e., u=c1sin(x)u = c_1\sin(x). \square

🔷 Theorem 11 (Min-Max Principle (Courant-Fischer))

The kk-th eigenvalue is

λk=minVH01dimV=kmaxuV{0}R[u].\lambda_k = \min_{\substack{V \subset H^1_0 \\ \dim V = k}} \max_{u \in V \setminus \{0\}} R[u].

Proof outline. This follows from the spectral theorem for compact self-adjoint operators applied to the inverse of d2/dx2-d^2/dx^2 (cf. Topic 31). The min-max characterization is equivalent to the variational principle: λk\lambda_k is the minimum of R[u]R[u] over the orthogonal complement of the first k1k-1 eigenfunctions. \square
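A quick numerical check of the spectrum: discretizing $-d^2/dx^2$ on $(0,\pi)$ by finite differences (the discretization is an assumption of this sketch) and asking NumPy for the smallest eigenvalues recovers $\lambda_k = k^2$.

```python
import numpy as np

# Discretize -u'' on (0, pi) with Dirichlet conditions and compare the
# lowest eigenvalues against the exact values lambda_k = k^2.
n = 400
h = np.pi / (n + 1)
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

eigs = np.linalg.eigvalsh(A)[:3]   # eigvalsh returns eigenvalues in ascending order
print(np.round(eigs, 3))           # close to [1, 4, 9]
```

The discrete eigenvalues are $(4/h^2)\sin^2(kh/2)$, which converge to $k^2$ as $h \to 0$ — the min-max values of the discrete Rayleigh quotient approximate those of the continuous one.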


Adjust the trial function coefficients to explore the Rayleigh quotient. The minimum value R[u]=λ1=1R[u] = \lambda_1 = 1 is achieved when uu is a multiple of sin(x)\sin(x). Any trial function yields R[u]λ1R[u] \geq \lambda_1 — this is the variational characterization in action.
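The same experiment in code: the polynomial trial function $u(x) = x(\pi - x)$ (an illustrative choice) satisfies the boundary conditions and gives $R[u] = 10/\pi^2 \approx 1.013$, just above $\lambda_1 = 1$.

```python
import numpy as np

# Rayleigh quotient R[u] = (int u'^2 dx) / (int u^2 dx) for the trial
# function u(x) = x(pi - x), which vanishes at 0 and pi.
x = np.linspace(0.0, np.pi, 10_001)
dx = x[1] - x[0]
u = x * (np.pi - x)
du = np.pi - 2 * x                                    # exact derivative

trap = lambda y: np.sum((y[:-1] + y[1:]) / 2) * dx    # trapezoidal rule
R = trap(du**2) / trap(u**2)
print(f"R[u] = {R:.4f}  (10/pi^2 = {10 / np.pi**2:.4f}, lambda_1 = 1)")
```

A hand computation confirms the value: $\int_0^\pi (\pi - 2x)^2\,dx = \pi^3/3$ and $\int_0^\pi x^2(\pi - x)^2\,dx = \pi^5/30$, so $R[u] = 10/\pi^2$.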

📝 Example 19 (Eigenvalues of −u'' = λu on [0,π])

The eigenfunctions un(x)=sin(nx)u_n(x) = \sin(nx) have eigenvalues λn=n2\lambda_n = n^2. For the trial function u(x)=sin(x)u(x) = \sin(x):

R[sinx]=0πcos2xdx0πsin2xdx=π/2π/2=1=λ1.R[\sin x] = \frac{\int_0^\pi \cos^2 x \, dx}{\int_0^\pi \sin^2 x \, dx} = \frac{\pi/2}{\pi/2} = 1 = \lambda_1. \quad \checkmark

💡 Remark 9 (Spectral Theory Connection)

The variational characterization of eigenvalues is the bridge between the spectral theorem (Topic 31) and the calculus of variations. The compact self-adjoint operator $T = (-d^2/dx^2)^{-1}$ on $L^2(0,\pi)$ has eigenvalues $\mu_n = 1/n^2$ and eigenfunctions $\sin(nx)$. The spectral theorem decomposes $T$ as $Tf = \sum \frac{1}{n^2}\langle f, u_n\rangle u_n$. The Rayleigh quotient $R[u] = \langle u, (-d^2/dx^2)u\rangle / \langle u, u\rangle$ takes the value $\lambda_n = 1/\mu_n$ on the $n$-th eigenfunction. Minimizing $R$ is therefore equivalent to maximizing $\mu$ — finding the largest eigenvalue of $T$.

First three eigenmodes, Rayleigh quotient values, and variational convergence.

11. Connections to ML

The calculus of variations is not just classical mathematics — it is the mathematical language of modern machine learning at its most fundamental.

📝 Example 20 (Physics-Informed Neural Networks (PINNs))

A PINN parameterizes the solution of a PDE Lu=f\mathcal{L}u = f as a neural network uθu_\theta and minimizes the variational loss

J[uθ]=ΩLuθf2dx+λΩuθg2dS.J[u_\theta] = \int_\Omega |\mathcal{L}u_\theta - f|^2\,dx + \lambda\int_{\partial\Omega}|u_\theta - g|^2\,dS.

This is literally a calculus-of-variations problem: $J$ is a functional on the space of network functions. The Euler-Lagrange equation for $J$ recovers the PDE — the PINN finds an approximate solution by minimizing the variational residual. Under suitable coercivity and lower-semicontinuity assumptions, the direct method guarantees that a minimizer exists in the appropriate Sobolev space, and the neural network approximates it.
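The residual-minimization idea can be illustrated with a linear model standing in for the neural network (a toy sketch; the sine basis, collocation grid, and the choice $f(x) = \sin(\pi x)$ are assumptions): for $u_\theta(x) = \sum_k \theta_k \sin(k\pi x)$, minimizing the discretized $J$ for $-u'' = f$ is a least-squares problem.

```python
import numpy as np

# PINN-style residual minimization with the linear model
# u(x) = sum_k theta_k sin(k pi x) for -u'' = f on [0, 1], f = sin(pi x).
# Exact solution: u = sin(pi x) / pi^2, i.e. theta_1 = 1/pi^2, others 0.
K = 5
x = np.linspace(0.01, 0.99, 200)          # interior collocation points
k = np.arange(1, K + 1)

# residual -u'' for basis function k is (k pi)^2 sin(k pi x)
design = (k * np.pi)**2 * np.sin(np.pi * np.outer(x, k))
f = np.sin(np.pi * x)

theta, *_ = np.linalg.lstsq(design, f, rcond=None)
print(np.round(theta, 6))                  # theta_1 ~ 1/pi^2 ~ 0.101321, rest ~ 0
```

Because the boundary conditions are built into the basis, the boundary penalty term of $J$ drops out; a real PINN must carry it, since a generic network does not vanish on $\partial\Omega$.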

📝 Example 21 (Optimal Transport)

The Monge-Kantorovich problem seeks the transport map $T$ with $T_\sharp\mu = \nu$ minimizing the total cost $\int c(x, T(x))\,d\mu(x)$. In its relaxed (Kantorovich) formulation, we minimize over transport plans $\gamma$ with marginals $\mu$ and $\nu$:

Wpp(μ,ν)=infγΠ(μ,ν)xypdγ(x,y).W_p^p(\mu,\nu) = \inf_{\gamma \in \Pi(\mu,\nu)} \int |x - y|^p\,d\gamma(x,y).

The Wasserstein distance WpW_p metrizes the space of probability measures. This is a variational problem over a function space — the direct method applies because the set of transport plans is weakly compact and the cost functional is lower semicontinuous.
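In one dimension the Kantorovich problem has a closed-form solution — the optimal plan is the monotone (quantile) coupling — so $W_p$ between equal-size empirical measures is computable by sorting. A minimal sketch (the samples below are illustrative):

```python
import numpy as np

# 1-D Wasserstein distance between equal-size empirical measures:
# the optimal plan pairs sorted samples (monotone / quantile coupling).
def wasserstein_p(xs, ys, p=1):
    xs, ys = np.sort(xs), np.sort(ys)
    return np.mean(np.abs(xs - ys)**p) ** (1 / p)

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 5000)
y = x + 0.5                        # the same empirical measure shifted by 0.5

print(f"W_1 = {wasserstein_p(x, y, p=1):.3f}")   # a pure shift: W_1 = 0.5
```

The sort is exactly the direct method in miniature: the feasible set (couplings of the two empirical measures) is compact, and the minimizer is attained at the monotone rearrangement.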

📝 Example 22 (Variational Autoencoders (VAEs))

A VAE maximizes the evidence lower bound (ELBO):

ELBO(θ,ϕ)=Eqϕ(zx)[logpθ(xz)]DKL(qϕ(zx)p(z)).\text{ELBO}(\theta, \phi) = \mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - D_{\text{KL}}(q_\phi(z|x) \| p(z)).

This is a variational objective over the encoder $q_\phi$ and decoder $p_\theta$. The name “variational” is literal: we are optimizing a functional over a family of distributions. The connection to the calculus of variations is exact: optimizing the ELBO over all encoders $q$ (not just a parametric family) yields the true posterior $p_\theta(z|x)$ as the stationary point.
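The KL term of the ELBO has a closed form in the standard VAE setting, where $q_\phi(z|x)$ is a diagonal Gaussian and $p(z) = \mathcal{N}(0, I)$. A minimal sketch (the numbers are illustrative):

```python
import numpy as np

# KL(N(mu, diag(sigma^2)) || N(0, I)) -- the regularizer term of the VAE ELBO:
#   KL = 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2)
def kl_to_standard_normal(mu, sigma):
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    return 0.5 * np.sum(mu**2 + sigma**2 - 1.0 - np.log(sigma**2))

print(kl_to_standard_normal([0.0, 0.0], [1.0, 1.0]))   # 0.0: q equals the prior
print(kl_to_standard_normal([1.0], [1.0]))             # 0.5
```

The functional is zero exactly when $q$ matches the prior and strictly positive otherwise — the first-variation condition of the KL term in isolation.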

📝 Example 23 (Diffusion Models)

Score-matching diffusion models minimize

J[sθ]=Et,xt[sθ(xt,t)logpt(xt)2],J[\mathbf{s}_\theta] = \mathbb{E}_{t,\mathbf{x}_t}\left[\|\mathbf{s}_\theta(\mathbf{x}_t, t) - \nabla \log p_t(\mathbf{x}_t)\|^2\right],

where sθ\mathbf{s}_\theta is the score network. This is a variational problem: minimize a functional over the space of score functions. The optimal score logpt\nabla\log p_t is the Euler-Lagrange solution. The connection to stochastic calculus (the reverse-time SDE) adds a layer of calculus-of-variations structure.
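For a toy illustration of this variational structure (all choices below — the Gaussian data, the linear model family, the grid search — are assumptions of the sketch): when $p = \mathcal{N}(0, \sigma^2)$, the true score is $\nabla\log p(x) = -x/\sigma^2$, so among linear models $s_a(x) = ax$ the score-matching loss is minimized exactly at $a = -1/\sigma^2$.

```python
import numpy as np

# Score matching for p = N(0, sigma^2): the true score is -x / sigma^2.
# Minimize J(a) = E[(a*x - (-x/sigma^2))^2] over linear models s_a(x) = a*x.
sigma = 2.0
rng = np.random.default_rng(0)
x = rng.normal(0.0, sigma, 10_000)

grid = np.linspace(-1.0, 0.0, 101)
losses = [np.mean((a * x + x / sigma**2)**2) for a in grid]
best = grid[np.argmin(losses)]
print(f"argmin_a J(a) = {best:.2f}  (true score slope: {-1 / sigma**2:.2f})")
```

Here $J(a) = (a + 1/\sigma^2)^2\,\mathbb{E}[x^2]$, so the minimizer is the true score regardless of the sample — the Euler-Lagrange solution in a one-parameter family.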

💡 Remark 10 (The Variational Principle in ML)

Across PINNs, optimal transport, VAEs, and diffusion models, the pattern is the same: define a functional on a function space, and minimize it. The calculus of variations provides the theoretical foundation — existence of minimizers (direct method), necessary conditions (Euler-Lagrange), sufficiency (second variation), and the functional-analytic setting (Sobolev spaces, Hilbert spaces). Understanding this machinery is not optional for serious ML research; it is the language in which the theory is written.

Four ML applications: PINNs, optimal transport, VAEs, and diffusion models.

12. Computational Notes

The Euler-Lagrange equation can be solved numerically via finite differences or finite elements. Here is a minimal example for minimizing the Dirichlet energy $J[u] = \int (\tfrac{1}{2}u'^2 - fu)\,dx$, whose Euler-Lagrange equation is $-u'' = f$ on $[0,1]$ with $u(0) = u(1) = 0$:

import numpy as np

# Finite difference discretization of -u'' = f on [0,1]
n = 100
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)

# Stiffness matrix (tridiagonal)
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

# Right-hand side: f(x) = sin(pi*x)
f = np.sin(np.pi * x)

# Solve Au = f
u = np.linalg.solve(A, f)

# Exact solution: u(x) = sin(pi*x) / pi^2
u_exact = np.sin(np.pi * x) / np.pi**2
print(f"Max error: {np.max(np.abs(u - u_exact)):.2e}")

For the Rayleigh quotient iteration (finding eigenvalues variationally):

# Power method for the smallest eigenvalue of -u''
u = np.ones(n) / np.sqrt(n)  # initial guess
for _ in range(50):
    v = np.linalg.solve(A, u)       # apply A^{-1}
    lam = np.dot(u, v)              # Rayleigh quotient
    u = v / np.linalg.norm(v)       # normalize
print(f"λ₁ ≈ {1/lam:.6f} (exact: {np.pi**2:.6f})")

13. Connections and Further Reading

Within formalCalculus

Topic — Connection
Metric Spaces — Completeness and compactness reappear in the direct method
Banach Spaces — Sobolev spaces are Banach spaces; reflexivity enables weak compactness
Hilbert Spaces — Riesz representation → Lax-Milgram; projection → best approximation
Mean Value & Taylor — First and second variation are Taylor expansions in function space
Line Integrals — Geodesics as extremals of the arc-length functional
Surface Integrals — Euler-Lagrange on domains uses the divergence theorem
Approximation Theory — Best approximation is a variational problem

Forward to formalML

The calculus of variations feeds directly into four areas of modern ML theory:

  • Lagrangian Duality — Euler-Lagrange as the prototype for optimality conditions with constraints.
  • Information Geometry — Geodesics on statistical manifolds are calculus-of-variations problems.
  • Gradient Descent — The direct method provides the existence theory; gradient flow is continuous-time variation.
  • Generative Modeling — VAEs and diffusion models minimize variational objectives.

Track 8 Summary

The Functional Analysis Essentials track progressed through four levels of abstraction:

  1. Metric spaces — distance, completeness, compactness.
  2. Banach spaces — norms, bounded operators, the big four theorems.
  3. Hilbert spaces — inner products, projection, Riesz, spectral theory, RKHS.
  4. Calculus of variations — functionals, Euler-Lagrange, Sobolev spaces, direct method, Lax-Milgram.

Each level added one axiom and gained enormous power. The staircase is now complete.


14. Summary

Element — Statement
Def. 1 — Functional: a map $J: \mathcal{A} \to \mathbb{R}$ from a function space
Def. 2 — First variation $\delta J[y;\eta]$: the directional derivative of $J$ at $y$ in direction $\eta$
Def. 3 — Second variation $\delta^2 J[y;\eta]$: the second-order directional derivative
Def. 4 — Conjugate points and the Jacobi equation
Def. 5 — Weak derivative via integration by parts
Def. 6 — Sobolev spaces $H^1(\Omega)$ and $H^1_0(\Omega)$
Def. 7 — Coercivity and weak lower semicontinuity
Def. 8 — Weak solution of a PDE
Def. 9 — Bilinear form: coercivity and boundedness
Def. 10 — Sturm-Liouville eigenvalue problem
Thm. 1 — Fundamental lemma of the calculus of variations
Thm. 2 — The Euler-Lagrange equation
Thm. 3 — Legendre’s necessary condition: $L_{y'y'} \geq 0$
Thm. 4 — Jacobi’s sufficient condition: no conjugate points → minimum
Thm. 5 — $H^1_0(\Omega)$ is a Hilbert space
Thm. 6 — Poincaré inequality
Thm. 7 — Rellich-Kondrachov compactness (stated)
Thm. 8 — The direct method: coercive + w.l.s.c. → minimum attained
Thm. 9 — Lax-Milgram theorem
Thm. 10 — Variational characterization of eigenvalues
Thm. 11 — Min-max principle (Courant-Fischer)

15. Closing Reflection

We have reached the summit.

Topic 32 is the 32nd and final topic on formalCalculus — the last node in a directed graph that began with epsilon-delta definitions and ends here, with the calculus of variations. Let us take a moment to see where we have been.

The journey through single-variable calculus (Topics 1–8) built the foundations: limits, continuity, derivatives, integrals, and Taylor series — the language in which all subsequent mathematics is written. Multivariable calculus (Topics 9–14) extended this machinery to Rn\mathbb{R}^n: gradients, Jacobians, Hessians, multiple integrals, line integrals, surface integrals. Series and approximation (Topics 15–18) taught us to represent functions as infinite sums — power series, Fourier series, uniform convergence — and to quantify the quality of approximations. Ordinary differential equations (Topics 19–22) showed how derivatives drive dynamics: first-order equations, linear systems, stability theory, and numerical methods. Measure and integration (Topics 23–28) rebuilt the integral from the ground up: sigma-algebras, the Lebesgue integral, LpL^p spaces, the Radon-Nikodym theorem — replacing the Riemann integral with a theory powerful enough for modern analysis. And functional analysis (Topics 29–32, this track) assembled the abstract framework: metric spaces, normed and Banach spaces, inner-product and Hilbert spaces, and finally the calculus of variations.

At each level, the pattern was the same: add one axiom, gain an enormous amount of power. A metric gives completeness and fixed-point theorems. A norm gives bounded operators and the big four theorems. An inner product gives projection, Riesz representation, and spectral decomposition. And the variational perspective — optimizing functionals on function spaces — gives the Euler-Lagrange equation, the direct method for existence, Sobolev spaces, and the Lax-Milgram theorem.

These are not just mathematical curiosities. They are the foundations of modern machine learning. Every gradient descent step invokes the calculus. Every loss function is a functional. Every regularized objective lives in a Sobolev space. Every kernel method operates in a reproducing kernel Hilbert space. Every PINN solves a variational problem. Every diffusion model minimizes a score-matching loss that is a functional over function spaces.

The reader who has worked through all 32 topics now has the rigorous calculus and analysis machinery that formalML assumes. The path forward is clear: Lagrangian duality, information geometry, optimization theory, spectral methods, generative modeling. The foundations are laid. The mathematics is yours.

References

  1. Dacorogna (2015). Introduction to the Calculus of Variations. — Primary reference for the direct method and Sobolev spaces.
  2. Gelfand & Fomin (1963). Calculus of Variations. — Classical reference for the Euler-Lagrange derivation and classical examples.
  3. Brezis (2011). Functional Analysis, Sobolev Spaces and Partial Differential Equations. — Sobolev spaces and the Lax-Milgram theorem.
  4. Evans (2010). Partial Differential Equations. — Weak solutions and Sobolev embedding theorems.
  5. Raissi, Perdikaris & Karniadakis (2019). “Physics-Informed Neural Networks”. — Foundational PINN paper.