Multivariable Integral · intermediate · 50 min read

Change of Variables & the Jacobian Determinant

Transforming integrals under coordinate substitution — the Jacobian determinant as the volume scaling factor, polar, cylindrical, and spherical coordinates, the Gaussian integral, and density transformations in normalizing flows

Abstract. The change of variables formula transforms a multiple integral from one coordinate system to another: if φ: D* → D is a C¹ diffeomorphism, then ∫∫_D f(x,y) dA = ∫∫_{D*} f(φ(u,v)) |det J_φ(u,v)| du dv. The Jacobian determinant |det J_φ| measures how φ distorts area elements — it is the volume scaling factor that was introduced abstractly in the Jacobian topic and now does real computational work. Polar coordinates are the canonical example: the map (r,θ) → (r cos θ, r sin θ) has Jacobian determinant r, producing the area element r dr dθ. This immediately resolves the painful disk integral from Topic 13 and enables the classic proof that the Gaussian integral equals √π — a result that propagates throughout probability and statistical mechanics. Cylindrical and spherical coordinates extend the framework to three dimensions. The general theorem, proved via the Inverse Function Theorem and a partition of unity argument, guarantees that integration is coordinate-independent for any C¹ diffeomorphism. In machine learning, the change of variables formula is the mathematical engine of normalizing flows: the density of a transformed variable X = f(Z) satisfies p_X(x) = p_Z(f⁻¹(x)) · |det J_{f⁻¹}(x)|, and the entire architecture of flow-based generative models is designed to make this Jacobian determinant tractable. The reparameterization trick in variational autoencoders is the same formula applied to move gradient computation outside the expectation.

Where this leads → formalML

  • The reparameterization trick in variational inference — writing z = μ + σε with ε ~ N(0,1) — is a change of variables that moves parameters outside the expectation, enabling gradient computation through sampling.
  • The density transformation formula p_X(x) = p_Z(φ⁻¹(x)) · |det J_{φ⁻¹}(x)| is the probabilistic form of the change of variables theorem.
  • Integration on manifolds requires coordinate charts, and the change of variables formula ensures that integrals are well-defined independent of chart choice.
  • The Fisher information matrix transforms under reparameterization via Ĩ(θ̃) = JᵀI(θ)J, where J is the Jacobian of the parameter change.

1. Overview & Motivation

You’ve built a generative model — a neural network that maps simple noise $z \sim \mathcal{N}(0, I)$ to complex data $x = f(z)$. To train it, you need the density $p_X(x)$. The change of variables formula gives it:

$$p_X(x) = p_Z(f^{-1}(x)) \cdot |\det J_{f^{-1}}(x)|.$$

The entire architecture of normalizing flows is designed to make the right side of this equation tractable. And the formula itself — “adjust the density by the Jacobian determinant” — is exactly the integration change of variables from multivariable calculus, applied to probability.

This topic brings together the Jacobian determinant (Topic 10), the Inverse Function Theorem (Topic 12), and multiple integrals (Topic 13) into a single powerful formula. The reader already has all the ingredients; this topic assembles them.

2. Substitution in One Variable — The Template

We know $u$-substitution from Topic 7, but we reframe it through the lens of coordinate transformation to set up the multivariable version.

🔷 Theorem 1 (Substitution Rule (1D))

Let $g: [\alpha, \beta] \to [a, b]$ be a $C^1$ bijection with $g'(t) \neq 0$ on $(\alpha, \beta)$. Then for any continuous $f: [a, b] \to \mathbb{R}$:

$$\int_a^b f(x)\,dx = \int_\alpha^\beta f(g(t))\,|g'(t)|\,dt.$$

The factor $|g'(t)|$ adjusts for how $g$ stretches or compresses the interval. When $g$ is increasing, $|g'(t)| = g'(t)$; when decreasing, $|g'(t)| = -g'(t)$, which also reverses the limits to compensate.

💡 Remark 1 (Why the Absolute Value?)

In 1D, you can track orientation via limit ordering. In $\mathbb{R}^n$, there are no “limits” to reverse — the absolute value of the Jacobian determinant is the only way to ensure positivity of the volume element. The absolute value is the conceptual bridge from 1D to $n$D.

📝 Example 1 (Quarter-Circle Area via Substitution)

Compute $\int_0^1 \sqrt{1 - x^2}\,dx$ via $x = \sin\theta$. We have $g(\theta) = \sin\theta$, $g'(\theta) = \cos\theta$, limits $[0, \pi/2]$:

$$\int_0^1 \sqrt{1 - x^2}\,dx = \int_0^{\pi/2} \sqrt{1 - \sin^2\theta} \cdot \cos\theta\,d\theta = \int_0^{\pi/2} \cos^2\theta\,d\theta = \frac{\pi}{4}.$$

This is one quarter of the unit disk area — a preview of polar coordinates.

📝 Example 2 (Gaussian Integral via Gamma Function)

Compute $\int_0^\infty e^{-x^2/2}\,dx$ via $u = x^2/2$. We get $x = \sqrt{2u}$, $dx = du/\sqrt{2u}$:

$$\int_0^\infty e^{-x^2/2}\,dx = \frac{1}{\sqrt{2}} \int_0^\infty u^{-1/2} e^{-u}\,du = \frac{1}{\sqrt{2}}\,\Gamma(1/2) = \sqrt{\pi/2}.$$

This connects to the Gamma function (Topic 8) and previews the Gaussian integral computation later in this topic.
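
Both substitution examples can be checked numerically. The sketch below is a minimal standard-library illustration — the midpoint rule and the grid size are illustrative choices, not part of the theorem:

```python
import math

def midpoint_integral(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Left side of Example 1: integral of sqrt(1 - x^2) over [0, 1].
lhs = midpoint_integral(lambda x: math.sqrt(1 - x * x), 0.0, 1.0)

# Right side: after x = sin(theta), the integrand picks up |g'(theta)| = cos(theta),
# giving cos^2(theta) on [0, pi/2].
rhs = midpoint_integral(lambda u: math.cos(u) ** 2, 0.0, math.pi / 2)

print(lhs, rhs, math.pi / 4)
```

Both sums converge to $\pi/4$; the transformed integrand is smooth, so its sum converges faster than the original, which has a vertical tangent at $x = 1$.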

1D substitution: original integral with x-axis, transformed integral with u-axis

3. The Change of Variables Formula in 2D

This is where the Jacobian determinant enters integration. Draw a grid in the $(u, v)$-plane. The map $\varphi$ sends each small rectangle $du \times dv$ to a small parallelogram in the $(x, y)$-plane. The area of that parallelogram is approximately $|\det J_\varphi(u, v)| \cdot du\,dv$ — this is exactly the Volume Distortion Theorem from Topic 10. The change of variables formula says: sum up $f$ over the deformed parallelograms, weighting each by its area.

📐 Definition 1 (Coordinate Transformation)

A coordinate transformation (or change of variables) on an open set $D^* \subseteq \mathbb{R}^2$ is a $C^1$ function $\varphi: D^* \to D \subseteq \mathbb{R}^2$ that is a diffeomorphism: bijective, $C^1$, and with $C^1$ inverse $\varphi^{-1}: D \to D^*$.

💡 Remark 2 (Diffeomorphism vs. Local Diffeomorphism)

The IFT (Topic 12) guarantees a local diffeomorphism wherever $\det J_\varphi \neq 0$. For the change of variables formula, we need a global diffeomorphism on $D^*$ (or at least injectivity, with $\det J_\varphi = 0$ only on a set of measure zero). Polar coordinates fail to be a global diffeomorphism on $\{r > 0\}$ because of the $2\pi$-periodicity in $\theta$, but they are a diffeomorphism on any domain that doesn’t wrap all the way around.

🔷 Theorem 2 (Change of Variables (2D))

Let $\varphi: D^* \to D$ be a $C^1$ diffeomorphism between open subsets of $\mathbb{R}^2$, and let $f: D \to \mathbb{R}$ be continuous. Then:

$$\iint_D f(x, y)\,dA = \iint_{D^*} f(\varphi(u, v))\,|\det J_\varphi(u, v)|\,du\,dv.$$

The Jacobian determinant $|\det J_\varphi|$ converts the area element $du\,dv$ in the $(u,v)$-coordinate system to the area element $dA$ in the $(x,y)$-coordinate system.

📝 Example 3 (Linear Change of Variables)

For $\varphi(u, v) = (au + bv,\, cu + dv)$ with $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ and $\det A \neq 0$: $\det J_\varphi = \det A$ (constant). The formula reduces to $\iint_D f(x,y)\,dA = |\det A| \iint_{D^*} f(Au)\,du\,dv$. Linear maps scale all areas by the same factor.

📝 Example 4 (Affine Shear)

For $\varphi(u,v) = (u + v, v)$: $\det J_\varphi = 1$. The integral is unchanged: $\iint_D f\,dA = \iint_{D^*} f(u+v, v)\,du\,dv$. Area-preserving transformations have a unit Jacobian determinant.
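
The claim that a linear map scales all areas by $|\det A|$ can be verified by brute force. The sketch below rasterizes the image of the unit square under one particular matrix (the matrix and grid resolution are illustrative choices) and compares the measured area to $|\det A|$:

```python
# A = [[a, b], [c, d]]; its image of the unit square is a parallelogram
# with vertices (0,0), (2,0), (1,1), (3,1).
a, b, c, d = 2.0, 1.0, 0.0, 1.0
det = a * d - b * c                        # = 2.0

# Rasterize the bounding box [0, 3] x [0, 1]: a point (x, y) lies in the
# image iff its preimage A^{-1}(x, y) lies in the unit square.
nx, ny = 1200, 400
count = 0
for i in range(nx):
    x = 3.0 * (i + 0.5) / nx
    for j in range(ny):
        y = (j + 0.5) / ny
        u = (d * x - b * y) / det          # inverse map, first coordinate
        v = (-c * x + a * y) / det         # inverse map, second coordinate
        if 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0:
            count += 1

area = count / (nx * ny) * 3.0             # fraction of the box times box area
print(area, abs(det))
```

The measured area agrees with $|\det A| = 2$ up to the rasterization error at the parallelogram's boundary.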

💡 Remark 3 (The Formula in Reverse)

Often we start with an integral in $(x, y)$ and want to transform to $(u, v)$. If $\varphi: D^* \to D$ is the transformation $(u, v) \mapsto (x, y)$, then we substitute $x = \varphi_1(u,v)$, $y = \varphi_2(u,v)$, $dA = |\det J_\varphi|\,du\,dv$, and change the region from $D$ to $D^* = \varphi^{-1}(D)$. The conceptual direction is: “I want simpler limits, so I find a $\varphi$ that maps a simple $D^*$ to the complicated $D$.”

Polar coordinates: (r,θ) → (r cos θ, r sin θ). The Jacobian determinant is r — cells near the origin are compressed.


Uniform grid in (u,v) space deformed by φ into (x,y) space, cells colored by |det J|

4. Polar Coordinates

The reader has been waiting for this since Example 7 of Topic 13 (the disk integral that was “painful in Cartesian”). Polar coordinates are the prototypical 2D change of variables.

📐 Definition 2 (Polar Coordinates)

The polar coordinate transformation is $\varphi(r, \theta) = (r\cos\theta, r\sin\theta)$, mapping from $D^* = \{(r, \theta) : r > 0,\, \theta \in (0, 2\pi)\}$ to $D = \mathbb{R}^2 \setminus \{x \ge 0,\, y = 0\}$ (the plane minus the nonnegative $x$-axis). The Jacobian is:

$$J_\varphi(r, \theta) = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}, \qquad |\det J_\varphi| = r.$$

The area element is $dA = r\,dr\,d\theta$.

💡 Remark 4 (Why r > 0?)

At $r = 0$, $\det J_\varphi = 0$, and $\varphi$ collapses all $\theta$ values to the origin — it is not injective. The origin is a single point (measure zero), so excluding it does not affect the integral. This is typical: changes of variables are allowed to fail on sets of measure zero.

📝 Example 5 (Area of the Unit Disk (Resolved))

$$\iint_{x^2+y^2 \le 1} dA = \int_0^{2\pi} \int_0^1 r\,dr\,d\theta = 2\pi \cdot \frac{1}{2} = \pi.$$

Compare with the Cartesian computation from Topic 13, Example 7: $\int_{-1}^1 \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} dy\,dx = \int_{-1}^1 2\sqrt{1-x^2}\,dx = \pi$. Same answer, but polar reduces the integral to a product of two elementary 1D integrals.

📝 Example 6 (Integral over an Annulus)

$\iint_D (x^2 + y^2)\,dA$ over the annulus $1 \le x^2+y^2 \le 4$: in polar, $x^2 + y^2 = r^2$:

$$\int_0^{2\pi} \int_1^2 r^2 \cdot r\,dr\,d\theta = 2\pi \cdot \frac{r^4}{4}\Big|_1^2 = 2\pi \cdot \frac{15}{4} = \frac{15\pi}{2}.$$

The circular symmetry of both the region and the integrand makes polar coordinates the natural choice.
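
A polar Riemann sum makes the area element $r\,dr\,d\theta$ concrete. This is a minimal midpoint-rule sketch of the annulus integral (grid sizes are illustrative):

```python
import math

# Midpoint Riemann sum for the annulus integral in polar coordinates.
nr, nt = 400, 100
r0, r1 = 1.0, 2.0
total = 0.0
for i in range(nr):
    r = r0 + (r1 - r0) * (i + 0.5) / nr
    for j in range(nt):
        # integrand x^2 + y^2 = r^2, times the polar area element r dr dtheta
        total += r ** 3 * ((r1 - r0) / nr) * (2 * math.pi / nt)

print(total, 15 * math.pi / 2)
```

Note that forgetting the extra factor of $r$ (the Jacobian determinant) would give $2\pi \cdot 7/3$ instead of $15\pi/2$ — the most common mistake with polar integrals.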

📝 Example 7 (Volume Under the Paraboloid (Resolved))

From Topic 13, §7: the volume between $z = x^2 + y^2$ and $z = 4$ over the disk $x^2 + y^2 \le 4$. In polar:

$$V = \int_0^{2\pi} \int_0^2 (4 - r^2)\,r\,dr\,d\theta = 2\pi \int_0^2 (4r - r^3)\,dr = 2\pi \left[2r^2 - \frac{r^4}{4}\right]_0^2 = 2\pi(8 - 4) = 8\pi.$$


Cartesian grid vs. polar grid on the unit disk, area element comparison

5. The Gaussian Integral

The computation of $\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$ is one of the most important results in all of mathematics, appearing in probability (the normalizing constant of the normal distribution), statistical mechanics (the partition function), and quantum mechanics (path integrals). The proof is a triumph of the change-of-variables formula.

🔷 Proposition 1 (The Gaussian Integral)

$$\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}.$$

Proof.

Step 1: Square the integral. Let $I = \int_{-\infty}^{\infty} e^{-x^2}\,dx$. Then:

$$I^2 = \left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)\left(\int_{-\infty}^{\infty} e^{-y^2}\,dy\right) = \iint_{\mathbb{R}^2} e^{-(x^2+y^2)}\,dA.$$

The second equality uses Fubini (Topic 13, Theorem 1) to convert the product of two 1D integrals into a double integral. This requires justifying that $e^{-(x^2+y^2)}$ is integrable over $\mathbb{R}^2$ — it suffices to note that $\int_0^R \int_0^{2\pi} e^{-r^2} r\,d\theta\,dr = 2\pi \int_0^R r e^{-r^2}\,dr = \pi(1 - e^{-R^2}) \to \pi$ as $R \to \infty$, so the improper integral converges.

Step 2: Switch to polar coordinates. Using $x^2 + y^2 = r^2$ and $dA = r\,dr\,d\theta$:

$$I^2 = \int_0^{2\pi}\int_0^{\infty} e^{-r^2}\,r\,dr\,d\theta.$$

The inner integral is elementary: $\int_0^{\infty} r e^{-r^2}\,dr = \left[-\frac{1}{2}e^{-r^2}\right]_0^\infty = \frac{1}{2}$.

Step 3: Evaluate. $I^2 = 2\pi \cdot \frac{1}{2} = \pi$. Since $I > 0$, we conclude $I = \sqrt{\pi}$.

💡 Remark 5 (Why Is This Hard in Cartesian?)

The function $e^{-x^2}$ has no elementary antiderivative (this is a theorem, not a failure of technique). The 1D integral is genuinely intractable without the 2D “trick.” The change-of-variables formula transforms a hard 1D problem into an easy 2D one by exploiting radial symmetry.
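
Although no closed-form antiderivative exists, the value $\sqrt{\pi}$ is easy to confirm numerically. A minimal sketch (the truncation point and grid size are illustrative choices):

```python
import math

# Midpoint-rule approximation of the Gaussian integral on [-10, 10];
# the tail beyond |x| = 10 contributes less than e^{-100}.
n = 100_000
a, b = -10.0, 10.0
h = (b - a) / n
I = sum(math.exp(-(a + (i + 0.5) * h) ** 2) for i in range(n)) * h

print(I, math.sqrt(math.pi))
```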

📝 Example 8 (The Gaussian Normalizing Constant)

The PDF of $\mathcal{N}(\mu, \sigma^2)$ is $\frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$. Verify normalization using the substitution $u = (x - \mu)/(\sigma\sqrt{2})$ and $\int_{-\infty}^{\infty} e^{-u^2}\,du = \sqrt{\pi}$:

$$\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}\,dx = \frac{1}{\sigma\sqrt{2\pi}} \cdot \sigma\sqrt{2} \cdot \sqrt{\pi} = 1.$$

This is why the $\sqrt{2\pi}$ appears in the Gaussian density — it is the Gaussian integral.


3D Gaussian surface with polar radial rings and convergence plot

6. Cylindrical and Spherical Coordinates

The change of variables formula works in any dimension. Here we cover the two standard 3D coordinate systems.

📐 Definition 3 (Cylindrical Coordinates)

The cylindrical coordinate transformation is $\varphi(r, \theta, z) = (r\cos\theta, r\sin\theta, z)$, with $r > 0$, $\theta \in (0, 2\pi)$, $z \in \mathbb{R}$. The Jacobian is:

$$J_\varphi = \begin{pmatrix} \cos\theta & -r\sin\theta & 0 \\ \sin\theta & r\cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad |\det J_\varphi| = r.$$

The volume element is $dV = r\,dr\,d\theta\,dz$.

📐 Definition 4 (Spherical Coordinates)

The spherical coordinate transformation is $\varphi(\rho, \theta, \phi) = (\rho\sin\phi\cos\theta,\, \rho\sin\phi\sin\theta,\, \rho\cos\phi)$, with $\rho > 0$, $\theta \in (0, 2\pi)$, $\phi \in (0, \pi)$. The Jacobian matrix is:

$$J_\varphi = \begin{pmatrix} \sin\phi\cos\theta & -\rho\sin\phi\sin\theta & \rho\cos\phi\cos\theta \\ \sin\phi\sin\theta & \rho\sin\phi\cos\theta & \rho\cos\phi\sin\theta \\ \cos\phi & 0 & -\rho\sin\phi \end{pmatrix}$$

Computing $\det J_\varphi$ by cofactor expansion along the third row:

$$\det J_\varphi = \cos\phi \cdot (-\rho^2\sin\phi\cos\phi\sin^2\theta - \rho^2\sin\phi\cos\phi\cos^2\theta) + (-\rho\sin\phi)(\rho\sin^2\phi\cos^2\theta + \rho\sin^2\phi\sin^2\theta)$$

$$= -\rho^2\sin\phi\cos^2\phi - \rho^2\sin^3\phi = -\rho^2\sin\phi(\cos^2\phi + \sin^2\phi) = -\rho^2\sin\phi.$$

Since $\sin\phi \ge 0$ for $\phi \in [0, \pi]$: $|\det J_\varphi| = \rho^2\sin\phi$. The volume element is $dV = \rho^2\sin\phi\,d\rho\,d\theta\,d\phi$.
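
The cofactor computation can be cross-checked against a finite-difference Jacobian. This sketch approximates $J_\varphi$ numerically at one sample point (the step size and test point are arbitrary choices) and compares $|\det J_\varphi|$ to $\rho^2\sin\phi$:

```python
import math

def spherical(rho, theta, phi):
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))

def num_jacobian(f, p, h=1e-6):
    """Central-difference Jacobian of f: R^3 -> R^3 at the point p."""
    cols = []
    for i in range(3):
        pp, pm = list(p), list(p)
        pp[i] += h; pm[i] -= h
        fp, fm = f(*pp), f(*pm)
        cols.append([(fp[k] - fm[k]) / (2 * h) for k in range(3)])
    return [[cols[j][i] for j in range(3)] for i in range(3)]  # transpose to rows

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

rho, theta, phi = 1.3, 0.7, 1.1
J = num_jacobian(spherical, (rho, theta, phi))
print(abs(det3(J)), rho ** 2 * math.sin(phi))
```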

📝 Example 9 (Volume of the Unit Ball)

$$V = \iiint_{x^2+y^2+z^2 \le 1} dV = \int_0^{2\pi}\int_0^{\pi}\int_0^1 \rho^2 \sin\phi\,d\rho\,d\phi\,d\theta = 2\pi \cdot 2 \cdot \frac{1}{3} = \frac{4\pi}{3}.$$

Three decoupled 1D integrals — the spherical symmetry of the ball makes the computation trivial.

📝 Example 10 (Moment of Inertia of a Solid Sphere)

For a uniform solid sphere of mass $M$ and radius $R$ with density $\rho_{\text{mass}} = \frac{3M}{4\pi R^3}$, the moment of inertia about the $z$-axis is $I = \iiint \rho_{\text{mass}}\,(x^2 + y^2)\,dV$. In spherical coordinates, $x^2 + y^2 = \rho^2 \sin^2\phi$:

$$I = \frac{3M}{4\pi R^3} \int_0^{2\pi}\int_0^{\pi}\int_0^R \rho^2 \sin^2\phi \cdot \rho^2 \sin\phi\,d\rho\,d\phi\,d\theta.$$

Evaluating each factor: $\int_0^{2\pi}d\theta = 2\pi$, $\int_0^{\pi}\sin^3\phi\,d\phi = \frac{4}{3}$, $\int_0^R \rho^4\,d\rho = \frac{R^5}{5}$. So $I = \frac{3M}{4\pi R^3} \cdot 2\pi \cdot \frac{4}{3} \cdot \frac{R^5}{5} = \frac{2}{5}MR^2$.
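
The classic $\frac{2}{5}MR^2$ result can be reproduced with a midpoint Riemann sum over the spherical volume element (a minimal sketch with $M = R = 1$; the grid size is an illustrative choice):

```python
import math

# Midpoint-rule check of I = (2/5) M R^2 in spherical coordinates.
# The theta integral is done exactly (the integrand is theta-independent).
M, R = 1.0, 1.0
rho_mass = 3 * M / (4 * math.pi * R ** 3)

n = 200
I = 0.0
for i in range(n):                      # radial coordinate rho
    p = R * (i + 0.5) / n
    for j in range(n):                  # polar angle phi
        f = math.pi * (j + 0.5) / n
        # integrand rho_mass * (rho sin phi)^2, times volume element rho^2 sin phi
        I += rho_mass * (p * math.sin(f)) ** 2 * p ** 2 * math.sin(f) \
             * (R / n) * (math.pi / n) * (2 * math.pi)

print(I)   # converges to 2/5 = 0.4
```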

💡 Remark 6 (Which Coordinate System to Use?)

  • Polar: circular symmetry in 2D ($x^2 + y^2$ appears, region is a disk or annulus).
  • Cylindrical: cylindrical symmetry in 3D (the region or integrand is symmetric about the $z$-axis).
  • Spherical: spherical symmetry ($x^2 + y^2 + z^2$ appears, region is a ball or spherical shell).
  • If none of these symmetries are present, a custom coordinate transformation may still simplify the integral.


Side-by-side: cylindrical coordinate surfaces and spherical coordinate surfaces

7. The General Change of Variables Theorem

Polar, cylindrical, and spherical are special cases. The general theorem handles any $C^1$ diffeomorphism.

📐 Definition 5 (C¹ Diffeomorphism)

A function $\varphi: D^* \to D$ between open subsets of $\mathbb{R}^n$ is a $C^1$ diffeomorphism if: (i) $\varphi$ is bijective, (ii) $\varphi$ is $C^1$ (continuously differentiable), and (iii) $\varphi^{-1}: D \to D^*$ is $C^1$. By the Inverse Function Theorem (Topic 12, Theorem 1), conditions (ii) and (iii) are equivalent to: $\varphi$ is $C^1$ and $\det J_\varphi(u) \neq 0$ for all $u \in D^*$.

🔷 Theorem 3 (Change of Variables (General))

Let $\varphi: D^* \to D$ be a $C^1$ diffeomorphism between open subsets of $\mathbb{R}^n$. Let $f: D \to \mathbb{R}$ be continuous and integrable over $D$. Then:

$$\int_D f(x)\,dx = \int_{D^*} f(\varphi(u))\,|\det J_\varphi(u)|\,du.$$

This reduces to the 1D substitution rule (Theorem 1) when $n = 1$, and to the 2D formula (Theorem 2) when $n = 2$.

Proof.

We follow Spivak’s approach (Calculus on Manifolds, Theorem 3-13).

Step 1: The linear case. If $\varphi(u) = Au + b$ is an affine map with $A$ invertible, then $D = A(D^*) + b$, $J_\varphi = A$, and the formula follows from the definition of the Riemann integral: the Riemann sum over $D$ with partition $P$ corresponds to the Riemann sum over $D^*$ with partition $A^{-1}(P - b)$, with each cell volume scaled by $|\det A|$.

Step 2: Local validity. For a general $C^1$ diffeomorphism and any $u_0 \in D^*$, the linear approximation $\varphi(u) \approx \varphi(u_0) + J_\varphi(u_0)(u - u_0)$ is accurate on a small ball $B_\epsilon(u_0)$. Because $J_\varphi$ is continuous and $\det J_\varphi \neq 0$, on a sufficiently small ball:

$$\left|\frac{|\det J_\varphi(u)|}{|\det J_\varphi(u_0)|} - 1\right| < \delta$$

for any prescribed $\delta > 0$. The integral formula holds locally up to an error controlled by $\delta$.

Step 3: Partition of unity. Cover $D^*$ with a finite collection of balls $\{B_k\}$ (using compactness of $\overline{D^*}$ if $D^*$ is bounded; the general case uses exhaustion by compact subsets — see Topic 3 for compactness). Choose a subordinate partition of unity $\{\psi_k\}$: smooth functions $\psi_k \ge 0$ with $\mathrm{supp}(\psi_k) \subset B_k$ and $\sum_k \psi_k = 1$ on $D^*$. Then:

$$\int_D f\,dx = \sum_k \int_D (\psi_k \circ \varphi^{-1}) \cdot f\,dx = \sum_k \int_{B_k} (\psi_k \cdot f \circ \varphi)\,|\det J_\varphi|\,du = \int_{D^*} f(\varphi(u))\,|\det J_\varphi(u)|\,du.$$

Each step uses: (i) the partition of unity decomposes $f$ into locally supported pieces, (ii) the linear case applies locally on each $B_k$ (up to the error in Step 2), (iii) summing recovers the full integral. The rigorous details involve showing the error terms from Step 2 sum to zero — this uses the uniform continuity of $J_\varphi$ on compact subsets.

💡 Remark 7 (Relaxing the Hypotheses)

The theorem extends to $\varphi$ that fails to be injective or has $\det J_\varphi = 0$ on a set of measure zero (e.g., the origin for polar coordinates, or the $z$-axis for cylindrical coordinates). The Lebesgue version of the change-of-variables theorem handles this rigorously — see Sigma-Algebras & Measures and The Lebesgue Integral (coming soon).

Arbitrary diffeomorphism deforming a grid, with Jacobian determinant heatmap overlay

8. Worked Examples

This section builds computational fluency through a series of worked examples with different transformations.

📝 Example 11 (Elliptical Coordinates — Area of an Ellipse)

Let $\varphi(u, v) = (au\cos v,\, bu\sin v)$ with $a, b > 0$. The Jacobian determinant is $|\det J_\varphi| = abu$. The area of the ellipse $\frac{x^2}{a^2} + \frac{y^2}{b^2} \le 1$:

$$\int_0^{2\pi}\int_0^1 ab\,u\,du\,dv = 2\pi \cdot ab \cdot \frac{1}{2} = \pi ab.$$

This generalizes the disk area $\pi r^2$ to $\pi ab$.

📝 Example 12 (Parabolic Coordinates)

$\varphi(u, v) = \left(\frac{u^2 - v^2}{2},\, uv\right)$ has $|\det J_\varphi| = u^2 + v^2$. Useful for problems with parabolic symmetry (e.g., electrostatics near a parabolic reflector).

🔷 Proposition 2 (Composition of Transformations)

If $\varphi_1: D_1^* \to D_1$ and $\varphi_2: D_1 \to D$ are $C^1$ diffeomorphisms, then $\varphi = \varphi_2 \circ \varphi_1: D_1^* \to D$ is a $C^1$ diffeomorphism with:

$$\det J_\varphi(u) = \det J_{\varphi_2}(\varphi_1(u)) \cdot \det J_{\varphi_1}(u).$$

This follows from the chain rule for Jacobians (Topic 10, Theorem 2) and the multiplicativity of determinants. Geometrically: volume distortions compose multiplicatively, as established in Topic 10, Proposition 2.
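
The composition rule is easy to test numerically: compose the polar map with an arbitrary linear map and compare the finite-difference Jacobian determinant of the composite to the product $(\det A) \cdot r$. The matrix entries, test point, and step size below are illustrative choices:

```python
import math

def polar(r, t):
    return (r * math.cos(t), r * math.sin(t))

a, b, c, d = 1.0, 2.0, 0.5, 3.0          # an arbitrary linear map A
def lin_A(x, y):
    return (a * x + b * y, c * x + d * y)

def composed(r, t):                       # varphi_2 ∘ varphi_1 = A ∘ polar
    return lin_A(*polar(r, t))

def det_jac2(f, u, v, h=1e-6):
    """2x2 determinant of the central-difference Jacobian of f at (u, v)."""
    fu1, fu0 = f(u + h, v), f(u - h, v)
    fv1, fv0 = f(u, v + h), f(u, v - h)
    j11 = (fu1[0] - fu0[0]) / (2 * h); j12 = (fv1[0] - fv0[0]) / (2 * h)
    j21 = (fu1[1] - fu0[1]) / (2 * h); j22 = (fv1[1] - fv0[1]) / (2 * h)
    return j11 * j22 - j12 * j21

r, t = 1.7, 0.9
det_A = a * d - b * c                     # det of the linear factor
print(det_jac2(composed, r, t), det_A * r)   # composite det vs. product of dets
```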

📝 Example 13 (Composed Transformation — Rotated Ellipse)

To integrate over the region $x^2 + xy + y^2 \le 1$ (a rotated ellipse), first rotate by $\pi/4$ to diagonalize the quadratic form, then scale to a unit disk. The composed Jacobian determinant is the product of the individual determinants.

Four-panel: elliptical, parabolic, composed, and shear transformations

9. The Density Transformation Formula

This section reframes the change of variables formula for probability densities and connects to normalizing flows — the most direct ML application of this calculus.

📐 Definition 6 (Density Transformation)

Let $Z$ be a random vector with density $p_Z$ and let $X = \varphi(Z)$ where $\varphi$ is a $C^1$ diffeomorphism. The density of $X$ is:

$$p_X(x) = p_Z(\varphi^{-1}(x)) \cdot |\det J_{\varphi^{-1}}(x)| = \frac{p_Z(\varphi^{-1}(x))}{|\det J_\varphi(\varphi^{-1}(x))|}.$$

This follows directly from the change of variables theorem applied to $P(X \in A) = \int_A p_X(x)\,dx = \int_{\varphi^{-1}(A)} p_Z(z)\,dz = \int_A p_Z(\varphi^{-1}(x))\,|\det J_{\varphi^{-1}}(x)|\,dx$ for all measurable $A$.

📝 Example 14 (Log-Normal from Normal)

If $Z \sim \mathcal{N}(0, 1)$ and $X = e^Z$, then $\varphi(z) = e^z$, $\varphi^{-1}(x) = \ln x$, $|(\varphi^{-1})'(x)| = 1/x$. So:

$$p_X(x) = \frac{1}{\sqrt{2\pi}} e^{-(\ln x)^2/2} \cdot \frac{1}{x} = \frac{1}{x\sqrt{2\pi}} e^{-(\ln x)^2/2} \quad \text{for } x > 0.$$

This is the log-normal density.
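
The derived density can be checked against the CDF: since $P(X \le x) = \Phi(\ln x)$, the density must equal the derivative of $\Phi(\ln x)$. A minimal sketch using the standard library's error function (the test point and step size are arbitrary):

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def p_X(x):
    """Claimed log-normal density from the change of variables formula."""
    return math.exp(-(math.log(x)) ** 2 / 2) / (x * math.sqrt(2 * math.pi))

# Central-difference derivative of the CDF Phi(ln x) at a sample point.
x, h = 1.8, 1e-5
numeric = (Phi(math.log(x + h)) - Phi(math.log(x - h))) / (2 * h)

print(numeric, p_X(x))
```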

💡 Remark 8 (Normalizing Flows)

A normalizing flow is a composition of $K$ diffeomorphisms: $x = f_K \circ f_{K-1} \circ \cdots \circ f_1(z)$. By the composition rule (Proposition 2):

$$\log p_X(x) = \log p_Z(z) - \sum_{k=1}^K \log |\det J_{f_k}(z_{k-1})|$$

where $z_0 = z$ and $z_k = f_k(z_{k-1})$. The entire architecture of flow-based generative models (RealNVP, Glow, Neural Spline Flows) is designed to make each $|\det J_{f_k}|$ cheap to compute — typically $O(n)$ instead of $O(n^3)$ — by using triangular Jacobians (coupling layers, autoregressive transforms). This is the change-of-variables formula doing heavy lifting in generative modeling.

📝 Example 15 (Affine Coupling Layer (RealNVP))

The RealNVP coupling layer splits $z = (z_a, z_b)$ and defines $x_a = z_a$, $x_b = z_b \odot \exp(s(z_a)) + t(z_a)$. The Jacobian is lower-triangular with diagonal entries $1$ (for $x_a$) and $\exp(s_i(z_a))$ (for $x_b$). So $\log|\det J| = \sum_i s_i(z_a)$ — a sum, not a determinant. This is $O(n)$ and trivially differentiable. The IFT (Topic 12) guarantees invertibility since $\exp(s_i) > 0$ everywhere.
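
A coupling layer fits in a few lines of plain Python. This is a toy sketch, not the RealNVP implementation: the scale and shift functions `s` and `t` below are arbitrary smooth stand-ins for the neural networks, and the 2+2 split is an illustrative choice:

```python
import math

# Toy scale and shift "networks": any smooth functions of z_a will do.
def s(za): return [0.5 * math.tanh(v) for v in za]
def t(za): return [0.1 * v for v in za]

def coupling_forward(z):
    za, zb = z[:2], z[2:]
    sa, ta = s(za), t(za)
    xb = [zb_i * math.exp(s_i) + t_i for zb_i, s_i, t_i in zip(zb, sa, ta)]
    return za + xb, sum(sa)          # log|det J| is just the sum of the scales

def coupling_inverse(x):
    xa, xb = x[:2], x[2:]
    sa, ta = s(xa), t(xa)            # x_a = z_a, so s and t are recomputable
    zb = [(xb_i - t_i) * math.exp(-s_i) for xb_i, s_i, t_i in zip(xb, sa, ta)]
    return xa + zb

z = [0.3, -1.2, 0.7, 2.0]
x, log_det = coupling_forward(z)
print(x, log_det, coupling_inverse(x))   # inverse recovers z exactly
```

Note that the inverse never inverts `s` or `t` themselves — because $x_a = z_a$, both functions can be re-evaluated on the way back. That is the architectural trick.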

💡 Remark 9 (The Reparameterization Trick)

In variational autoencoders, we need $\nabla_{\mu, \sigma} \mathbb{E}_{q_\phi(z)}[f(z)]$ where $q_\phi(z) = \mathcal{N}(\mu, \sigma^2)$. The expectation depends on $\mu, \sigma$ through the distribution, so we can’t just differentiate under the integral sign. The trick: write $z = \mu + \sigma\epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$. This is a change of variables from $\epsilon$ to $z$. Now $\mathbb{E}_{q_\phi(z)}[f(z)] = \mathbb{E}_{\epsilon \sim \mathcal{N}(0,1)}[f(\mu + \sigma\epsilon)]$, and the gradient moves inside: $\nabla_{\mu, \sigma} \mathbb{E}[f(\mu + \sigma\epsilon)]$ is just a standard derivative of a deterministic function of $\mu, \sigma$ evaluated at a random $\epsilon$. The change-of-variables formula separates the randomness ($\epsilon$) from the parameters ($\mu, \sigma$).
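
The trick can be demonstrated with the simplest possible test function, $f(z) = z^2$, where the true gradient is known analytically. A minimal sketch (the choice of $f$, parameter values, and sample count are illustrative):

```python
import random

random.seed(0)
mu, sigma = 0.5, 1.0
n = 100_000

# Pathwise (reparameterized) gradient of E[f(z)] with f(z) = z^2, z = mu + sigma*eps:
# d/dmu f(mu + sigma*eps) = 2*(mu + sigma*eps), averaged over eps ~ N(0, 1).
grad_mu = sum(2 * (mu + sigma * random.gauss(0.0, 1.0)) for _ in range(n)) / n

print(grad_mu)   # E[z^2] = mu^2 + sigma^2, so the true gradient is 2*mu = 1.0
```

The Monte Carlo estimate concentrates around the analytic value $2\mu$ — differentiation happened inside the expectation, through the sampling step.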


Multi-layer normalizing flow diagram showing densities at each stage

10. Connections & Further Reading

What comes next within formalCalculus:

  • Surface Integrals & the Divergence Theorem — Surface parameterization requires the Jacobian of the parameterization map; the surface area element involves the cross product of tangent vectors, which is the 2D analog of the Jacobian determinant.
  • The Lebesgue Integral (coming soon) — The Lebesgue change of variables theorem generalizes the Riemann version, relaxing the diffeomorphism requirement to almost-everywhere injectivity.
  • Sigma-Algebras & Measures (coming soon) — Pushforward measures and the change-of-variables formula for Lebesgue integrals depend on the Jacobian determinant.

Connections to formalML:

  • Gradient Descent → formalML — The reparameterization trick in variational inference is a change of variables applied to make gradients tractable.
  • Measure-Theoretic Probability → formalML — The density transformation formula is the probabilistic form of the change of variables theorem.
  • Smooth Manifolds → formalML — The partition-of-unity argument in the general proof connects to manifold integration.
  • Information Geometry → formalML — The Fisher information matrix transforms under reparameterization via the Jacobian.

References

  1. Spivak (1965). Calculus on Manifolds. Chapter 3, Theorem 3-13 — the change of variables theorem with full proof via partition of unity.
  2. Rudin (1976). Principles of Mathematical Analysis. Theorem 10.9 — change of variables for Riemann integrals in Rⁿ.
  3. Munkres (1991). Analysis on Manifolds. Chapter 3 — change of variables with extended Jacobian and boundary conditions.
  4. Hubbard & Hubbard (2015). Vector Calculus, Linear Algebra, and Differential Forms. Chapter 4 — geometric treatment of coordinate transformations and the Jacobian.
  5. Rezende & Mohamed (2015). “Variational Inference with Normalizing Flows.” The normalizing flow framework: density transformation via the change of variables formula with tractable Jacobian determinants.
  6. Kingma & Welling (2014). “Auto-Encoding Variational Bayes.” The reparameterization trick: change of variables applied to move parameters outside the expectation for gradient estimation.