Multivariable Integral · intermediate · 50 min read

Change of Variables & the Jacobian Determinant

Transforming integrals under coordinate substitution — the Jacobian determinant as the volume scaling factor, polar, cylindrical, and spherical coordinates, the Gaussian integral, and density transformations in normalizing flows

Abstract. The change of variables formula transforms a multiple integral from one coordinate system to another: if φ: D* → D is a C¹ diffeomorphism, then ∫∫_D f(x,y) dA = ∫∫_{D*} f(φ(u,v)) |det J_φ(u,v)| du dv. The Jacobian determinant |det J_φ| measures how φ distorts area elements — it is the volume scaling factor that was introduced abstractly in the Jacobian topic and now does real computational work. Polar coordinates are the canonical example: the map (r,θ) → (r cos θ, r sin θ) has Jacobian determinant r, producing the area element r dr dθ. This immediately resolves the painful disk integral from Topic 13 and enables the classic proof that the Gaussian integral equals √π — a result that propagates throughout probability and statistical mechanics. Cylindrical and spherical coordinates extend the framework to three dimensions. The general theorem, proved via the Inverse Function Theorem and a partition of unity argument, guarantees that integration is coordinate-independent for any C¹ diffeomorphism. In machine learning, the change of variables formula is the mathematical engine of normalizing flows: the density of a transformed variable X = f(Z) satisfies p_X(x) = p_Z(f⁻¹(x)) · |det J_{f⁻¹}(x)|, and the entire architecture of flow-based generative models is designed to make this Jacobian determinant tractable. The reparameterization trick in variational autoencoders is the same formula applied to move gradient computation outside the expectation.

Where this leads → formalML

  • The reparameterization trick in variational inference — writing z = μ + σε with ε ~ N(0,1) — is a change of variables that moves parameters outside the expectation, enabling gradient computation through sampling.
  • The density transformation formula p_X(x) = p_Z(φ⁻¹(x)) · |det J_{φ⁻¹}(x)| is the probabilistic form of the change of variables theorem.
  • Integration on manifolds requires coordinate charts, and the change of variables formula ensures that integrals are well-defined independent of chart choice.
  • The Fisher information matrix transforms under reparameterization via Ĩ(θ̃) = JᵀI(θ)J, where J is the Jacobian of the parameter change.

1. Overview & Motivation

You’ve built a generative model — a neural network that maps simple noise $z \sim \mathcal{N}(0, I)$ to complex data $x = f(z)$. To train it, you need the density $p_X(x)$. The change of variables formula gives it:

$$p_X(x) = p_Z(f^{-1}(x)) \cdot |\det J_{f^{-1}}(x)|.$$

The entire architecture of normalizing flows is designed to make the right side of this equation tractable. And the formula itself — “adjust the density by the Jacobian determinant” — is exactly the integration change of variables from multivariable calculus, applied to probability.

This topic brings together the Jacobian determinant (Topic 10), the Inverse Function Theorem (Topic 12), and multiple integrals (Topic 13) into a single powerful formula. The reader already has all the ingredients; this topic assembles them.

2. Substitution in One Variable — The Template

We know $u$-substitution from Topic 7, but we reframe it through the lens of coordinate transformation to set up the multivariable version.

🔷 Theorem 1 (Substitution Rule (1D))

Let $g: [\alpha, \beta] \to [a, b]$ be a $C^1$ bijection with $g'(t) \neq 0$ on $(\alpha, \beta)$. Then for any continuous $f: [a, b] \to \mathbb{R}$:

$$\int_a^b f(x)\,dx = \int_\alpha^\beta f(g(t))\,|g'(t)|\,dt.$$

The factor $|g'(t)|$ adjusts for how $g$ stretches or compresses the interval. When $g$ is increasing, $|g'(t)| = g'(t)$; when decreasing, $|g'(t)| = -g'(t)$, which also reverses the limits to compensate.

💡 Remark 1 (Why the Absolute Value?)

In 1D, you can track orientation via limit ordering. In $\mathbb{R}^n$, there are no “limits” to reverse — the absolute value of the Jacobian determinant is the only way to ensure positivity of the volume element. The absolute value is the conceptual bridge from 1D to $n$D.

📝 Example 1 (Quarter-Circle Area via Substitution)

Compute $\int_0^1 \sqrt{1 - x^2}\,dx$ via $x = \sin\theta$. We have $g(\theta) = \sin\theta$, $g'(\theta) = \cos\theta$, limits $[0, \pi/2]$:

$$\int_0^1 \sqrt{1 - x^2}\,dx = \int_0^{\pi/2} \sqrt{1 - \sin^2\theta} \cdot \cos\theta\,d\theta = \int_0^{\pi/2} \cos^2\theta\,d\theta = \frac{\pi}{4}.$$

This is one quarter of the unit disk area — a preview of polar coordinates.

📝 Example 2 (Gaussian Integral via Gamma Function)

Compute $\int_0^\infty e^{-x^2/2}\,dx$ via $u = x^2/2$. We get $x = \sqrt{2u}$, $dx = du/\sqrt{2u}$:

$$\int_0^\infty e^{-x^2/2}\,dx = \frac{1}{\sqrt{2}} \int_0^\infty u^{-1/2} e^{-u}\,du = \frac{1}{\sqrt{2}}\,\Gamma(1/2) = \sqrt{\pi/2}.$$

This connects to the Gamma function (Topic 8) and previews the Gaussian integral computation later in this topic.
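
Both substitution examples can be checked numerically. The sketch below is a minimal standard-library illustration — the midpoint rule and the grid size are illustrative choices, not part of the theorem:

```python
import math

def midpoint_integral(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Left side of Example 1: integral of sqrt(1 - x^2) over [0, 1].
lhs = midpoint_integral(lambda x: math.sqrt(1 - x * x), 0.0, 1.0)

# Right side: after x = sin(theta), the integrand picks up |g'(theta)| = cos(theta),
# giving cos^2(theta) on [0, pi/2].
rhs = midpoint_integral(lambda u: math.cos(u) ** 2, 0.0, math.pi / 2)

print(lhs, rhs, math.pi / 4)
```

Both sums converge to $\pi/4$; the transformed integrand is smooth, so its sum converges faster than the original, which has a vertical tangent at $x = 1$.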

1D substitution: original integral with x-axis, transformed integral with u-axis

3. The Change of Variables Formula in 2D

This is where the Jacobian determinant enters integration. Draw a grid in the $(u, v)$-plane. The map $\varphi$ sends each small rectangle $du \times dv$ to a small parallelogram in the $(x, y)$-plane. The area of that parallelogram is approximately $|\det J_\varphi(u, v)| \cdot du\,dv$ — this is exactly the Volume Distortion Theorem from Topic 10. The change of variables formula says: sum up $f$ over the deformed parallelograms, weighting each by its area.

📐 Definition 1 (Coordinate Transformation)

A coordinate transformation (or change of variables) on an open set $D^* \subseteq \mathbb{R}^2$ is a $C^1$ function $\varphi: D^* \to D \subseteq \mathbb{R}^2$ that is a diffeomorphism: bijective, $C^1$, and with $C^1$ inverse $\varphi^{-1}: D \to D^*$.

💡 Remark 2 (Diffeomorphism vs. Local Diffeomorphism)

The IFT (Topic 12) guarantees a local diffeomorphism wherever $\det J_\varphi \neq 0$. For the change of variables formula, we need a global diffeomorphism on $D^*$ (or at least injectivity, with $\det J_\varphi = 0$ only on a set of measure zero). Polar coordinates fail to be a global diffeomorphism on $\{r > 0\}$ because of the $2\pi$-periodicity in $\theta$, but they are a diffeomorphism on any domain that doesn’t wrap all the way around.

🔷 Theorem 2 (Change of Variables (2D))

Let $\varphi: D^* \to D$ be a $C^1$ diffeomorphism between open subsets of $\mathbb{R}^2$, and let $f: D \to \mathbb{R}$ be continuous. Then:

$$\iint_D f(x, y)\,dA = \iint_{D^*} f(\varphi(u, v))\,|\det J_\varphi(u, v)|\,du\,dv.$$

The Jacobian determinant $|\det J_\varphi|$ converts the area element $du\,dv$ in the $(u,v)$-coordinate system to the area element $dA$ in the $(x,y)$-coordinate system.

📝 Example 3 (Linear Change of Variables)

For $\varphi(u, v) = (au + bv,\, cu + dv)$ with $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ and $\det A \neq 0$: $\det J_\varphi = \det A$ (constant). The formula reduces to $\iint_D f(x,y)\,dA = |\det A| \iint_{D^*} f(Au)\,du\,dv$. Linear maps scale all areas by the same factor.

📝 Example 4 (Affine Shear)

For $\varphi(u,v) = (u + v, v)$: $\det J_\varphi = 1$. The integral is unchanged: $\iint_D f\,dA = \iint_{D^*} f(u+v, v)\,du\,dv$. Area-preserving transformations have a unit Jacobian determinant.
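
The claim that a linear map scales all areas by $|\det A|$ can be verified by brute force. The sketch below rasterizes the image of the unit square under one particular matrix (the matrix and grid resolution are illustrative choices) and compares the measured area to $|\det A|$:

```python
# A = [[a, b], [c, d]]; its image of the unit square is a parallelogram
# with vertices (0,0), (2,0), (1,1), (3,1).
a, b, c, d = 2.0, 1.0, 0.0, 1.0
det = a * d - b * c                        # = 2.0

# Rasterize the bounding box [0, 3] x [0, 1]: a point (x, y) lies in the
# image iff its preimage A^{-1}(x, y) lies in the unit square.
nx, ny = 1200, 400
count = 0
for i in range(nx):
    x = 3.0 * (i + 0.5) / nx
    for j in range(ny):
        y = (j + 0.5) / ny
        u = (d * x - b * y) / det          # inverse map, first coordinate
        v = (-c * x + a * y) / det         # inverse map, second coordinate
        if 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0:
            count += 1

area = count / (nx * ny) * 3.0             # fraction of the box times box area
print(area, abs(det))
```

The measured area agrees with $|\det A| = 2$ up to the rasterization error at the parallelogram's boundary.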

💡 Remark 3 (The Formula in Reverse)

Often we start with an integral in $(x, y)$ and want to transform to $(u, v)$. If $\varphi: D^* \to D$ is the transformation $(u, v) \mapsto (x, y)$, then we substitute $x = \varphi_1(u,v)$, $y = \varphi_2(u,v)$, $dA = |\det J_\varphi|\,du\,dv$, and change the region from $D$ to $D^* = \varphi^{-1}(D)$. The conceptual direction is: “I want simpler limits, so I find a $\varphi$ that maps a simple $D^*$ to the complicated $D$.”

Polar coordinates: (r,θ) → (r cos θ, r sin θ). The Jacobian determinant is r — cells near the origin are compressed.


Uniform grid in (u,v) space deformed by φ into (x,y) space, cells colored by |det J|

4. Polar Coordinates

The reader has been waiting for this since Example 7 of Topic 13 (the disk integral that was “painful in Cartesian”). Polar coordinates are the prototypical 2D change of variables.

📐 Definition 2 (Polar Coordinates)

The polar coordinate transformation is $\varphi(r, \theta) = (r\cos\theta, r\sin\theta)$, mapping from $D^* = \{(r, \theta) : r > 0,\, \theta \in (0, 2\pi)\}$ to $D = \mathbb{R}^2 \setminus \{x \ge 0,\, y = 0\}$ (the plane minus the nonnegative $x$-axis). The Jacobian is:

$$J_\varphi(r, \theta) = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}, \qquad |\det J_\varphi| = r.$$

The area element is $dA = r\,dr\,d\theta$.

💡 Remark 4 (Why r > 0?)

At $r = 0$, $\det J_\varphi = 0$, and $\varphi$ collapses all $\theta$ values to the origin — it is not injective. The origin is a single point (measure zero), so excluding it does not affect the integral. This is typical: changes of variables are allowed to fail on sets of measure zero.

📝 Example 5 (Area of the Unit Disk (Resolved))

$$\iint_{x^2+y^2 \le 1} dA = \int_0^{2\pi} \int_0^1 r\,dr\,d\theta = 2\pi \cdot \frac{1}{2} = \pi.$$

Compare with the Cartesian computation from Topic 13, Example 7: $\int_{-1}^1 \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} dy\,dx = \int_{-1}^1 2\sqrt{1-x^2}\,dx = \pi$. Same answer, but polar reduces the integral to a product of two elementary 1D integrals.

📝 Example 6 (Integral over an Annulus)

$\iint_D (x^2 + y^2)\,dA$ over the annulus $1 \le x^2+y^2 \le 4$: in polar, $x^2 + y^2 = r^2$:

$$\int_0^{2\pi} \int_1^2 r^2 \cdot r\,dr\,d\theta = 2\pi \cdot \frac{r^4}{4}\Big|_1^2 = 2\pi \cdot \frac{15}{4} = \frac{15\pi}{2}.$$

The circular symmetry of both the region and the integrand makes polar coordinates the natural choice.
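
A polar Riemann sum makes the area element $r\,dr\,d\theta$ concrete. This is a minimal midpoint-rule sketch of the annulus integral (grid sizes are illustrative):

```python
import math

# Midpoint Riemann sum for the annulus integral in polar coordinates.
nr, nt = 400, 100
r0, r1 = 1.0, 2.0
total = 0.0
for i in range(nr):
    r = r0 + (r1 - r0) * (i + 0.5) / nr
    for j in range(nt):
        # integrand x^2 + y^2 = r^2, times the polar area element r dr dtheta
        total += r ** 3 * ((r1 - r0) / nr) * (2 * math.pi / nt)

print(total, 15 * math.pi / 2)
```

Note that forgetting the extra factor of $r$ (the Jacobian determinant) would give $2\pi \cdot 7/3$ instead of $15\pi/2$ — the most common mistake with polar integrals.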

📝 Example 7 (Volume Under the Paraboloid (Resolved))

From Topic 13, §7: the volume between $z = x^2 + y^2$ and $z = 4$ over the disk $x^2 + y^2 \le 4$. In polar:

$$V = \int_0^{2\pi} \int_0^2 (4 - r^2)\,r\,dr\,d\theta = 2\pi \int_0^2 (4r - r^3)\,dr = 2\pi \left[2r^2 - \frac{r^4}{4}\right]_0^2 = 2\pi(8 - 4) = 8\pi.$$


Cartesian grid vs. polar grid on the unit disk, area element comparison

5. The Gaussian Integral

The computation of $\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$ is one of the most important results in all of mathematics, appearing in probability (the normalizing constant of the normal distribution), statistical mechanics (the partition function), and quantum mechanics (path integrals). The proof is a triumph of the change-of-variables formula.

🔷 Proposition 1 (The Gaussian Integral)

$$\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}.$$

Proof.

Step 1: Square the integral. Let $I = \int_{-\infty}^{\infty} e^{-x^2}\,dx$. Then:

$$I^2 = \left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)\left(\int_{-\infty}^{\infty} e^{-y^2}\,dy\right) = \iint_{\mathbb{R}^2} e^{-(x^2+y^2)}\,dA.$$

The second equality uses Fubini (Topic 13, Theorem 1) to convert the product of two 1D integrals into a double integral. This requires justifying that $e^{-(x^2+y^2)}$ is integrable over $\mathbb{R}^2$ — it suffices to note that $\int_0^R \int_0^{2\pi} e^{-r^2} r\,d\theta\,dr = 2\pi \int_0^R r e^{-r^2}\,dr = \pi(1 - e^{-R^2}) \to \pi$ as $R \to \infty$, so the improper integral converges.

Step 2: Switch to polar coordinates. Using $x^2 + y^2 = r^2$ and $dA = r\,dr\,d\theta$:

$$I^2 = \int_0^{2\pi}\int_0^{\infty} e^{-r^2}\,r\,dr\,d\theta.$$

The inner integral is elementary: $\int_0^{\infty} r e^{-r^2}\,dr = \left[-\frac{1}{2}e^{-r^2}\right]_0^\infty = \frac{1}{2}$.

Step 3: Evaluate. $I^2 = 2\pi \cdot \frac{1}{2} = \pi$. Since $I > 0$, we conclude $I = \sqrt{\pi}$.

💡 Remark 5 (Why Is This Hard in Cartesian?)

The function $e^{-x^2}$ has no elementary antiderivative (this is a theorem, not a failure of technique). The 1D integral is genuinely intractable without the 2D “trick.” The change-of-variables formula transforms a hard 1D problem into an easy 2D one by exploiting radial symmetry.
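
Although no closed-form antiderivative exists, the value $\sqrt{\pi}$ is easy to confirm numerically. A minimal sketch (the truncation point and grid size are illustrative choices):

```python
import math

# Midpoint-rule approximation of the Gaussian integral on [-10, 10];
# the tail beyond |x| = 10 contributes less than e^{-100}.
n = 100_000
a, b = -10.0, 10.0
h = (b - a) / n
I = sum(math.exp(-(a + (i + 0.5) * h) ** 2) for i in range(n)) * h

print(I, math.sqrt(math.pi))
```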

📝 Example 8 (The Gaussian Normalizing Constant)

The PDF of $\mathcal{N}(\mu, \sigma^2)$ is $\frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$. Verify normalization using the substitution $u = (x - \mu)/(\sigma\sqrt{2})$ and $\int_{-\infty}^{\infty} e^{-u^2}\,du = \sqrt{\pi}$:

$$\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}\,dx = \frac{1}{\sigma\sqrt{2\pi}} \cdot \sigma\sqrt{2} \cdot \sqrt{\pi} = 1.$$

This is why the $\sqrt{2\pi}$ appears in the Gaussian density — it is the Gaussian integral.


3D Gaussian surface with polar radial rings and convergence plot

6. Cylindrical and Spherical Coordinates

The change of variables formula works in any dimension. Here we cover the two standard 3D coordinate systems.

📐 Definition 3 (Cylindrical Coordinates)

The cylindrical coordinate transformation is $\varphi(r, \theta, z) = (r\cos\theta, r\sin\theta, z)$, with $r > 0$, $\theta \in (0, 2\pi)$, $z \in \mathbb{R}$. The Jacobian is:

$$J_\varphi = \begin{pmatrix} \cos\theta & -r\sin\theta & 0 \\ \sin\theta & r\cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad |\det J_\varphi| = r.$$

The volume element is $dV = r\,dr\,d\theta\,dz$.

📐 Definition 4 (Spherical Coordinates)

The spherical coordinate transformation is $\varphi(\rho, \theta, \phi) = (\rho\sin\phi\cos\theta,\, \rho\sin\phi\sin\theta,\, \rho\cos\phi)$, with $\rho > 0$, $\theta \in (0, 2\pi)$, $\phi \in (0, \pi)$. The Jacobian matrix is:

$$J_\varphi = \begin{pmatrix} \sin\phi\cos\theta & -\rho\sin\phi\sin\theta & \rho\cos\phi\cos\theta \\ \sin\phi\sin\theta & \rho\sin\phi\cos\theta & \rho\cos\phi\sin\theta \\ \cos\phi & 0 & -\rho\sin\phi \end{pmatrix}$$

Computing $\det J_\varphi$ by cofactor expansion along the third row:

$$\det J_\varphi = \cos\phi \cdot (-\rho^2\sin\phi\cos\phi\sin^2\theta - \rho^2\sin\phi\cos\phi\cos^2\theta) + (-\rho\sin\phi)(\rho\sin^2\phi\cos^2\theta + \rho\sin^2\phi\sin^2\theta)$$

$$= -\rho^2\sin\phi\cos^2\phi - \rho^2\sin^3\phi = -\rho^2\sin\phi(\cos^2\phi + \sin^2\phi) = -\rho^2\sin\phi.$$

Since $\sin\phi \ge 0$ for $\phi \in [0, \pi]$: $|\det J_\varphi| = \rho^2\sin\phi$. The volume element is $dV = \rho^2\sin\phi\,d\rho\,d\theta\,d\phi$.
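
The cofactor computation can be cross-checked against a finite-difference Jacobian. This sketch approximates $J_\varphi$ numerically at one sample point (the step size and test point are arbitrary choices) and compares $|\det J_\varphi|$ to $\rho^2\sin\phi$:

```python
import math

def spherical(rho, theta, phi):
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))

def num_jacobian(f, p, h=1e-6):
    """Central-difference Jacobian of f: R^3 -> R^3 at the point p."""
    cols = []
    for i in range(3):
        pp, pm = list(p), list(p)
        pp[i] += h; pm[i] -= h
        fp, fm = f(*pp), f(*pm)
        cols.append([(fp[k] - fm[k]) / (2 * h) for k in range(3)])
    return [[cols[j][i] for j in range(3)] for i in range(3)]  # transpose to rows

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

rho, theta, phi = 1.3, 0.7, 1.1
J = num_jacobian(spherical, (rho, theta, phi))
print(abs(det3(J)), rho ** 2 * math.sin(phi))
```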

📝 Example 9 (Volume of the Unit Ball)

$$V = \iiint_{x^2+y^2+z^2 \le 1} dV = \int_0^{2\pi}\int_0^{\pi}\int_0^1 \rho^2 \sin\phi\,d\rho\,d\phi\,d\theta = 2\pi \cdot 2 \cdot \frac{1}{3} = \frac{4\pi}{3}.$$

Three decoupled 1D integrals — the spherical symmetry of the ball makes the computation trivial.

📝 Example 10 (Moment of Inertia of a Solid Sphere)

For a uniform solid sphere of mass $M$ and radius $R$ with density $\rho_{\text{mass}} = \frac{3M}{4\pi R^3}$, the moment of inertia about the $z$-axis is $I = \iiint \rho_{\text{mass}}\,(x^2 + y^2)\,dV$. In spherical coordinates, $x^2 + y^2 = \rho^2 \sin^2\phi$:

$$I = \frac{3M}{4\pi R^3} \int_0^{2\pi}\int_0^{\pi}\int_0^R \rho^2 \sin^2\phi \cdot \rho^2 \sin\phi\,d\rho\,d\phi\,d\theta.$$

Evaluating each factor: $\int_0^{2\pi}d\theta = 2\pi$, $\int_0^{\pi}\sin^3\phi\,d\phi = \frac{4}{3}$, $\int_0^R \rho^4\,d\rho = \frac{R^5}{5}$. So $I = \frac{3M}{4\pi R^3} \cdot 2\pi \cdot \frac{4}{3} \cdot \frac{R^5}{5} = \frac{2}{5}MR^2$.
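
The classic $\frac{2}{5}MR^2$ result can be reproduced with a midpoint Riemann sum over the spherical volume element (a minimal sketch with $M = R = 1$; the grid size is an illustrative choice):

```python
import math

# Midpoint-rule check of I = (2/5) M R^2 in spherical coordinates.
# The theta integral is done exactly (the integrand is theta-independent).
M, R = 1.0, 1.0
rho_mass = 3 * M / (4 * math.pi * R ** 3)

n = 200
I = 0.0
for i in range(n):                      # radial coordinate rho
    p = R * (i + 0.5) / n
    for j in range(n):                  # polar angle phi
        f = math.pi * (j + 0.5) / n
        # integrand rho_mass * (rho sin phi)^2, times volume element rho^2 sin phi
        I += rho_mass * (p * math.sin(f)) ** 2 * p ** 2 * math.sin(f) \
             * (R / n) * (math.pi / n) * (2 * math.pi)

print(I)   # converges to 2/5 = 0.4
```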

💡 Remark 6 (Which Coordinate System to Use?)

  • Polar: circular symmetry in 2D ($x^2 + y^2$ appears, region is a disk or annulus).
  • Cylindrical: cylindrical symmetry in 3D (the region or integrand is symmetric about the $z$-axis).
  • Spherical: spherical symmetry ($x^2 + y^2 + z^2$ appears, region is a ball or spherical shell).
  • If none of these symmetries are present, a custom coordinate transformation may still simplify the integral.


Side-by-side: cylindrical coordinate surfaces and spherical coordinate surfaces

7. The General Change of Variables Theorem

Polar, cylindrical, and spherical are special cases. The general theorem handles any $C^1$ diffeomorphism.

📐 Definition 5 (C¹ Diffeomorphism)

A function $\varphi: D^* \to D$ between open subsets of $\mathbb{R}^n$ is a $C^1$ diffeomorphism if: (i) $\varphi$ is bijective, (ii) $\varphi$ is $C^1$ (continuously differentiable), and (iii) $\varphi^{-1}: D \to D^*$ is $C^1$. By the Inverse Function Theorem (Topic 12, Theorem 1), conditions (ii) and (iii) are equivalent to: $\varphi$ is $C^1$ and $\det J_\varphi(u) \neq 0$ for all $u \in D^*$.

🔷 Theorem 3 (Change of Variables (General))

Let $\varphi: D^* \to D$ be a $C^1$ diffeomorphism between open subsets of $\mathbb{R}^n$. Let $f: D \to \mathbb{R}$ be continuous and integrable over $D$. Then:

$$\int_D f(x)\,dx = \int_{D^*} f(\varphi(u))\,|\det J_\varphi(u)|\,du.$$

This reduces to the 1D substitution rule (Theorem 1) when $n = 1$, and to the 2D formula (Theorem 2) when $n = 2$.

Proof.

We follow Spivak’s approach (Calculus on Manifolds, Theorem 3-13).

Step 1: The linear case. If $\varphi(u) = Au + b$ is an affine map with $A$ invertible, then $D = A(D^*) + b$, $J_\varphi = A$, and the formula follows from the definition of the Riemann integral: the Riemann sum over $D$ with partition $P$ corresponds to the Riemann sum over $D^*$ with partition $A^{-1}(P - b)$, with each cell volume scaled by $|\det A|$.

Step 2: Local validity. For a general $C^1$ diffeomorphism and any $u_0 \in D^*$, the linear approximation $\varphi(u) \approx \varphi(u_0) + J_\varphi(u_0)(u - u_0)$ is accurate on a small ball $B_\epsilon(u_0)$. Because $J_\varphi$ is continuous and $\det J_\varphi \neq 0$, on a sufficiently small ball:

$$\left|\frac{|\det J_\varphi(u)|}{|\det J_\varphi(u_0)|} - 1\right| < \delta$$

for any prescribed $\delta > 0$. The integral formula holds locally up to an error controlled by $\delta$.

Step 3: Partition of unity. Cover $D^*$ with a finite collection of balls $\{B_k\}$ (using compactness of $\overline{D^*}$ if $D^*$ is bounded; the general case uses exhaustion by compact subsets — see Topic 3 for compactness). Choose a subordinate partition of unity $\{\psi_k\}$: smooth functions $\psi_k \ge 0$ with $\mathrm{supp}(\psi_k) \subset B_k$ and $\sum_k \psi_k = 1$ on $D^*$. Then:

$$\int_D f\,dx = \sum_k \int_D (\psi_k \circ \varphi^{-1}) \cdot f\,dx = \sum_k \int_{B_k} (\psi_k \cdot f \circ \varphi)\,|\det J_\varphi|\,du = \int_{D^*} f(\varphi(u))\,|\det J_\varphi(u)|\,du.$$

Each step uses: (i) the partition of unity decomposes $f$ into locally supported pieces, (ii) the linear case applies locally on each $B_k$ (up to the error in Step 2), (iii) summing recovers the full integral. The rigorous details involve showing the error terms from Step 2 sum to zero — this uses the uniform continuity of $J_\varphi$ on compact subsets.

💡 Remark 7 (Relaxing the Hypotheses)

The theorem extends to $\varphi$ that fails to be injective or has $\det J_\varphi = 0$ on a set of measure zero (e.g., the origin for polar coordinates, or the $z$-axis for cylindrical coordinates). The Lebesgue version of the change-of-variables theorem handles this rigorously — see Sigma-Algebras & Measures and The Lebesgue Integral (coming soon).

Arbitrary diffeomorphism deforming a grid, with Jacobian determinant heatmap overlay

8. Worked Examples

This section builds computational fluency through a series of worked examples with different transformations.

📝 Example 11 (Elliptical Coordinates — Area of an Ellipse)

Let $\varphi(u, v) = (au\cos v,\, bu\sin v)$ with $a, b > 0$. The Jacobian determinant is $|\det J_\varphi| = abu$. The area of the ellipse $\frac{x^2}{a^2} + \frac{y^2}{b^2} \le 1$:

$$\int_0^{2\pi}\int_0^1 ab\,u\,du\,dv = 2\pi \cdot ab \cdot \frac{1}{2} = \pi ab.$$

This generalizes the disk area $\pi r^2$ to $\pi ab$.

📝 Example 12 (Parabolic Coordinates)

$\varphi(u, v) = \left(\frac{u^2 - v^2}{2},\, uv\right)$ has $|\det J_\varphi| = u^2 + v^2$. Useful for problems with parabolic symmetry (e.g., electrostatics near a parabolic reflector).

🔷 Proposition 2 (Composition of Transformations)

If $\varphi_1: D_1^* \to D_1$ and $\varphi_2: D_1 \to D$ are $C^1$ diffeomorphisms, then $\varphi = \varphi_2 \circ \varphi_1: D_1^* \to D$ is a $C^1$ diffeomorphism with:

$$\det J_\varphi(u) = \det J_{\varphi_2}(\varphi_1(u)) \cdot \det J_{\varphi_1}(u).$$

This follows from the chain rule for Jacobians (Topic 10, Theorem 2) and the multiplicativity of determinants. Geometrically: volume distortions compose multiplicatively, as established in Topic 10, Proposition 2.
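
The composition rule is easy to test numerically: compose the polar map with an arbitrary linear map and compare the finite-difference Jacobian determinant of the composite to the product $(\det A) \cdot r$. The matrix entries, test point, and step size below are illustrative choices:

```python
import math

def polar(r, t):
    return (r * math.cos(t), r * math.sin(t))

a, b, c, d = 1.0, 2.0, 0.5, 3.0          # an arbitrary linear map A
def lin_A(x, y):
    return (a * x + b * y, c * x + d * y)

def composed(r, t):                       # varphi_2 ∘ varphi_1 = A ∘ polar
    return lin_A(*polar(r, t))

def det_jac2(f, u, v, h=1e-6):
    """2x2 determinant of the central-difference Jacobian of f at (u, v)."""
    fu1, fu0 = f(u + h, v), f(u - h, v)
    fv1, fv0 = f(u, v + h), f(u, v - h)
    j11 = (fu1[0] - fu0[0]) / (2 * h); j12 = (fv1[0] - fv0[0]) / (2 * h)
    j21 = (fu1[1] - fu0[1]) / (2 * h); j22 = (fv1[1] - fv0[1]) / (2 * h)
    return j11 * j22 - j12 * j21

r, t = 1.7, 0.9
det_A = a * d - b * c                     # det of the linear factor
print(det_jac2(composed, r, t), det_A * r)   # composite det vs. product of dets
```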

📝 Example 13 (Composed Transformation — Rotated Ellipse)

To integrate over the region $x^2 + xy + y^2 \le 1$ (a rotated ellipse), first rotate by $\pi/4$ to diagonalize the quadratic form, then scale to a unit disk. The composed Jacobian determinant is the product of the individual determinants.

Four-panel: elliptical, parabolic, composed, and shear transformations

9. The Density Transformation Formula

This section reframes the change of variables formula for probability densities and connects to normalizing flows — the most direct ML application of this calculus.

📐 Definition 6 (Density Transformation)

Let $Z$ be a random vector with density $p_Z$ and let $X = \varphi(Z)$ where $\varphi$ is a $C^1$ diffeomorphism. The density of $X$ is:

$$p_X(x) = p_Z(\varphi^{-1}(x)) \cdot |\det J_{\varphi^{-1}}(x)| = \frac{p_Z(\varphi^{-1}(x))}{|\det J_\varphi(\varphi^{-1}(x))|}.$$

This follows directly from the change of variables theorem applied to $P(X \in A) = \int_A p_X(x)\,dx = \int_{\varphi^{-1}(A)} p_Z(z)\,dz = \int_A p_Z(\varphi^{-1}(x))\,|\det J_{\varphi^{-1}}(x)|\,dx$ for all measurable $A$.

📝 Example 14 (Log-Normal from Normal)

If $Z \sim \mathcal{N}(0, 1)$ and $X = e^Z$, then $\varphi(z) = e^z$, $\varphi^{-1}(x) = \ln x$, $|(\varphi^{-1})'(x)| = 1/x$. So:

$$p_X(x) = \frac{1}{\sqrt{2\pi}} e^{-(\ln x)^2/2} \cdot \frac{1}{x} = \frac{1}{x\sqrt{2\pi}} e^{-(\ln x)^2/2} \quad \text{for } x > 0.$$

This is the log-normal density.
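
The derived density can be checked against the CDF: since $P(X \le x) = \Phi(\ln x)$, the density must equal the derivative of $\Phi(\ln x)$. A minimal sketch using the standard library's error function (the test point and step size are arbitrary):

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def p_X(x):
    """Claimed log-normal density from the change of variables formula."""
    return math.exp(-(math.log(x)) ** 2 / 2) / (x * math.sqrt(2 * math.pi))

# Central-difference derivative of the CDF Phi(ln x) at a sample point.
x, h = 1.8, 1e-5
numeric = (Phi(math.log(x + h)) - Phi(math.log(x - h))) / (2 * h)

print(numeric, p_X(x))
```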

💡 Remark 8 (Normalizing Flows)

A normalizing flow is a composition of $K$ diffeomorphisms: $x = f_K \circ f_{K-1} \circ \cdots \circ f_1(z)$. By the composition rule (Proposition 2):

$$\log p_X(x) = \log p_Z(z) - \sum_{k=1}^K \log |\det J_{f_k}(z_{k-1})|$$

where $z_0 = z$ and $z_k = f_k(z_{k-1})$. The entire architecture of flow-based generative models (RealNVP, Glow, Neural Spline Flows) is designed to make each $|\det J_{f_k}|$ cheap to compute — typically $O(n)$ instead of $O(n^3)$ — by using triangular Jacobians (coupling layers, autoregressive transforms). This is the change-of-variables formula doing heavy lifting in generative modeling.

📝 Example 15 (Affine Coupling Layer (RealNVP))

The RealNVP coupling layer splits $z = (z_a, z_b)$ and defines $x_a = z_a$, $x_b = z_b \odot \exp(s(z_a)) + t(z_a)$. The Jacobian is lower-triangular with diagonal entries $1$ (for $x_a$) and $\exp(s_i(z_a))$ (for $x_b$). So $\log|\det J| = \sum_i s_i(z_a)$ — a sum, not a determinant. This is $O(n)$ and trivially differentiable. The IFT (Topic 12) guarantees invertibility since $\exp(s_i) > 0$ everywhere.
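
A coupling layer fits in a few lines of plain Python. This is a toy sketch, not the RealNVP implementation: the scale and shift functions `s` and `t` below are arbitrary smooth stand-ins for the neural networks, and the 2+2 split is an illustrative choice:

```python
import math

# Toy scale and shift "networks": any smooth functions of z_a will do.
def s(za): return [0.5 * math.tanh(v) for v in za]
def t(za): return [0.1 * v for v in za]

def coupling_forward(z):
    za, zb = z[:2], z[2:]
    sa, ta = s(za), t(za)
    xb = [zb_i * math.exp(s_i) + t_i for zb_i, s_i, t_i in zip(zb, sa, ta)]
    return za + xb, sum(sa)          # log|det J| is just the sum of the scales

def coupling_inverse(x):
    xa, xb = x[:2], x[2:]
    sa, ta = s(xa), t(xa)            # x_a = z_a, so s and t are recomputable
    zb = [(xb_i - t_i) * math.exp(-s_i) for xb_i, s_i, t_i in zip(xb, sa, ta)]
    return xa + zb

z = [0.3, -1.2, 0.7, 2.0]
x, log_det = coupling_forward(z)
print(x, log_det, coupling_inverse(x))   # inverse recovers z exactly
```

Note that the inverse never inverts `s` or `t` themselves — because $x_a = z_a$, both functions can be re-evaluated on the way back. That is the architectural trick.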

💡 Remark 9 (The Reparameterization Trick)

In variational autoencoders, we need $\nabla_{\mu, \sigma} \mathbb{E}_{q_\phi(z)}[f(z)]$ where $q_\phi(z) = \mathcal{N}(\mu, \sigma^2)$. The expectation depends on $\mu, \sigma$ through the distribution, so we can’t just differentiate under the integral sign. The trick: write $z = \mu + \sigma\epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$. This is a change of variables from $\epsilon$ to $z$. Now $\mathbb{E}_{q_\phi(z)}[f(z)] = \mathbb{E}_{\epsilon \sim \mathcal{N}(0,1)}[f(\mu + \sigma\epsilon)]$, and the gradient moves inside: $\nabla_{\mu, \sigma} \mathbb{E}[f(\mu + \sigma\epsilon)]$ is just a standard derivative of a deterministic function of $\mu, \sigma$ evaluated at a random $\epsilon$. The change-of-variables formula separates the randomness ($\epsilon$) from the parameters ($\mu, \sigma$).
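
The trick can be demonstrated with the simplest possible test function, $f(z) = z^2$, where the true gradient is known analytically. A minimal sketch (the choice of $f$, parameter values, and sample count are illustrative):

```python
import random

random.seed(0)
mu, sigma = 0.5, 1.0
n = 100_000

# Pathwise (reparameterized) gradient of E[f(z)] with f(z) = z^2, z = mu + sigma*eps:
# d/dmu f(mu + sigma*eps) = 2*(mu + sigma*eps), averaged over eps ~ N(0, 1).
grad_mu = sum(2 * (mu + sigma * random.gauss(0.0, 1.0)) for _ in range(n)) / n

print(grad_mu)   # E[z^2] = mu^2 + sigma^2, so the true gradient is 2*mu = 1.0
```

The Monte Carlo estimate concentrates around the analytic value $2\mu$ — differentiation happened inside the expectation, through the sampling step.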


Multi-layer normalizing flow diagram showing densities at each stage

10. Connections & Further Reading

What comes next within formalCalculus:

  • Surface Integrals & the Divergence Theorem — Surface parameterization requires the Jacobian of the parameterization map; the surface area element involves the cross product of tangent vectors, which is the 2D analog of the Jacobian determinant.
  • The Lebesgue Integral (coming soon) — The Lebesgue change of variables theorem generalizes the Riemann version, relaxing the diffeomorphism requirement to almost-everywhere injectivity.
  • Sigma-Algebras & Measures (coming soon) — Pushforward measures and the change-of-variables formula for Lebesgue integrals depend on the Jacobian determinant.

Connections to formalML:

  • Gradient Descent → formalML — The reparameterization trick in variational inference is a change of variables applied to make gradients tractable.
  • Measure-Theoretic Probability → formalML — The density transformation formula is the probabilistic form of the change of variables theorem.
  • Smooth Manifolds → formalML — The partition-of-unity argument in the general proof connects to manifold integration.
  • Information Geometry → formalML — The Fisher information matrix transforms under reparameterization via the Jacobian.

References

  1. Spivak (1965). Calculus on Manifolds. Chapter 3, Theorem 3-13 — the change of variables theorem with full proof via partition of unity.
  2. Rudin (1976). Principles of Mathematical Analysis. Theorem 10.9 — change of variables for Riemann integrals in Rⁿ.
  3. Munkres (1991). Analysis on Manifolds. Chapter 3 — change of variables with extended Jacobian and boundary conditions.
  4. Hubbard & Hubbard (2015). Vector Calculus, Linear Algebra, and Differential Forms. Chapter 4 — geometric treatment of coordinate transformations and the Jacobian.
  5. Rezende & Mohamed (2015). “Variational Inference with Normalizing Flows.” The normalizing flow framework: density transformation via the change of variables formula with tractable Jacobian determinants.
  6. Kingma & Welling (2014). “Auto-Encoding Variational Bayes.” The reparameterization trick: change of variables applied to move parameters outside the expectation for gradient estimation.