Single-Variable Calculus · intermediate · 50 min read

Improper Integrals & Special Functions

Extending integration to unbounded intervals and unbounded integrands — the Gamma, Beta, and Gaussian integrals that are workhorses in probability and machine learning

Abstract. The Riemann integral ∫ₐᵇ f(x) dx requires both the interval [a,b] and the function f to be bounded. Improper integrals remove these restrictions by defining ∫ₐ∞ f(x) dx = lim_{b→∞} ∫ₐᵇ f(x) dx (Type I: unbounded intervals) and handling unbounded integrands via one-sided limits at singularities (Type II). The p-test provides the fundamental benchmark: ∫₁∞ 1/xᵖ dx converges if and only if p > 1, while ∫₀¹ 1/xᵖ dx converges if and only if p < 1. Comparison tests — direct and limit — reduce new convergence questions to these benchmarks. Three special functions built from improper integrals pervade probability and machine learning. The Gamma function Γ(s) = ∫₀∞ tˢ⁻¹ e⁻ᵗ dt extends the factorial to real numbers via the functional equation Γ(s+1) = sΓ(s), with Γ(n+1) = n! for positive integers. The Beta function B(a,b) = ∫₀¹ tᵃ⁻¹(1-t)ᵇ⁻¹ dt is the normalizing constant for the Beta distribution, connected to Gamma by B(a,b) = Γ(a)Γ(b)/Γ(a+b). The Gaussian integral ∫₋∞∞ e⁻ˣ² dx = √π normalizes the Gaussian distribution — its proof via polar coordinates is a celebrated application of multivariable substitution. Stirling’s approximation n! ≈ √(2πn)(n/e)ⁿ, derived from the Gamma function, gives the asymptotic behavior of factorials used throughout information theory and combinatorics. In machine learning, these functions appear as normalizing constants for probability distributions (Gaussian, Gamma, Beta, Chi-squared, Student-t), in Bayesian posterior computation, and in the analysis of tail probabilities and concentration inequalities.

Where this leads → formalML

  • formalML Every probability density satisfies ∫ f(x) dx = 1, an improper integral. The Gamma and Beta functions are normalizing constants for the Gamma, Chi-squared, Beta, and Dirichlet distributions. The transition from improper Riemann integrals to Lebesgue integrals resolves convergence issues with conditional integrals and enables measure-theoretic probability.
  • formalML Bayesian posteriors require ∫ likelihood × prior dθ over unbounded parameter spaces. Conjugate priors (Gamma-Poisson, Beta-Binomial, Normal-Normal) are designed so these improper integrals reduce to ratios of Gamma and Beta functions, yielding closed-form posteriors.
  • formalML Differential entropy h(X) = -∫ f(x) log f(x) dx and KL divergence are improper integrals over ℝ. For the Gaussian, this evaluates to ½ log(2πeσ²) via the Gaussian integral. Stirling’s approximation gives the entropy of the binomial distribution: H(Bin(n,p)) ≈ ½ log(2πnp(1-p)).
  • formalML Tail bounds P(X > t) = ∫ᵗ∞ f(x) dx are improper integrals. The decay rate (exponential vs. polynomial tails) determines sub-Gaussian vs. heavy-tailed behavior. Stirling’s approximation appears in sharp bounds for binomial tails via the entropy method.

Overview & Motivation

You’re computing the normalizing constant for a Gaussian distribution: $\int_{-\infty}^{\infty} e^{-x^2/2}\,dx$. The integrand $e^{-x^2/2}$ has no elementary antiderivative at all. Yet the integral equals $\sqrt{2\pi}$, a fact that makes the entire Gaussian probability framework possible. More immediately: every density $f(x)$ must satisfy $\int_{-\infty}^{\infty} f(x)\,dx = 1$, but this is an integral over an unbounded interval — the Riemann integral from Topic 7 doesn’t apply directly.

Improper integrals make these computations rigorous by defining integrals over unbounded domains (and of unbounded functions) as limits of the proper integrals we’ve already built. The idea is simple: if you can’t integrate all the way to infinity, integrate to some large number $b$ and ask what happens as $b \to \infty$. If the limit is finite, the improper integral converges and we call that limit the value of the integral. If not, the integral diverges.

We’ll extend the Riemann integral to unbounded intervals (§2) and unbounded integrands (§3), develop convergence tests that determine convergence without computing exact values (§4), then meet the three special functions that pervade probability and ML: the Gamma function (§5), the Beta function (§6), and the Gaussian integral (§7). Stirling’s approximation (§8) gives the asymptotic behavior of $n!$ that appears throughout information theory and combinatorics.

Type I — Integration over Unbounded Intervals

Start with the picture: $f(x) = 1/x^2$ on $[1, \infty)$. The curve drops toward zero, and we want the total area under it from $x = 1$ to the right forever. We can compute $\int_1^b \frac{1}{x^2}\,dx = 1 - \frac{1}{b}$ for any finite $b$ using the FTC. As $b \to \infty$, this approaches $1$. The area under an infinitely long curve is finite — this is the central surprise of improper integrals.

📐 Definition 1 (Type I Improper Integral)

If $f$ is Riemann integrable on $[a, b]$ for every $b > a$, we define

$$\int_a^{\infty} f(x)\,dx = \lim_{b \to \infty} \int_a^b f(x)\,dx,$$

provided the limit exists. If the limit exists and is finite, the improper integral converges; otherwise it diverges.

Similarly, $\int_{-\infty}^b f(x)\,dx = \lim_{a \to -\infty} \int_a^b f(x)\,dx$, and for doubly infinite integrals:

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^c f(x)\,dx + \int_c^{\infty} f(x)\,dx$$

for any $c \in \mathbb{R}$, provided both pieces converge. The choice of $c$ doesn’t matter when both parts converge — this follows from the additivity of the Riemann integral over adjacent intervals.

📝 Example 1 (The p-test for Type I)

The integral $\int_1^{\infty} \frac{1}{x^p}\,dx$ is the fundamental benchmark for convergence at infinity.

For $p \neq 1$:

$$\int_1^b \frac{1}{x^p}\,dx = \frac{x^{1-p}}{1-p}\bigg|_1^b = \frac{b^{1-p} - 1}{1-p}.$$

  • If $p > 1$: the exponent $1 - p < 0$, so $b^{1-p} \to 0$ as $b \to \infty$. The integral converges to $\frac{1}{p-1}$.
  • If $p < 1$: the exponent $1 - p > 0$, so $b^{1-p} \to \infty$. The integral diverges.

For $p = 1$: $\int_1^b \frac{1}{x}\,dx = \ln b \to \infty$. Diverges.

The integral $\int_1^{\infty} \frac{1}{x^p}\,dx$ converges if and only if $p > 1$. This is the $p$-test — every other Type I convergence question reduces to comparing against $1/x^p$.
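The $p$-test is easy to watch numerically: the truncated integrals $\int_1^b x^{-p}\,dx$ have the closed form above, so we can tabulate them as $b$ grows. A minimal sketch (the helper name `truncated_p_integral` is ours, not a library function):

```python
import math

def truncated_p_integral(p, b):
    """Closed form of the truncated integral ∫_1^b x^(-p) dx, from the FTC."""
    if p == 1:
        return math.log(b)
    return (b ** (1 - p) - 1) / (1 - p)

for b in [1e2, 1e4, 1e6]:
    # p = 2 settles at 1/(p-1) = 1; p = 1/2 and p = 1 keep growing
    print(b, truncated_p_integral(2, b),
          truncated_p_integral(0.5, b), truncated_p_integral(1, b))
```

The $p = 2$ column stabilizes while the $p = 1/2$ and $p = 1$ columns grow without bound, exactly as the test predicts.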

📝 Example 2 (Exponential decay)

$$\int_0^{\infty} e^{-x}\,dx = \lim_{b \to \infty} \left[-e^{-x}\right]_0^b = \lim_{b \to \infty} (1 - e^{-b}) = 1.$$

Exponential decay is more than fast enough to make the area finite. In fact, $e^{-x}$ decays faster than any power $1/x^p$, which is why exponential tails are so well-behaved in probability.

📝 Example 3 (Harmonic divergence)

$$\int_1^{\infty} \frac{1}{x}\,dx = \lim_{b \to \infty} \ln b = \infty.$$

The harmonic function $1/x$ decays, but too slowly — the area accumulates without bound. This is the borderline case $p = 1$ of the $p$-test, and it’s the continuous analog of the divergent harmonic series $\sum 1/n$.

💡 Remark 1 (Doubly improper integrals and the Cauchy principal value)

For $\int_{-\infty}^{\infty} f(x)\,dx$, we split at any finite $c$ and require both $\int_{-\infty}^c f$ and $\int_c^{\infty} f$ to converge independently. The Cauchy principal value $\text{PV}\int_{-\infty}^{\infty} f(x)\,dx = \lim_{b \to \infty} \int_{-b}^{b} f(x)\,dx$ is a weaker notion — it can exist even when the improper integral diverges. For example, $\int_{-\infty}^{\infty} x\,dx$ has Cauchy principal value $0$ (by symmetry: $\int_{-b}^{b} x\,dx = 0$ for all $b$), but the improper integral diverges because $\int_0^{\infty} x\,dx = \infty$.
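The distinction is concrete: for the odd function $x/(1+x^2)$, symmetric truncations cancel exactly, while the one-sided piece grows like $\tfrac{1}{2}\ln(1+b^2)$. A small midpoint-rule sketch (the helper `midpoint` is our own, not a library call):

```python
def midpoint(f, a, b, n=100_000):
    """Midpoint-rule approximation of ∫_a^b f(x) dx."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: x / (1 + x * x)

for b in [10, 100, 1000]:
    # symmetric truncation cancels to ~0; one-sided piece grows like (1/2) ln(1 + b²)
    print(b, midpoint(f, -b, b), midpoint(f, 0, b))
```

The principal value exists (the first column stays near zero), but the improper integral diverges (the second column climbs forever).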


Type I convergence: 1/x² converges while 1/x diverges

Type II — Integration of Unbounded Functions

Now consider $f(x) = 1/\sqrt{x}$ on $(0, 1]$. The function is unbounded near $x = 0$ — it blows up to infinity. Yet the area under the curve from $\varepsilon$ to $1$ is $2(1 - \sqrt{\varepsilon})$, which approaches $2$ as $\varepsilon \to 0^+$. An infinitely tall region can have a finite area — the dual surprise to Type I.

📐 Definition 2 (Type II Improper Integral)

If $f$ is Riemann integrable on $[\varepsilon, b]$ for every $\varepsilon \in (a, b]$ but $f$ is unbounded near $a$, we define

$$\int_a^b f(x)\,dx = \lim_{\varepsilon \to a^+} \int_{\varepsilon}^b f(x)\,dx,$$

provided the limit exists. Similarly, if $f$ is unbounded near $b$:

$$\int_a^b f(x)\,dx = \lim_{\varepsilon \to b^-} \int_a^{b - \varepsilon} f(x)\,dx.$$

For singularities at an interior point $c \in (a, b)$, we split: $\int_a^b f = \int_a^c f + \int_c^b f$, and both limits must exist independently.

📝 Example 4 (The p-test for Type II)

The integral $\int_0^1 \frac{1}{x^p}\,dx$ is the benchmark for convergence near a singularity.

For $p \neq 1$:

$$\int_\varepsilon^1 \frac{1}{x^p}\,dx = \frac{x^{1-p}}{1-p}\bigg|_\varepsilon^1 = \frac{1 - \varepsilon^{1-p}}{1-p}.$$

  • If $p < 1$: $\varepsilon^{1-p} \to 0$ as $\varepsilon \to 0^+$ (since $1 - p > 0$). The integral converges to $\frac{1}{1-p}$.
  • If $p > 1$: $\varepsilon^{1-p} \to \infty$. The integral diverges.

For $p = 1$: $\int_\varepsilon^1 \frac{1}{x}\,dx = -\ln \varepsilon \to \infty$. Diverges.

The integral $\int_0^1 \frac{1}{x^p}\,dx$ converges if and only if $p < 1$. Note the complementary relationship with the Type I $p$-test: $p > 1$ for convergence at infinity, $p < 1$ for convergence at zero. The borderline $p = 1$ diverges in both cases.
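The Type II benchmark can be tabulated the same way as its Type I twin, using the closed form for the $\varepsilon$-truncated integral (the helper name `truncated_singular_integral` is ours):

```python
import math

def truncated_singular_integral(p, eps):
    """Closed form of ∫_eps^1 x^(-p) dx, from the FTC."""
    if p == 1:
        return -math.log(eps)
    return (1 - eps ** (1 - p)) / (1 - p)

for eps in [1e-3, 1e-6, 1e-9]:
    # p = 1/2 settles at 1/(1-p) = 2; p = 3/2 and p = 1 blow up
    print(eps, truncated_singular_integral(0.5, eps),
          truncated_singular_integral(1.5, eps), truncated_singular_integral(1, eps))
```

The roles are exactly reversed relative to Type I: now the small exponent converges and the large one diverges.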

📝 Example 5 (Square root singularity)

$\int_0^1 \frac{1}{\sqrt{x}}\,dx$: here $p = 1/2 < 1$, so the integral converges. Explicitly:

$$\int_\varepsilon^1 \frac{1}{\sqrt{x}}\,dx = 2\sqrt{x}\Big|_\varepsilon^1 = 2(1 - \sqrt{\varepsilon}) \to 2 \text{ as } \varepsilon \to 0^+.$$

The area under $1/\sqrt{x}$ on $(0, 1]$ is finite ($= 2$) despite the function blowing up at $0$.

💡 Remark 2 (Interior singularities)

If $f$ has a singularity at $c \in (a, b)$, we split: $\int_a^b f = \int_a^c f + \int_c^b f$, and both parts must converge independently. For example:

$$\int_{-1}^{1} \frac{1}{|x|^{2/3}}\,dx = \int_{-1}^{0} \frac{1}{|x|^{2/3}}\,dx + \int_0^1 \frac{1}{x^{2/3}}\,dx.$$

Each piece converges ($p = 2/3 < 1$) with value $\int_0^1 x^{-2/3}\,dx = 3$, so the total value is $2 \cdot 3 = 6$.

💡 Remark 3 (Both types simultaneously)

Some integrals involve both an unbounded interval and an unbounded integrand. For example, $\int_0^{\infty} \frac{1}{\sqrt{x}(1+x)}\,dx$ has a Type II singularity at $0$ (where $1/\sqrt{x}$ blows up) and requires a Type I analysis as $x \to \infty$ (where the integrand decays like $1/x^{3/2}$). Split at $x = 1$ and handle each piece separately: near $0$, compare with $1/\sqrt{x}$ ($p = 1/2 < 1$, converges); at infinity, compare with $1/x^{3/2}$ ($p = 3/2 > 1$, converges). The integral converges and equals $\pi$.
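The substitution that tames the singularity analytically also works numerically: with $x = u^2$ the integrand becomes $2/(1+u^2)$, bounded at $0$ and decaying like $1/u^2$. A sketch under those assumptions (`mixed_integral` is our own helper; truncating at `umax` leaves an error of roughly $2/u_{\max}$):

```python
import math

def mixed_integral(umax=1000.0, n=1_000_000):
    """∫_0^∞ dx/(√x (1+x)) after substituting x = u²: ∫_0^umax 2/(1+u²) du."""
    h = umax / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h  # midpoint rule never evaluates at u = 0
        total += 2.0 / (1.0 + u * u)
    return total * h

print(mixed_integral())  # close to π, up to the ~2/umax truncation error
```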

Type II convergence: 1/√x converges while 1/x diverges near 0

Convergence Tests

The $p$-test is a powerful benchmark, but we need tools to determine convergence for integrals that aren’t exactly $1/x^p$. The comparison tests reduce new convergence questions to known ones.

🔷 Theorem 1 (Comparison Test (Direct))

Suppose $0 \le f(x) \le g(x)$ for all $x \ge a$.

(a) If $\int_a^{\infty} g(x)\,dx$ converges, then $\int_a^{\infty} f(x)\,dx$ converges, and $\int_a^{\infty} f \le \int_a^{\infty} g$.

(b) If $\int_a^{\infty} f(x)\,dx$ diverges, then $\int_a^{\infty} g(x)\,dx$ diverges.

The analogous result holds for Type II integrals.

Proof.

For part (a): let $F(b) = \int_a^b f(x)\,dx$ and $G(b) = \int_a^b g(x)\,dx$. By monotonicity of the Riemann integral (Topic 7, Theorem 3), $f \le g$ implies $F(b) \le G(b)$ for all $b > a$. Since $f \ge 0$, the function $F$ is increasing in $b$: adding more non-negative area can only increase the integral. Since $G(b) \to \int_a^\infty g < \infty$, the function $F$ is increasing and bounded above. By the Monotone Convergence principle (Topic 3), $\lim_{b \to \infty} F(b)$ exists and is finite.

Part (b) is the contrapositive of part (a): if $\int g$ converged, then $\int f$ would converge too, contradicting the hypothesis.

📝 Example 6 (Convergence by comparison)

$\int_1^{\infty} \frac{1}{1+x^2}\,dx$ converges. For $x \ge 1$:

$$\frac{1}{1+x^2} \le \frac{1}{x^2},$$

and $\int_1^{\infty} \frac{1}{x^2}\,dx$ converges ($p = 2 > 1$). By the comparison test, $\int_1^{\infty} \frac{1}{1+x^2}\,dx$ converges.

In fact, we can compute the exact value: $\int_1^{\infty} \frac{1}{1+x^2}\,dx = \lim_{b \to \infty}\arctan(b) - \arctan(1) = \frac{\pi}{2} - \frac{\pi}{4} = \frac{\pi}{4}$.

🔷 Theorem 2 (Limit Comparison Test)

Suppose $f(x) > 0$ and $g(x) > 0$ for $x \ge a$, and $\lim_{x \to \infty} \frac{f(x)}{g(x)} = L$.

  • If $0 < L < \infty$: $\int_a^{\infty} f$ and $\int_a^{\infty} g$ either both converge or both diverge.
  • If $L = 0$ and $\int g$ converges: then $\int f$ converges.
  • If $L = \infty$ and $\int g$ diverges: then $\int f$ diverges.

Proof.

For the case $0 < L < \infty$: choose $\varepsilon = L/2$. There exists $M$ such that for $x \ge M$:

$$\frac{L}{2} < \frac{f(x)}{g(x)} < \frac{3L}{2},$$

which gives $\frac{L}{2}g(x) < f(x) < \frac{3L}{2}g(x)$ for $x \ge M$. Apply the direct comparison test:

  • If $\int g$ converges: $f(x) < \frac{3L}{2}g(x)$, so $\int f$ converges by comparison with $\frac{3L}{2}\int g$.
  • If $\int g$ diverges: $f(x) > \frac{L}{2}g(x)$, so $\int f \ge \frac{L}{2}\int g$ diverges.

The cases $L = 0$ and $L = \infty$ follow by similar comparison arguments using one-sided bounds.

📝 Example 7 (Limit comparison in action)

$\int_1^{\infty} \frac{x}{x^3 + 1}\,dx$ converges. Compare with $g(x) = \frac{1}{x^2}$:

$$\frac{f(x)}{g(x)} = \frac{x/(x^3 + 1)}{1/x^2} = \frac{x^3}{x^3 + 1} \to 1 \text{ as } x \to \infty.$$

Since $L = 1 \in (0, \infty)$ and $\int_1^{\infty} \frac{1}{x^2}\,dx$ converges, the limit comparison test gives convergence. We didn’t need to compute the integral — knowing its asymptotic behavior was enough.

📐 Definition 3 (Absolute and Conditional Convergence)

An improper integral $\int_a^{\infty} f(x)\,dx$ converges absolutely if $\int_a^{\infty} |f(x)|\,dx$ converges. It converges conditionally if $\int_a^{\infty} f(x)\,dx$ converges but $\int_a^{\infty} |f(x)|\,dx$ diverges.

🔷 Theorem 3 (Absolute Convergence Implies Convergence)

If $\int_a^{\infty} |f(x)|\,dx$ converges, then $\int_a^{\infty} f(x)\,dx$ converges, and

$$\left|\int_a^{\infty} f(x)\,dx\right| \le \int_a^{\infty} |f(x)|\,dx.$$

Proof.

Write $f = f^+ - f^-$ where $f^+(x) = \max(f(x), 0)$ and $f^-(x) = \max(-f(x), 0)$. Then $|f| = f^+ + f^-$, and $0 \le f^+ \le |f|$, $0 \le f^- \le |f|$. Since $\int |f|$ converges, both $\int f^+$ and $\int f^-$ converge by the comparison test (Theorem 1). Then $\int f = \int f^+ - \int f^-$ converges.

The inequality follows: $\left|\int f\right| = \left|\int f^+ - \int f^-\right| \le \int f^+ + \int f^- = \int |f|$.

📝 Example 8 (The Dirichlet integral converges conditionally)

The Dirichlet integral $\int_0^{\infty} \frac{\sin x}{x}\,dx = \frac{\pi}{2}$ is a famous result (proved via contour integration or Laplace transforms — we state it without proof here). But $\int_0^{\infty} \frac{|\sin x|}{x}\,dx$ diverges: on each interval $[n\pi, (n+1)\pi]$,

$$\int_{n\pi}^{(n+1)\pi} \frac{|\sin x|}{x}\,dx \ge \frac{1}{(n+1)\pi}\int_{n\pi}^{(n+1)\pi} |\sin x|\,dx = \frac{2}{(n+1)\pi},$$

and $\sum_{n=0}^{\infty} \frac{2}{(n+1)\pi}$ diverges (harmonic series). The integral converges conditionally but not absolutely — positive and negative oscillations cancel just enough to produce a finite result, but the total variation is infinite.
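Both behaviors are visible numerically: truncated integrals of $\sin x / x$ oscillate toward $\pi/2$, while truncated integrals of $|\sin x|/x$ keep climbing like $\tfrac{2}{\pi}\ln b$. A midpoint-rule sketch (the helper `partial_integral` is our own; the midpoint rule conveniently never evaluates the integrand at $x = 0$):

```python
import math

def partial_integral(f, b, n=200_000):
    """Midpoint-rule approximation of ∫_0^b f(x) dx."""
    h = b / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h

dirichlet = lambda x: math.sin(x) / x
absolute = lambda x: abs(math.sin(x)) / x

for b in [50, 200, 800]:
    # first column hovers near π/2 ≈ 1.5708; second column grows without bound
    print(b, partial_integral(dirichlet, b), partial_integral(absolute, b))
```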

💡 Remark 4 (Comparison tests for Type II)

The comparison and limit comparison tests apply to Type II improper integrals with the obvious modifications: compare behavior near the singularity instead of at infinity. For $\int_0^1 f(x)\,dx$ with a singularity at $0$, compare $f(x)$ to $1/x^p$ as $x \to 0^+$; the $p$-test ($p < 1$ for convergence) provides the benchmark.


Comparison test: direct comparison and limit comparison

Absolute vs. conditional convergence: the Dirichlet integral

The Gamma Function

The factorial $n! = 1 \cdot 2 \cdot 3 \cdots n$ is defined only for non-negative integers. Is there a smooth function that passes through the points $(1, 1), (2, 1), (3, 2), (4, 6), (5, 24), \ldots$ — that is, through $(n+1, n!)$ — and extends the factorial to all positive real numbers? Euler found one: $\Gamma(s) = \int_0^\infty t^{s-1} e^{-t}\,dt$. This integral converges for $s > 0$, and integration by parts yields $\Gamma(s+1) = s\Gamma(s)$, which gives $\Gamma(n+1) = n!$ for positive integers.

📐 Definition 4 (The Gamma Function)

For $s > 0$, define

$$\Gamma(s) = \int_0^{\infty} t^{s-1} e^{-t}\,dt.$$

This is a doubly improper integral: the factor $t^{s-1}$ may blow up at $t = 0$ (a Type II singularity when $s < 1$), and the integral extends to $t = \infty$ (Type I).

🔷 Proposition 1 (Convergence of the Gamma Integral)

$\Gamma(s)$ converges for all $s > 0$.

Near $t = 0$: $t^{s-1}e^{-t} \le t^{s-1}$ (since $e^{-t} \le 1$ for $t \ge 0$), and $\int_0^1 t^{s-1}\,dt$ converges if and only if $s - 1 > -1$, i.e., $s > 0$. This is the Type II $p$-test with $p = 1 - s < 1$.

Near $t = \infty$: For any $s > 0$, the exponential decay $e^{-t}$ dominates the polynomial growth $t^{s-1}$. Specifically, for $t$ large enough, $t^{s-1}e^{-t} \le e^{-t/2}$ (since $t^{s-1} \le e^{t/2}$ for $t$ sufficiently large), and $\int_1^{\infty} e^{-t/2}\,dt = 2e^{-1/2}$ converges.

Proof.

Split the integral at $t = 1$: $\Gamma(s) = \int_0^1 t^{s-1}e^{-t}\,dt + \int_1^{\infty} t^{s-1}e^{-t}\,dt$.

For the first piece: $0 \le t^{s-1}e^{-t} \le t^{s-1}$ on $[0, 1]$ (since $e^{-t} \le 1$). The integral $\int_0^1 t^{s-1}\,dt = \frac{t^s}{s}\big|_0^1 = \frac{1}{s}$ converges for $s > 0$.

For the second piece: since $\lim_{t \to \infty} t^{s-1}/e^{t/2} = 0$ for any $s$ (exponential growth dominates polynomial), there exists $T > 1$ such that $t^{s-1} \le e^{t/2}$ for $t \ge T$. Then $t^{s-1}e^{-t} \le e^{-t/2}$ for $t \ge T$, and $\int_T^{\infty} e^{-t/2}\,dt = 2e^{-T/2} < \infty$. The integral $\int_1^T t^{s-1}e^{-t}\,dt$ is a proper Riemann integral (finite interval, bounded integrand), so it converges trivially.

🔷 Theorem 4 (Gamma Functional Equation)

For $s > 0$: $\Gamma(s+1) = s\,\Gamma(s)$.

Proof.

Integration by parts with $u = t^s$ and $dv = e^{-t}\,dt$:

$$\Gamma(s+1) = \int_0^{\infty} t^s e^{-t}\,dt = \left[-t^s e^{-t}\right]_0^{\infty} + s\int_0^{\infty} t^{s-1} e^{-t}\,dt.$$

The boundary term vanishes at both limits:

  • At $t = 0$: $t^s e^{-t} \to 0$ (for $s > 0$).
  • At $t = \infty$: $t^s e^{-t} \to 0$ (exponential decay dominates polynomial growth).

Therefore $\Gamma(s+1) = 0 + s\,\Gamma(s) = s\,\Gamma(s)$.

🔷 Theorem 5 (Gamma and the Factorial)

For positive integers $n$: $\Gamma(n+1) = n!$. Also, $\Gamma(1) = 1$ and $\Gamma(1/2) = \sqrt{\pi}$.

Proof.

Base case: $\Gamma(1) = \int_0^{\infty} e^{-t}\,dt = 1$.

Induction: By the functional equation, $\Gamma(2) = 1 \cdot \Gamma(1) = 1 = 1!$, $\Gamma(3) = 2 \cdot \Gamma(2) = 2 = 2!$, and generally $\Gamma(n+1) = n \cdot \Gamma(n) = n \cdot (n-1)! = n!$.

Half-integer value: $\Gamma(1/2) = \int_0^{\infty} t^{-1/2} e^{-t}\,dt$. Substitute $t = u^2$, $dt = 2u\,du$:

$$\Gamma(1/2) = \int_0^{\infty} (u^2)^{-1/2} e^{-u^2} \cdot 2u\,du = 2\int_0^{\infty} e^{-u^2}\,du = \sqrt{\pi},$$

where the last equality is the Gaussian integral (Theorem 7 below).

📝 Example 9 (Half-integer values)

Using the functional equation $\Gamma(s+1) = s\Gamma(s)$ repeatedly:

$$\Gamma(3/2) = \frac{1}{2}\Gamma(1/2) = \frac{\sqrt{\pi}}{2}, \qquad \Gamma(5/2) = \frac{3}{2}\Gamma(3/2) = \frac{3\sqrt{\pi}}{4}.$$

In general, $\Gamma(n + 1/2) = \frac{(2n)!}{4^n n!}\sqrt{\pi}$. These half-integer Gamma values appear in the surface area of spheres, the volume of balls in $\mathbb{R}^n$, and the normalizing constants of the Student-$t$ and Chi-squared distributions.
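The closed form for half-integer values can be checked against the standard library’s `math.gamma` (a small sketch; the name `gamma_half_integer` is our own):

```python
import math

def gamma_half_integer(n):
    """Γ(n + 1/2) = (2n)! / (4^n n!) · √π for non-negative integers n."""
    return math.factorial(2 * n) / (4 ** n * math.factorial(n)) * math.sqrt(math.pi)

for n in range(6):
    # the closed form matches the library Gamma to machine precision
    assert math.isclose(gamma_half_integer(n), math.gamma(n + 0.5), rel_tol=1e-12)

print(gamma_half_integer(1), gamma_half_integer(2))  # √π/2 and 3√π/4
```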

💡 Remark 5 (The Bohr-Mollerup theorem)

The Gamma function is not just a factorial extension — it’s the factorial extension with the best analytic properties. The Bohr-Mollerup theorem states that $\Gamma$ is the unique function on $(0, \infty)$ satisfying: (i) $\Gamma(1) = 1$, (ii) $\Gamma(s+1) = s\,\Gamma(s)$, and (iii) $\log \Gamma$ is convex. The log-convexity condition rules out other interpolations like $f(s) = \Gamma(s)(1 + \varepsilon\sin(2\pi s))$ that also satisfy (i) and (ii) but oscillate between the integer values.


The Gamma function and its integrand

The Beta Function

📐 Definition 5 (The Beta Function)

For $a, b > 0$, define

$$B(a, b) = \int_0^1 t^{a-1}(1-t)^{b-1}\,dt.$$

This is a doubly improper integral when $a < 1$ or $b < 1$ (Type II singularities at $t = 0$ or $t = 1$). The integral converges for all $a, b > 0$ by the Type II $p$-test at each endpoint: near $t = 0$, the integrand behaves like $t^{a-1}$, and $\int_0^{1/2} t^{a-1}\,dt$ converges iff $a > 0$; near $t = 1$, it behaves like $(1-t)^{b-1}$, and $\int_{1/2}^1 (1-t)^{b-1}\,dt$ converges iff $b > 0$.

🔷 Theorem 6 (Beta-Gamma Relationship)

For all $a, b > 0$:

$$B(a, b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)}.$$

Proof.

Compute the product $\Gamma(a)\,\Gamma(b)$ as a double integral:

$$\Gamma(a)\,\Gamma(b) = \int_0^{\infty} \int_0^{\infty} s^{a-1} t^{b-1} e^{-(s+t)}\,ds\,dt.$$

Substitute $s = uv$, $t = u(1-v)$ with $u \in (0, \infty)$ and $v \in (0, 1)$. The Jacobian of this transformation is $\left|\frac{\partial(s,t)}{\partial(u,v)}\right| = u$, so:

$$\Gamma(a)\,\Gamma(b) = \int_0^{\infty} \int_0^1 (uv)^{a-1}\bigl(u(1-v)\bigr)^{b-1} e^{-u} \cdot u\,dv\,du.$$

Separating:

$$= \underbrace{\int_0^{\infty} u^{a+b-1} e^{-u}\,du}_{\Gamma(a+b)} \cdot \underbrace{\int_0^1 v^{a-1}(1-v)^{b-1}\,dv}_{B(a,b)}.$$

Therefore $\Gamma(a)\,\Gamma(b) = \Gamma(a+b) \cdot B(a,b)$, which gives $B(a,b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)}$.

(This proof uses Fubini’s theorem and a 2D change of variables — multivariable techniques that will be developed rigorously in Track 4. We preview them here because the result is too important to postpone.)

📝 Example 10 (B(1/2, 1/2) = π)

$$B(1/2, 1/2) = \frac{\Gamma(1/2)^2}{\Gamma(1)} = \frac{(\sqrt{\pi})^2}{1} = \pi.$$

Direct verification: $\int_0^1 \frac{1}{\sqrt{t(1-t)}}\,dt = \int_0^1 \frac{dt}{\sqrt{t - t^2}}$. Completing the square and substituting $t = \frac{1}{2}(1 + \sin\theta)$ gives $\pi$.

📝 Example 11 (B(a, 1) and B(1, b))

$B(a, 1) = \int_0^1 t^{a-1}\,dt = \frac{1}{a}$.

Verify via the Gamma relationship: $\frac{\Gamma(a)\,\Gamma(1)}{\Gamma(a+1)} = \frac{\Gamma(a) \cdot 1}{a\,\Gamma(a)} = \frac{1}{a}$. Similarly, $B(1, b) = 1/b$.

💡 Remark 6 (The Beta distribution)

If $X \sim \text{Beta}(\alpha, \beta)$, its density is $f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$ for $x \in (0, 1)$. The Beta function is precisely the normalizing constant that makes $\int_0^1 f(x)\,dx = 1$. The parameters $\alpha, \beta > 0$ control the shape: $\alpha = \beta = 1$ gives the uniform distribution on $[0, 1]$; $\alpha = \beta \gg 1$ concentrates the density around $x = 1/2$; $\alpha > \beta$ skews the density toward $x = 1$.

In Bayesian inference, the Beta distribution is the conjugate prior for the Binomial likelihood. If you observe $k$ successes in $n$ trials and your prior is $\text{Beta}(\alpha, \beta)$, the posterior is $\text{Beta}(\alpha + k, \beta + n - k)$. The marginal likelihood involves the ratio $B(\alpha + k, \beta + n - k) / B(\alpha, \beta)$, which reduces to a ratio of Gamma functions.
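This reduction is how Beta-Binomial marginal likelihoods are computed in practice: everything happens in log space via `math.lgamma`. A sketch (the function names are our own; as a sanity check, a uniform Beta(1, 1) prior makes every count $k$ equally likely, with marginal probability $1/(n+1)$):

```python
import math

def log_beta(a, b):
    """log B(a, b) = log Γ(a) + log Γ(b) − log Γ(a+b), stable for large arguments."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal_likelihood(k, n, alpha, beta):
    """log P(k successes in n trials) under a Beta(alpha, beta) prior:
    log C(n, k) + log [B(alpha + k, beta + n − k) / B(alpha, beta)]."""
    log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_binom + log_beta(alpha + k, beta + n - k) - log_beta(alpha, beta)

print(math.exp(log_marginal_likelihood(3, 10, 1, 1)))  # uniform prior → 1/11
```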

Beta function integrand shapes and Beta distribution densities

The Gaussian Integral

The Gaussian integral is arguably the most important single integral in all of applied mathematics. It normalizes the Gaussian distribution, appears in the definition of $\Gamma(1/2)$, gives the volume of the ball in $\mathbb{R}^n$, and shows up in quantum mechanics, statistical physics, and signal processing.

🔷 Theorem 7 (The Gaussian Integral)

$$\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}.$$

Equivalently, $\int_0^{\infty} e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}$.

Proof.

Let $I = \int_0^{\infty} e^{-x^2}\,dx$. The key trick is to compute $I^2$ as a double integral:

$$I^2 = \left(\int_0^{\infty} e^{-x^2}\,dx\right)\left(\int_0^{\infty} e^{-y^2}\,dy\right) = \int_0^{\infty}\int_0^{\infty} e^{-(x^2 + y^2)}\,dx\,dy.$$

Convert to polar coordinates: $x = r\cos\theta$, $y = r\sin\theta$, with Jacobian $r$, and $x^2 + y^2 = r^2$:

$$I^2 = \int_0^{\pi/2}\int_0^{\infty} e^{-r^2}\,r\,dr\,d\theta = \frac{\pi}{2}\int_0^{\infty} r\,e^{-r^2}\,dr.$$

The inner integral evaluates by substituting $u = r^2$, $du = 2r\,dr$:

$$\int_0^{\infty} r\,e^{-r^2}\,dr = \frac{1}{2}\int_0^{\infty} e^{-u}\,du = \frac{1}{2}.$$

Therefore $I^2 = \frac{\pi}{2} \cdot \frac{1}{2} = \frac{\pi}{4}$, so $I = \frac{\sqrt{\pi}}{2}$, and $\int_{-\infty}^{\infty} e^{-x^2}\,dx = 2I = \sqrt{\pi}$.

(This proof uses a double integral and polar coordinates — multivariable tools that will be developed rigorously in Track 4. We preview them here because the result is too important and the proof too elegant to postpone. The reader can follow the geometry: squaring the integral converts a 1D problem into a 2D problem where the radial symmetry of $e^{-(x^2+y^2)}$ makes the integral tractable.)

🔷 Proposition 2 (Gaussian Integral Variants)

(a) For $a > 0$: $\int_{-\infty}^{\infty} e^{-ax^2}\,dx = \sqrt{\pi/a}$. (Substitute $u = \sqrt{a}\,x$.)

(b) Completing the square: $\int_{-\infty}^{\infty} e^{-ax^2 + bx}\,dx = \sqrt{\pi/a}\,e^{b^2/(4a)}$.

(c) Even moments: $\int_{-\infty}^{\infty} x^{2n} e^{-x^2}\,dx = \frac{(2n)!}{4^n n!}\sqrt{\pi}$.

These follow from (a) by differentiation with respect to $a$ (for the moments) or completing the square in the exponent.
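Variants (a) and (c) are easy to spot-check with brute-force quadrature — the tails beyond $|x| = 12$ are smaller than $e^{-144}$ and can be ignored. A sketch (`gauss_moment` is our own helper):

```python
import math

def gauss_moment(n, a=1.0, L=12.0, N=400_000):
    """Midpoint-rule approximation of ∫ x^(2n) e^(-a x²) dx over [-L, L]."""
    h = 2 * L / N
    total = 0.0
    for i in range(N):
        x = -L + (i + 0.5) * h
        total += x ** (2 * n) * math.exp(-a * x * x)
    return total * h

print(gauss_moment(0, a=2.0), math.sqrt(math.pi / 2))        # variant (a), a = 2
print(gauss_moment(2), 0.75 * math.sqrt(math.pi))            # variant (c), n = 2
```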

📝 Example 12 (Normalizing the Gaussian density)

The Gaussian density is $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/(2\sigma^2)}$. We verify $\int_{-\infty}^{\infty} f(x)\,dx = 1$:

Substitute $u = (x - \mu)/(\sigma\sqrt{2})$, so $dx = \sigma\sqrt{2}\,du$:

$$\int_{-\infty}^{\infty} f(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}} \cdot \sigma\sqrt{2} \int_{-\infty}^{\infty} e^{-u^2}\,du = \frac{\sqrt{2}}{\sqrt{2\pi}} \cdot \sqrt{\pi} = 1. \quad \checkmark$$

The $\sqrt{2\pi}$ in the denominator of the Gaussian density is there precisely because $\int e^{-x^2}\,dx = \sqrt{\pi}$ — it’s the Gaussian integral doing the normalizing.

💡 Remark 7 (The error function and Gaussian CDF)

The error function $\text{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt$ is the normalized incomplete Gaussian integral. The Gaussian CDF is:

$$\Phi(x) = \frac{1}{2}\left[1 + \text{erf}\left(\frac{x}{\sqrt{2}}\right)\right].$$

The error function has no closed-form antiderivative — $e^{-t^2}$ cannot be integrated in terms of elementary functions. But it can be computed numerically to machine precision by standard libraries (scipy.special.erf, torch.special.erf). The rapid decay of the Gaussian tail ($1 - \Phi(x) \approx \frac{1}{x\sqrt{2\pi}}e^{-x^2/2}$ for large $x$, known as Mills' ratio) is what makes sub-Gaussian concentration inequalities so powerful.
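Both formulas are one-liners with the standard library's `math.erf` (a sketch; `gaussian_cdf` and `mills_tail` are our own names):

```python
import math

def gaussian_cdf(x):
    """Φ(x) = (1/2)(1 + erf(x/√2))."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mills_tail(x):
    """Mills-ratio approximation of the upper tail: 1 − Φ(x) ≈ φ(x)/x for large x."""
    return math.exp(-x * x / 2.0) / (x * math.sqrt(2.0 * math.pi))

for x in [2.0, 4.0, 6.0]:
    # the two columns agree better and better as x grows
    print(x, 1.0 - gaussian_cdf(x), mills_tail(x))
```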


The Gaussian integral and polar coordinates proof

Stirling’s Approximation

🔷 Theorem 8 (Stirling's Approximation)

$$n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^n \quad \text{as } n \to \infty.$$

More precisely: $\lim_{n \to \infty} \frac{n!}{\sqrt{2\pi n}(n/e)^n} = 1$. The relative error is $O(1/n)$:

$$n! = \sqrt{2\pi n}\left(\frac{n}{e}\right)^n \left(1 + \frac{1}{12n} + O(1/n^2)\right).$$

Proof.

We sketch the proof via the Gamma function and Laplace’s method. Write $n! = \Gamma(n+1) = \int_0^{\infty} t^n e^{-t}\,dt$. The integrand $h(t) = t^n e^{-t}$ has its maximum at $t = n$ (set $h'(t) = 0$: $nt^{n-1}e^{-t} - t^n e^{-t} = 0$ gives $t = n$).

Substitute $t = n + \sqrt{n}\,u$ to center the integrand at its maximum. Taylor-expanding $\log h(t)$ around $t = n$:

$$\log h(n + \sqrt{n}\,u) = \log h(n) + 0 \cdot (\sqrt{n}\,u) - \frac{u^2}{2} + O(u^3/\sqrt{n}).$$

So $h(n + \sqrt{n}\,u) \approx h(n) \cdot e^{-u^2/2}$ for the leading term. Integrating:

$$\Gamma(n+1) \approx h(n) \cdot \sqrt{n} \int_{-\infty}^{\infty} e^{-u^2/2}\,du = n^n e^{-n} \cdot \sqrt{n} \cdot \sqrt{2\pi} = \sqrt{2\pi n}\left(\frac{n}{e}\right)^n.$$

The Laplace approximation replaces the integrand by a Gaussian centered at its maximum — the $\sqrt{2\pi}$ factor is precisely the Gaussian integral. The correction terms (the $1/(12n)$ and higher) come from including the cubic and higher Taylor terms of $\log h$.

📝 Example 13 (Numerical verification)

| $n$ | $n!$ | Stirling | Relative error |
| --- | --- | --- | --- |
| 5 | 120 | 118.02 | 1.65% |
| 10 | 3,628,800 | 3,598,696 | 0.83% |
| 20 | $2.433 \times 10^{18}$ | $2.423 \times 10^{18}$ | 0.42% |
| 50 | $3.041 \times 10^{64}$ | $3.036 \times 10^{64}$ | 0.17% |
| 100 | $9.333 \times 10^{157}$ | $9.325 \times 10^{157}$ | 0.083% |

The approximation improves as $n$ increases — the relative error decreases like $1/(12n)$.
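The table is reproducible in a few lines; comparing against `math.factorial` shows the relative error tracking $1/(12n)$ (a sketch):

```python
import math

def stirling(n):
    """Leading-order Stirling approximation √(2πn) (n/e)^n."""
    return math.sqrt(2.0 * math.pi * n) * (n / math.e) ** n

for n in [5, 10, 20, 50]:
    rel_err = 1.0 - stirling(n) / math.factorial(n)
    # the last two columns nearly coincide
    print(n, math.factorial(n), stirling(n), rel_err, 1.0 / (12 * n))
```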

💡 Remark 8 (Stirling in log form)

In practice, the most useful form is:

$$\log(n!) \approx n\log n - n + \frac{1}{2}\log(2\pi n).$$

This is the form used throughout information theory. The key application: $\log\binom{n}{k} = \log(n!) - \log(k!) - \log((n-k)!)$. Applying Stirling to each factorial:

$$\log\binom{n}{k} \approx nH(k/n),$$

where $H(p) = -p\log p - (1-p)\log(1-p)$ is the binary entropy function. Stirling's approximation converts combinatorial counting problems into entropy calculations — this is the foundation of the "method of types" in information theory.
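The approximation is easy to check numerically with the standard library's `math.lgamma` (log-Gamma), which computes $\log\binom{n}{k}$ without overflow; this sketch works in nats, and `binary_entropy` is a name introduced here:

```python
import math

def binary_entropy(p):
    # H(p) = -p ln p - (1-p) ln(1-p), in nats
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

n, k = 1000, 300
# log C(n, k) computed entirely in log-space via lgamma
log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
print(log_binom, n * binary_entropy(k / n))  # agree up to an O(log n) term
```

The gap between the two values is the $\frac{1}{2}\log(2\pi n \cdot \frac{k}{n}(1-\frac{k}{n}))$ term that Stirling's full form supplies — negligible next to the $nH(k/n)$ leading term.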


[Figure: Stirling's approximation vs. n! on log scale]

Computational Notes

The special functions from this topic have production-quality numerical implementations in every scientific computing library.

Improper integrals: scipy.integrate.quad handles improper integrals directly — pass np.inf as the upper limit. It uses adaptive Gauss-Kronrod quadrature with automatic singularity detection. For most well-behaved integrands, it achieves machine precision ($\sim 10^{-15}$ relative error).

from scipy.integrate import quad
import numpy as np

# Type I: integral from 1 to infinity of 1/x^2
val, err = quad(lambda x: 1/x**2, 1, np.inf)  # val = 1.0

# Type II: integral from 0 to 1 of 1/sqrt(x)
val, err = quad(lambda x: 1/np.sqrt(x), 0, 1)  # val = 2.0

Special functions: scipy.special provides gamma, beta, erf, and dozens more. For numerical stability, always use scipy.special.gammaln (log-Gamma) instead of computing $\Gamma(n)$ directly — $\Gamma(n)$ overflows double precision for $n > 171$, but $\log\Gamma(n)$ is representable for much larger $n$.

from scipy.special import gamma, gammaln, beta, erf

gamma(5)      # 24.0 (= 4!)
gammaln(1000) # 5905.22... (log(999!) — doesn't overflow)
beta(2, 3)    # 0.08333... (= 1/12)
erf(1.0)      # 0.8427...

Rule of thumb: Never compute $\Gamma(n)$ or $n!$ directly for large $n$. Always work with $\log\Gamma$ and exponentiate only at the end, if needed.
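As an illustration of the rule, a sketch using the standard library's `math.lgamma` (the stdlib counterpart of `scipy.special.gammaln`): $\Gamma(300)$ itself overflows a double, but a ratio of Gammas computed in log space is perfectly tame.

```python
import math

# Gamma(300) overflows double precision (Gamma overflows past n = 171),
# but the ratio Gamma(300)/Gamma(290) = 290 * 291 * ... * 299 is modest.
log_ratio = math.lgamma(300) - math.lgamma(290)
ratio = math.exp(log_ratio)

direct = 1.0
for k in range(290, 300):
    direct *= k

print(ratio, direct)  # the two values agree
```

Subtracting in log space first means the intermediate quantities never leave the representable range.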

[Figure: Numerical convergence of truncated improper integrals]

Connections to ML

The special functions from this topic are the computational backbone of probability distributions in machine learning. We make the connections explicit.

Normalizing constants for probability distributions

Every probability density $f(x)$ satisfies $\int f = 1$. For the standard parametric families, the normalizing constant is a special function from this topic:

| Distribution | Density kernel | Normalizing constant | Special function |
|---|---|---|---|
| $\mathcal{N}(\mu, \sigma^2)$ | $e^{-(x-\mu)^2/(2\sigma^2)}$ | $\sigma\sqrt{2\pi}$ | Gaussian integral |
| $\text{Gamma}(\alpha, \beta)$ | $x^{\alpha-1}e^{-\beta x}$ | $\Gamma(\alpha)/\beta^\alpha$ | Gamma function |
| $\text{Beta}(\alpha, \beta)$ | $x^{\alpha-1}(1-x)^{\beta-1}$ | $B(\alpha, \beta)$ | Beta function |
| $\chi^2_k$ | $x^{k/2-1}e^{-x/2}$ | $2^{k/2}\Gamma(k/2)$ | Gamma function |
| $t_\nu$ (Student) | $(1 + x^2/\nu)^{-(\nu+1)/2}$ | $\sqrt{\nu\pi}\,\frac{\Gamma(\nu/2)}{\Gamma((\nu+1)/2)}$ | Gamma function |

Every time you write down a Gaussian, Gamma, or Beta distribution in a model, you’re implicitly using the improper integrals from this topic.

Forward link: Measure-Theoretic Probability on formalML develops the Lebesgue integral framework that makes these normalizing constants rigorous.

Bayesian posterior computation

Conjugate priors are chosen so that $\int \text{likelihood} \times \text{prior}\,d\theta$ reduces to a ratio of special functions:

  • Beta-Binomial: Prior $\text{Beta}(\alpha, \beta)$, posterior $\text{Beta}(\alpha + k, \beta + n - k)$. The marginal likelihood is $\frac{B(\alpha + k, \beta + n - k)}{B(\alpha, \beta)}$ — a ratio of Beta functions.
  • Gamma-Poisson: Prior $\text{Gamma}(\alpha, \beta)$, posterior $\text{Gamma}(\alpha + \sum_i x_i, \beta + n)$. Again, ratios of Gamma functions.
  • Normal-Normal: Posterior precision is the sum of prior and likelihood precisions. The normalizing constant involves $\sqrt{2\pi}$.

When conjugacy is unavailable, these integrals must be computed numerically via MCMC or variational inference.
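The Beta-Binomial case can be computed stably in log space with `scipy.special.betaln` and `gammaln`. A sketch; `log_marginal` is a helper name introduced here, and it includes the binomial coefficient so that the marginal pmf sums to 1:

```python
import numpy as np
from scipy.special import betaln, gammaln

def log_marginal(k, n, a, b):
    # log of C(n, k) * B(a + k, b + n - k) / B(a, b)
    log_binom = gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)
    return log_binom + betaln(a + k, b + n - k) - betaln(a, b)

# Sanity check: the Beta-Binomial marginal pmf sums to 1 over k = 0..n
n, a, b = 20, 2.0, 3.0
total = sum(np.exp(log_marginal(k, n, a, b)) for k in range(n + 1))
print(total)  # 1.0 up to floating-point error
```

Working with `betaln` rather than `beta` keeps this numerically safe even for large $n$, $\alpha$, $\beta$, where the Beta function values themselves would underflow.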

Forward link: Bayesian Nonparametrics on formalML develops the general Bayesian inference framework.

Stirling’s approximation in information theory

The entropy of the binomial distribution $\text{Bin}(n, p)$ is:

$$H \approx \frac{1}{2}\log\bigl(2\pi e\, n p(1-p)\bigr)$$

via Stirling applied to $\binom{n}{k}$. The "method of types" uses $\log\binom{n}{k} \approx nH(k/n)$ to count the number of binary strings with a given empirical frequency. Stirling converts combinatorial counting into entropy maximization.
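A quick numerical sanity check (a sketch; the exact entropy is obtained by direct summation over the pmf, and the approximation is the Gaussian differential entropy $\tfrac{1}{2}\ln(2\pi e\,np(1-p))$, in nats):

```python
import math

def binom_entropy(n, p):
    # Exact Shannon entropy (nats) of Bin(n, p), summed over the pmf
    H = 0.0
    for k in range(n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1 - p))
        H -= math.exp(log_pmf) * log_pmf
    return H

n, p = 200, 0.3
approx = 0.5 * math.log(2 * math.pi * math.e * n * p * (1 - p))
print(binom_entropy(n, p), approx)  # close for large n
```

The discrepancy shrinks as $n$ grows, consistent with the $O(1/n)$ error of the Stirling-based approximation.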

Forward link: Shannon Entropy on formalML develops the full information-theoretic framework.

Tail probabilities and concentration

The tail probability $P(X > t) = \int_t^{\infty} f(x)\,dx$ is an improper integral. For a standard Gaussian:

$$P(X > t) = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{t}{\sqrt{2}}\right) \le \frac{1}{t\sqrt{2\pi}}\,e^{-t^2/2}$$

(the Mills ratio bound, valid for $t > 0$). The tail decay rate determines whether the distribution is sub-Gaussian, sub-exponential, or heavy-tailed, and thereby governs the strength of available concentration inequalities.
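The bound is easy to verify with the standard library's `math.erfc` (a sketch; `gaussian_tail` and `mills_bound` are names introduced here):

```python
import math

def gaussian_tail(t):
    # Exact standard-normal tail P(X > t) via the complementary error function
    return 0.5 * math.erfc(t / math.sqrt(2))

def mills_bound(t):
    # Upper bound exp(-t^2/2) / (t * sqrt(2*pi)), valid for t > 0
    return math.exp(-t * t / 2) / (t * math.sqrt(2 * math.pi))

for t in [1.0, 2.0, 4.0]:
    print(t, gaussian_tail(t), mills_bound(t))  # bound dominates, tightens as t grows
```

The ratio of the bound to the exact tail tends to 1 as $t \to \infty$, which is why the bound is sharp enough to drive sub-Gaussian concentration arguments.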

Forward link: Concentration Inequalities on formalML develops sub-Gaussian and sub-exponential tail bounds.


Connections & Further Reading

Prerequisites (back-references)

Forward references (within formalCalculus)

  • Series Convergence & Tests — the integral test for series convergence uses improper integrals: for $f$ positive and decreasing, $\sum_{n=1}^\infty f(n)$ converges iff $\int_1^\infty f(x)\,dx$ converges.
  • Power Series & Taylor Series — power series with infinite radius of convergence, the Gamma function’s Weierstrass product.
  • Fourier Series & Orthogonal Expansions — Fourier coefficients as integrals, the Riemann-Lebesgue lemma.
  • Multiple Integrals & Fubini’s Theorem — the Gaussian integral proof in full multivariable rigor, Fubini’s theorem for improper double integrals.
  • Change of Variables & the Jacobian Determinant — polar coordinates justified via the Jacobian determinant; the Gaussian integral proved in full.
  • The Lebesgue Integral (coming soon) — resolving issues with conditional convergence; the dominated convergence theorem handles limits that the Riemann framework cannot.
  • $L^p$ Spaces (coming soon) — $L^p$ norms $\|f\|_p = \left(\int |f|^p\right)^{1/p}$ as improper integrals, the Gamma function in unit ball volumes.

References

  1. Abbott (2015), Understanding Analysis. Chapter 7.4 covers improper integrals as an extension of the Riemann integral — our primary reference for the convergence theory and comparison tests.
  2. Rudin (1976), Principles of Mathematical Analysis. Chapter 8 develops special functions, including the Gamma function, with characteristic concision — useful for the functional equation and Stirling's approximation.
  3. Spivak (2008), Calculus. Chapter 18 on improper integrals with geometric motivation, and Chapter 19 on the Gamma function — the best reference for combining rigor with intuition.
  4. Graham, Knuth & Patashnik (1994), Concrete Mathematics. Chapter 9 develops Stirling's approximation with detailed asymptotics — the most thorough treatment of the factorial's asymptotic behavior.
  5. Folland (1999), Real Analysis. Chapter 2 on the Lebesgue integral — useful for understanding where improper Riemann integration fails and why the Lebesgue framework handles these issues naturally.
  6. Bishop (2006), Pattern Recognition and Machine Learning. Appendix B collects the special-function identities (Gamma, Beta, Gaussian) used throughout Bayesian ML — the ML practitioner's reference for these functions.