Series & Approximation · intermediate · 50 min read

Power Series & Taylor Series

When do infinite polynomial expansions converge — and when can you differentiate and integrate them term by term? The bridge from finite Taylor approximations to infinite series representations, with the radius of convergence as the gatekeeper.

Abstract. A power series ∑ aₙ(x−c)ⁿ is an infinite polynomial — a series whose terms are functions of x rather than fixed numbers. The Cauchy-Hadamard theorem determines a radius of convergence R = 1/limsup|aₙ|^(1/n): the series converges absolutely for |x−c| < R and diverges for |x−c| > R. At the endpoints x = c ± R, convergence must be tested case by case using the convergence tests from Series Convergence & Tests. Inside the radius of convergence, a power series converges uniformly on every compact subset, which — via the interchange theorems from Uniform Convergence — justifies term-by-term differentiation and integration: the derivative of a power series is the series of derivatives, and the integral of a power series is the series of integrals, both with the same radius R. This makes power series infinitely differentiable inside their radius of convergence. A Taylor series ∑ f⁽ⁿ⁾(c)/n! · (x−c)ⁿ is a power series whose coefficients are determined by the derivatives of f at the center c. For analytic functions — those whose Taylor series converges to f in a neighborhood of c — the Taylor series provides a complete representation. But smooth does not imply analytic: the function e^(−1/x²) is C^∞ at the origin with all derivatives zero, yet is not identically zero, so its Taylor series converges to the wrong function. The Taylor series catalog (eˣ, sin x, cos x, ln(1+x), 1/(1−x), the binomial series (1+x)^α) forms the backbone of local approximation in both pure mathematics and applied ML. In machine learning, Taylor expansions appear in the descent lemma for gradient descent convergence, the Laplace approximation of posterior distributions, GELU activation function computation, and the matrix exponential for continuous-time dynamical models.

Where this leads → formalML

  • formalML The descent lemma f(y) ≤ f(x) + ∇f(x)ᵀ(y−x) + L/2·‖y−x‖² is a second-order Taylor expansion with L-Lipschitz gradient remainder bound. Newton's method replaces gradient descent's linear Taylor model with the full quadratic Taylor model T₂(x), achieving quadratic convergence near optima.
  • formalML A twice-differentiable function is convex iff its first-order Taylor expansion is a global lower bound: f(y) ≥ f(x) + ∇f(x)ᵀ(y−x). This characterization follows directly from Taylor's theorem with non-negative second derivative.
  • formalML The Fisher information matrix I(θ) is the Hessian of the KL divergence at θ = θ₀, computed via second-order Taylor expansion. The resulting Riemannian metric on parameter space is the foundation of natural gradient methods.
  • formalML The smooth-vs-analytic distinction developed here is foundational: analytic functions are locally representable by convergent power series in coordinate charts, while smooth functions form the more general C^∞ category used in differential geometry.

1. Overview & Motivation — From Finite to Infinite

Mean Value Theorem & Taylor Expansion gave us Taylor polynomials — finite sums $T_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x-a)^k$ that approximate $f$ near $a$. Series Convergence & Tests gave us convergence tests for infinite sums of numbers. What happens when we let $n \to \infty$ in the Taylor polynomial — when does the infinite sum converge, and does it converge to $f$?

More generally: what happens when the terms of a series depend on $x$? The geometric series $\sum_{n=0}^{\infty} x^n = \frac{1}{1-x}$ already demonstrated this in Series Convergence & Tests — it converges for $|x| < 1$ and diverges for $|x| \geq 1$. Power series generalize this pattern.

Why this matters in ML. Neural network activation functions are often approximated by truncated series. The GELU activation $\text{GELU}(x) = x \cdot \Phi(x)$ is implemented in practice as $0.5x(1 + \tanh(\sqrt{2/\pi}(x + 0.044715x^3)))$, a tanh-based approximation whose cubic term is derived from a Taylor expansion of the error function. Understanding when and where such truncations are valid requires the theory of power series convergence.

This topic sits at the intersection of three prerequisites. Series Convergence & Tests provides the convergence tests (ratio, root) that determine the radius of convergence. Uniform Convergence provides the uniform convergence theory that justifies term-by-term calculus. Mean Value Theorem & Taylor Expansion provides the Taylor polynomial machinery whose infinite extension we now analyze.

Power series overview — three behaviors: convergence inside R, divergence outside R, and the entire-function case R = ∞

2. Power Series — Definition and First Examples

📐 Definition 1 (Power Series)

A power series centered at $c$ is an expression of the form

$$\sum_{n=0}^{\infty} a_n (x - c)^n = a_0 + a_1(x-c) + a_2(x-c)^2 + \cdots$$

where $a_0, a_1, a_2, \ldots$ are real constants called the coefficients and $c$ is the center. The special case $c = 0$ gives $\sum a_n x^n$.

A power series is not a single number — it is a function of $x$. For each value of $x$, we get a numerical series $\sum a_n (x-c)^n$ that may converge or diverge. The central question of this topic is: for which values of $x$ does the series converge?

📝 Example 1 (The geometric series as a power series)

$\sum_{n=0}^{\infty} x^n$ has $a_n = 1$ for all $n$ and center $c = 0$. From Series Convergence & Tests, this converges to $\frac{1}{1-x}$ for $|x| < 1$ and diverges for $|x| \geq 1$. This is the prototype: convergence on an open interval, divergence outside it.

📝 Example 2 (The exponential series, R = ∞)

$\sum_{n=0}^{\infty} \frac{x^n}{n!}$ has $a_n = 1/n!$ and converges for all $x \in \mathbb{R}$. From Mean Value Theorem & Taylor Expansion, we know this equals $e^x$. The ratio of consecutive terms is $|x|/(n+1) \to 0$ for every fixed $x$, so the ratio test gives convergence everywhere.

📝 Example 3 (A series that converges only at its center, R = 0)

$\sum_{n=0}^{\infty} n! \, x^n$ diverges for every $x \neq 0$. The ratio $|a_{n+1} x^{n+1}/(a_n x^n)| = (n+1)|x| \to \infty$ for any fixed $x \neq 0$. The “radius of convergence” is $0$ — this series is useless as a function of $x$.

💡 Remark 1 (Power series generalize polynomials)

A polynomial of degree $N$ is a power series with $a_n = 0$ for $n > N$. It converges everywhere ($R = \infty$). Power series extend this to “infinite-degree polynomials,” but the trade-off is that convergence is no longer automatic.

3. Radius of Convergence

The three examples above illustrate a remarkable structural fact: a power series always converges on an interval centered at $c$. The half-width of this interval is the radius of convergence — and it is determined entirely by the coefficients.

🔷 Theorem 1 (Existence of the Radius of Convergence)

For any power series $\sum a_n (x - c)^n$, exactly one of the following holds:

(i) The series converges only at $x = c$ ($R = 0$).

(ii) The series converges for all $x \in \mathbb{R}$ ($R = \infty$).

(iii) There exists $R > 0$ such that the series converges absolutely for $|x - c| < R$ and diverges for $|x - c| > R$.

Proof.

Suppose $\sum a_n (x_0 - c)^n$ converges at some $x_0 \neq c$. Then $a_n(x_0 - c)^n \to 0$ (by the divergence test from Series Convergence & Tests), so $|a_n(x_0 - c)^n| \leq M$ for some bound $M$. For any $x$ with $|x - c| < |x_0 - c|$, set $r = |x - c|/|x_0 - c| < 1$. Then

$$|a_n(x-c)^n| = |a_n(x_0-c)^n| \cdot r^n \leq M r^n.$$

Since $\sum M r^n$ converges (geometric series with $r < 1$), the comparison test gives absolute convergence at $x$.

Now define $R = \sup\{|x_0 - c| : \sum a_n(x_0-c)^n \text{ converges}\}$. If the series converges only at $c$, then $R = 0$ (case i). If the supremum is infinite, then $R = \infty$ (case ii). Otherwise, $R$ is a positive real number (case iii): the argument above shows convergence for $|x - c| < R$, and divergence for $|x - c| > R$ follows because if the series converged at some $x_1$ with $|x_1 - c| > R$, it would also converge at all $x$ with $|x - c| < |x_1 - c|$, contradicting the definition of $R$ as a supremum.

📐 Definition 2 (Radius of Convergence)

The number $R$ from Theorem 1 is the radius of convergence of $\sum a_n(x-c)^n$. We allow $R = 0$ and $R = \infty$. The open interval $(c-R, c+R)$ is the open interval of convergence.

How do we compute $R$? The root and ratio tests from Series Convergence & Tests, applied to the coefficient sequence, give explicit formulas.

🔷 Theorem 2 (The Cauchy-Hadamard Formula)

$$\frac{1}{R} = \limsup_{n \to \infty} |a_n|^{1/n}$$

with the conventions $1/0 = \infty$ and $1/\infty = 0$.

Proof.

Apply the root test from Series Convergence & Tests to $\sum |a_n(x-c)^n|$:

$$\limsup_{n \to \infty} |a_n(x-c)^n|^{1/n} = |x-c| \cdot \limsup_{n \to \infty} |a_n|^{1/n}.$$

The root test gives convergence when this is $< 1$, i.e., $|x-c| < 1/\limsup |a_n|^{1/n} = R$, and divergence when it is $> 1$, i.e., $|x-c| > R$.

🔷 Theorem 3 (Ratio Test for Radius)

If $\lim_{n \to \infty} |a_{n+1}/a_n|$ exists (possibly $0$ or $\infty$), then

$$R = \lim_{n \to \infty} \left|\frac{a_n}{a_{n+1}}\right|.$$

💡 Remark 2 (Root test vs. ratio test)

The Cauchy-Hadamard formula always works (the limsup always exists). The ratio test requires the limit to exist. When both apply, they give the same $R$. From Series Convergence & Tests, the root test is strictly stronger — there exist series where the ratio test is inconclusive but the root test determines $R$.

📝 Example 4 (Computing R for standard series)

(a) $\sum x^n/n!$: the ratio formula gives $R = \lim_{n \to \infty} (n+1) = \infty$.

(b) $\sum n! \, x^n$: the ratio formula gives $R = \lim_{n \to \infty} 1/(n+1) = 0$.

(c) $\sum x^n/n$: the ratio formula gives $R = \lim_{n \to \infty} (n+1)/n = 1$.

(d) $\sum x^n/n^2$: the ratio formula gives $R = \lim_{n \to \infty} (n+1)^2/n^2 = 1$.

(e) $\sum n^n x^n$: Cauchy-Hadamard gives $1/R = \limsup n = \infty$, so $R = 0$.
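The ratio and root diagnostics are easy to probe numerically. A minimal sketch in plain Python (helper names are ours), using series (c) with coefficients $a_n = 1/n$, where both diagnostic sequences should approach $R = 1$:

```python
# Ratio and root diagnostics for sum x^n / n  (a_n = 1/n, n >= 1).
# Both |a_n / a_{n+1}| and 1/|a_n|^{1/n} should approach R = 1.
def ratio_diagnostic(a, n):
    return abs(a(n) / a(n + 1))

def root_diagnostic(a, n):
    return 1.0 / abs(a(n)) ** (1.0 / n)

a = lambda n: 1.0 / n
ratio_est = ratio_diagnostic(a, 1000)   # (n+1)/n -> 1
root_est = root_diagnostic(a, 1000)     # n^(1/n) -> 1
print(ratio_est, root_est)
```

Both estimates land within 1% of the true radius at $n = 1000$; the root diagnostic converges more slowly because $n^{1/n} \to 1$ only logarithmically.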

The explorer below lets you see the diagnostic sequences $|a_n|^{1/n}$ and $|a_{n+1}/a_n|$ converging to $1/R$, and probe what happens when you evaluate the series at points inside and outside the radius.


Radius of convergence — Cauchy-Hadamard and ratio test diagnostics converging to 1/R

4. Endpoint Behavior — Where the Tests from Topic 17 Come to Work

The radius $R$ determines convergence on the open interval $(c-R, c+R)$ and divergence outside $[c-R, c+R]$. But at the endpoints $x = c \pm R$ themselves, the power series becomes a numerical series — and you must test it directly using the convergence toolkit from Series Convergence & Tests.

📐 Definition 3 (Interval of Convergence)

The interval of convergence of $\sum a_n(x-c)^n$ is the set of all $x$ where the series converges. It always includes the open interval $(c-R, c+R)$ and may or may not include either endpoint.

📝 Example 5 (Three endpoint behaviors)

All three of the following have $R = 1$ centered at $c = 0$:

(a) $\sum x^n$: diverges at both endpoints. At $x = 1$: $\sum 1$ diverges. At $x = -1$: $\sum (-1)^n$ diverges. Interval: $(-1, 1)$.

(b) $\sum x^n/n^2$: converges at both endpoints. At $x = 1$: $\sum 1/n^2$ converges ($p$-series, $p = 2$). At $x = -1$: $\sum (-1)^n/n^2$ converges absolutely. Interval: $[-1, 1]$.

(c) $\sum x^n/n$: mixed. At $x = -1$: $\sum (-1)^n/n$ converges by the alternating series (Leibniz) test. At $x = 1$: $\sum 1/n$ diverges (harmonic). Interval: $[-1, 1)$.

💡 Remark 3 (The four endpoint possibilities)

With two endpoints and two possible verdicts (converge/diverge) at each, there are four combinations: $(c-R, c+R)$, $[c-R, c+R]$, $[c-R, c+R)$, $(c-R, c+R]$. All four occur in practice. The algorithm is always the same: (1) compute $R$ via the ratio or Cauchy-Hadamard formula; (2) substitute $x = c \pm R$ and apply a convergence test from Series Convergence & Tests.

📝 Example 6 (Endpoint analysis with comparison and alternating series tests)

For $\sum \frac{(-1)^n x^n}{\sqrt{n+1}}$, the ratio test gives $R = 1$. At $x = 1$: $\sum (-1)^n/\sqrt{n+1}$ converges by the Leibniz test (alternating, terms decrease to $0$). At $x = -1$: $\sum 1/\sqrt{n+1}$ diverges by comparison with $\sum 1/\sqrt{n}$ ($p$-series with $p = 1/2 < 1$). Interval: $(-1, 1]$.
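The endpoint contrast can also be seen numerically. A small Python sketch (our own helper, not library code) sums $\sum x^n/n$ at both endpoints: partial sums at $x = -1$ settle toward $-\ln 2$, while at $x = +1$ the harmonic series keeps growing without bound.

```python
import math

# Endpoint probe for sum x^n / n (R = 1).
def partial_sum(x, N):
    return sum(x**n / n for n in range(1, N + 1))

s_minus = [partial_sum(-1.0, N) for N in (100, 1000, 10000)]
s_plus = [partial_sum(1.0, N) for N in (100, 1000, 10000)]

# x = -1: alternating series, converges to -ln 2 ~ -0.6931
# x = +1: harmonic series, grows like ln N without bound
print(s_minus[-1], s_plus)
```

The alternating-series error bound guarantees $|S_N - (-\ln 2)| \leq 1/(N+1)$ at $x = -1$, so $N = 10^4$ already gives four correct digits; no amount of summing makes the $x = +1$ column stabilize.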

Endpoint behavior — the four possible interval types demonstrated by three standard series

5. Uniform Convergence on Compact Subsets

This is the section where the threads come together. We connect power series to the uniform convergence theory from Uniform Convergence, establishing the key property that makes everything in the next section work.

🔷 Theorem 4 (Uniform Convergence on Compact Subsets)

If $\sum a_n(x-c)^n$ has radius of convergence $R > 0$, then the series converges uniformly on every closed interval $[c-r, c+r]$ with $0 < r < R$.

Proof.

Fix $r$ with $0 < r < R$ and let $M_n = |a_n| r^n$. For $|x - c| \leq r$, we have

$$|a_n(x-c)^n| \leq |a_n| r^n = M_n.$$

Since $r < R$, the series $\sum M_n = \sum |a_n| r^n$ converges (by the definition of $R$ — the power series converges absolutely for $|x - c| < R$, and $r < R$). By the Weierstrass M-test from Uniform Convergence, the series $\sum a_n(x-c)^n$ converges uniformly on $[c-r, c+r]$.

💡 Remark 4 (Why compact subsets, not the full interval)

The power series $\sum x^n = 1/(1-x)$ converges pointwise on $(-1, 1)$ but does not converge uniformly on all of $(-1, 1)$. The partial sums $S_n(x) = (1-x^{n+1})/(1-x)$ satisfy

$$\sup_{x \in (-1,1)} |S_n(x) - 1/(1-x)| = \sup_{x \in (-1,1)} \frac{|x|^{n+1}}{|1-x|} = \infty$$

for every $n$ (the supremum blows up as $x \to 1^-$). The uniform convergence theorem only guarantees uniformity on $[-r, r]$ for $r < 1$ — compact subsets strictly inside the interval of convergence. This is the same pointwise-vs-uniform distinction from Uniform Convergence, now appearing in a concrete power-series context.
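A quick numerical sketch of this remark (plain Python, grid-based sup-norm estimate, names ours): the worst-case error of the geometric partial sums is tiny on $[-0.5, 0.5]$ but large on $[-0.99, 0.99]$, even at the same truncation order.

```python
# Sup-norm error of S_n(x) = sum_{k<=n} x^k against 1/(1-x) on [-r, r],
# estimated on a uniform grid. For r < 1 the true sup is r^{n+1}/(1-r).
def sup_error(n, r, grid=1000):
    worst = 0.0
    for i in range(grid + 1):
        x = -r + 2 * r * i / grid
        s = sum(x**k for k in range(n + 1))
        worst = max(worst, abs(s - 1.0 / (1.0 - x)))
    return worst

err_small = sup_error(30, 0.5)    # about 0.5^31 / 0.5, essentially zero
err_near1 = sup_error(30, 0.99)   # about 0.99^31 / 0.01, order 10^2
print(err_small, err_near1)
```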

Uniform convergence on compact subsets — sup-norm error decreasing on [-r,r] for r < R

6. Term-by-Term Differentiation & Integration

Because power series converge uniformly on compact subsets, the interchange theorems from Uniform Convergence apply. We can differentiate and integrate a power series term by term — and the resulting series has the same radius of convergence. This is the computational payoff of the theory.

🔷 Theorem 5 (Term-by-Term Differentiation)

If $f(x) = \sum_{n=0}^{\infty} a_n (x-c)^n$ has radius of convergence $R > 0$, then $f$ is differentiable on $(c-R, c+R)$ and

$$f'(x) = \sum_{n=1}^{\infty} n a_n (x-c)^{n-1}.$$

The differentiated series has the same radius of convergence $R$.

Proof.

Fix $x_0 \in (c-R, c+R)$ and choose $r$ with $|x_0 - c| < r < R$. On $[c-r, c+r]$, the series converges uniformly (Theorem 4). The partial sums $S_N(x) = \sum_{n=0}^{N} a_n(x-c)^n$ are polynomials, hence differentiable, with $S_N'(x) = \sum_{n=1}^{N} n a_n(x-c)^{n-1}$.

The differentiated series $\sum n a_n(x-c)^{n-1}$ has

$$\limsup_{n \to \infty} |n a_n|^{1/n} = \limsup_{n \to \infty} \bigl(n^{1/n} |a_n|^{1/n}\bigr) = 1 \cdot \limsup_{n \to \infty} |a_n|^{1/n} = \frac{1}{R}$$

since $n^{1/n} \to 1$. So the differentiated series has the same radius $R$ and converges uniformly on $[c-r, c+r]$.

By the interchange theorem for differentiation from Uniform Convergence, $f'(x_0) = \lim_{N \to \infty} S_N'(x_0) = \sum_{n=1}^{\infty} n a_n(x_0-c)^{n-1}$.

🔷 Corollary 1 (Power series are C^∞)

Applying Theorem 5 repeatedly, $f^{(k)}(x) = \sum_{n=k}^{\infty} \frac{n!}{(n-k)!} a_n (x-c)^{n-k}$ for all $k \geq 0$, each with radius $R$. A power series is infinitely differentiable inside its radius of convergence.

🔷 Theorem 6 (Term-by-Term Integration)

If $f(x) = \sum_{n=0}^{\infty} a_n (x-c)^n$ has radius $R > 0$, then

$$\int_c^x f(t)\,dt = \sum_{n=0}^{\infty} \frac{a_n}{n+1}(x-c)^{n+1}$$

for $|x - c| < R$. The integrated series has radius of convergence $R$.

📝 Example 7 (Deriving 1/(1−x)² by differentiation)

Differentiating $\frac{1}{1-x} = \sum_{n=0}^{\infty} x^n$ term by term gives

$$\frac{1}{(1-x)^2} = \sum_{n=1}^{\infty} n x^{n-1} = \sum_{n=0}^{\infty} (n+1) x^n \quad \text{for } |x| < 1.$$
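A one-line numerical check of this identity (plain Python, truncating the series at 200 terms, which is far past machine precision for $|x| = 0.3$):

```python
# Term-by-term derivative of the geometric series, checked numerically:
# sum (n+1) x^n should match 1/(1-x)^2 for |x| < 1.
x = 0.3
series = sum((n + 1) * x**n for n in range(200))
exact = 1.0 / (1.0 - x) ** 2
print(series, exact)
```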

📝 Example 8 (Deriving ln(1+x) by integration)

Integrate $\frac{1}{1+x} = \sum_{n=0}^{\infty} (-x)^n = \sum_{n=0}^{\infty} (-1)^n x^n$ term by term from $0$ to $x$:

$$\ln(1+x) = \int_0^x \frac{dt}{1+t} = \sum_{n=0}^{\infty} \frac{(-1)^n}{n+1} x^{n+1} = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} x^n \quad \text{for } |x| < 1.$$

The series also converges at $x = 1$ by the alternating series test (Leibniz) from Series Convergence & Tests, giving $\ln 2 = 1 - 1/2 + 1/3 - 1/4 + \cdots$.
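The endpoint value $\ln 2$ can be verified numerically, with the Leibniz estimate bounding the error by the first omitted term (a plain-Python sketch, helper name ours):

```python
import math

# Partial sums of 1 - 1/2 + 1/3 - ... approach ln 2; the Leibniz
# estimate bounds the error by the first omitted term 1/(N+1).
def alt_harmonic(N):
    return sum((-1) ** (n + 1) / n for n in range(1, N + 1))

approx = alt_harmonic(10000)
err = abs(approx - math.log(2))
print(approx, err)
```

Note how slow this is: $10^4$ terms buy only about four digits, in line with the $1/(N+1)$ bound.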

📝 Example 9 (The arctangent series)

Integrate $\frac{1}{1+t^2} = \sum_{n=0}^{\infty} (-1)^n t^{2n}$ to get

$$\arctan(x) = \sum_{n=0}^{\infty} \frac{(-1)^n}{2n+1} x^{2n+1} \quad \text{for } |x| \leq 1.$$

Setting $x = 1$ gives the Leibniz formula $\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots$.

The explorer below lets you see term-by-term differentiation and integration in action. Toggle between the two modes and watch how the partial sums of the derived/integrated series track the true derivative/integral.


Term-by-term calculus — differentiation of 1/(1−x) and integration of 1/(1+x)

7. Taylor Series as Power Series — The Infinite Extension

A Taylor series is a power series whose coefficients are determined by the derivatives of a function. The question is: when does this particular power series converge to the function?

🔷 Theorem 7 (Coefficient Extraction: Uniqueness)

If $f(x) = \sum_{n=0}^{\infty} a_n (x-c)^n$ on some interval $(c-R, c+R)$ with $R > 0$, then

$$a_n = \frac{f^{(n)}(c)}{n!}$$

for all $n \geq 0$. A power series representation of a function is necessarily its Taylor series.

Proof.

By Corollary 1, $f^{(k)}(x) = \sum_{n=k}^{\infty} \frac{n!}{(n-k)!} a_n (x-c)^{n-k}$. Setting $x = c$, all terms with $n > k$ vanish (they contain a factor $(c-c)^{n-k} = 0$), leaving $f^{(k)}(c) = k! \, a_k$. Solving gives $a_k = f^{(k)}(c)/k!$.

💡 Remark 5 (Uniqueness has teeth)

If two power series $\sum a_n x^n$ and $\sum b_n x^n$ are equal on any interval containing $0$, then $a_n = b_n$ for all $n$. You cannot have two different power series representations of the same function centered at the same point. This makes power series representations canonical.

📝 Example 10 (The Taylor series catalog)

The six essential Taylor series at $c = 0$ (Maclaurin series):

1. $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$, $R = \infty$

2. $\sin x = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} x^{2n+1}$, $R = \infty$

3. $\cos x = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} x^{2n}$, $R = \infty$

4. $\ln(1+x) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} x^n$, $R = 1$, also converges at $x = 1$

5. $\frac{1}{1-x} = \sum_{n=0}^{\infty} x^n$, $R = 1$

6. $(1+x)^\alpha = \sum_{n=0}^{\infty} \binom{\alpha}{n} x^n$ (binomial series), $R = 1$ for non-integer $\alpha$
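Truncations of the catalog are straightforward to check against library functions. A plain-Python sketch (our own truncation helpers) for three entries, evaluated at $x = 0.5$, well inside every radius:

```python
import math

# Truncated Maclaurin series from the catalog, evaluated at x = 0.5.
def exp_series(x, N=20):
    return sum(x**n / math.factorial(n) for n in range(N))

def sin_series(x, N=10):
    return sum((-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
               for n in range(N))

def log1p_series(x, N=200):   # only valid for |x| < 1 (R = 1)
    return sum((-1) ** (n + 1) * x**n / n for n in range(1, N + 1))

x = 0.5
errs = (abs(exp_series(x) - math.exp(x)),
        abs(sin_series(x) - math.sin(x)),
        abs(log1p_series(x) - math.log1p(x)))
print(errs)
```

The factorial-denominator series need only a handful of terms; the $R = 1$ logarithm series needs far more, and would fail entirely at $|x| > 1$.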

The flagship explorer below shows partial sums of these Taylor series converging to their target functions. Select a function, drag the $n$ slider, and watch $S_n(x)$ approach $f(x)$ inside the radius of convergence — and diverge wildly outside it.


Taylor series catalog — six essential series with partial sums overlaid on target functions

8. Analytic vs. Smooth — When Taylor Series Succeed and Fail

Every power series with $R > 0$ defines a $C^\infty$ function (Corollary 1). But does every $C^\infty$ function have a convergent Taylor series representing it? The answer is no — and understanding why is the deepest insight of this topic.

📐 Definition 4 (Analytic Function)

A function $f$ is real analytic at $c$ if its Taylor series at $c$ converges to $f(x)$ in some neighborhood of $c$:

$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(c)}{n!}(x-c)^n \quad \text{for } |x - c| < R, \; R > 0.$$

A function is analytic on an interval if it is analytic at every point of that interval. The class of analytic functions is denoted $C^\omega$.

🔷 Theorem 8 (Sufficient Condition for Analyticity)

If $f$ is $C^\infty$ on an interval $I$ containing $c$ and there exist constants $M > 0$ and $K > 0$ such that

$$|f^{(n)}(x)| \leq M \cdot K^n \cdot n! \quad \text{for all } n \text{ and all } x \in I,$$

then $f$ is analytic at $c$ with $R \geq 1/K$.

Proof.

The Lagrange remainder from Mean Value Theorem & Taylor Expansion satisfies

$$|R_n(x)| \leq \frac{|f^{(n+1)}(\xi)|}{(n+1)!} |x-c|^{n+1} \leq \frac{M K^{n+1} (n+1)!}{(n+1)!} |x-c|^{n+1} = M (K|x-c|)^{n+1}.$$

For $|x-c| < 1/K$, this is a geometric sequence converging to $0$, so $R_n(x) \to 0$ and the Taylor series converges to $f$.

📝 Example 11 (e^x, sin x, cos x are entire: analytic everywhere)

For $e^x$ on any interval $[-B, B]$: $|f^{(n)}(x)| = e^x \leq e^B$ for every $n$ — a bound with no $n!$ growth at all, far stronger than the $M K^n n!$ required by Theorem 8. The Lagrange remainder then satisfies $|R_n(x)| \leq e^B |x|^{n+1}/(n+1)! \to 0$ for every $x$, so the Taylor series converges to $e^x$ everywhere and $R = \infty$.

The same argument applies to $\sin x$ and $\cos x$, whose derivatives are all bounded by $1$.

📝 Example 12 (Smooth but not analytic — e^{−1/x²} revisited)

This extends the discussion from Mean Value Theorem & Taylor Expansion, Example 8. Define

$$f(x) = \begin{cases} e^{-1/x^2} & x \neq 0 \\ 0 & x = 0. \end{cases}$$

All derivatives at $0$ are $0$ (the proof by induction uses L’Hôpital’s rule repeatedly: each $f^{(n)}(x)$ is a polynomial in $1/x$ times $e^{-1/x^2}$, and $e^{-1/x^2}$ decays faster than any polynomial as $x \to 0$). So the Taylor series at $0$ is $T(x) = 0$. But $f(x) > 0$ for $x \neq 0$. The Taylor series converges everywhere — but to the wrong function.

The derivative growth condition $|f^{(n)}(x)| \leq M K^n n!$ of Theorem 8 fails on every neighborhood of $0$ for any finite $K$: the derivatives at points near $0$ (but not at $0$) grow faster than any geometric rate.
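The failure is easy to see numerically. Since every Taylor polynomial of this function at $0$ is the zero polynomial, the "approximation error" $|f(x) - T_n(x)|$ is just $f(x)$ itself, which does not shrink as $n$ grows (a plain-Python sketch):

```python
import math

# The flat function e^{-1/x^2}: every Taylor polynomial at c = 0 is
# identically zero, so the Taylor "approximation" misses f entirely.
def f(x):
    return 0.0 if x == 0 else math.exp(-1.0 / (x * x))

vals = [f(x) for x in (0.0, 0.1, 0.5, 1.0)]
print(vals)
```

At $x = 0.1$ the function is astronomically small ($e^{-100}$), which is why the flatness at the origin fools every finite-order derivative; at $x = 1$ it is $e^{-1} \approx 0.37$, nowhere near the zero Taylor series.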

💡 Remark 6 (Analytic functions are rare but ubiquitous)

In a precise sense (Baire category), “most” smooth functions are not analytic. Yet in practice — in calculus, physics, and ML — almost every function we encounter is analytic (or piecewise analytic). The standard function zoo ($e^x$, $\sin$, $\cos$, $\ln$, polynomials, rational functions, compositions thereof) is closed under the operations that preserve analyticity. The smooth-but-not-analytic examples are constructed to violate the derivative growth condition.


Analytic vs. smooth — Taylor series converging to f (left) vs. converging to the wrong function (right)

9. Connections to ML — Taylor Expansions in Optimization and Inference

9.1 The descent lemma and gradient descent

If $\nabla f$ is $L$-Lipschitz continuous, the second-order Taylor expansion with Lagrange remainder gives the descent lemma:

$$f(y) \leq f(x) + \nabla f(x)^T(y - x) + \frac{L}{2}\|y - x\|^2.$$

Setting $y = x - \eta \nabla f(x)$ with step size $\eta = 1/L$ yields

$$f(x_{k+1}) \leq f(x_k) - \frac{1}{2L}\|\nabla f(x_k)\|^2$$

— the fundamental inequality guaranteeing that gradient descent makes progress at every step. Newton’s method goes further: it uses the full quadratic Taylor model $T_2(x)$ to choose its step, achieving quadratic convergence near optima. (→ formalML: Gradient Descent)
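The per-step inequality can be checked on a toy objective. A plain-Python sketch (our own example function, not from any library) uses $f(x) = x^2 + \sin x$, whose gradient $2x + \cos x$ is $L$-Lipschitz with $L = 3$ since $|f''(x)| = |2 - \sin x| \leq 3$:

```python
import math

# Descent-lemma check: f(x_{k+1}) <= f(x_k) - |f'(x_k)|^2 / (2L)
# along gradient descent with step size 1/L.
def f(x):  return x * x + math.sin(x)
def df(x): return 2 * x + math.cos(x)

L = 3.0
x = 5.0
holds = True
for _ in range(50):
    g = df(x)
    x_next = x - g / L                   # step size eta = 1/L
    holds = holds and f(x_next) <= f(x) - g * g / (2 * L) + 1e-12
    x = x_next
print(holds, x, df(x))
```

After 50 steps the gradient is numerically zero: the iterate has reached the unique stationary point (the function is strongly convex, $f'' \geq 1$), and the descent inequality held at every step.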

9.2 The Laplace approximation

Given a posterior $p(\theta \mid \text{data}) \propto e^{\ell(\theta)}$ where $\ell(\theta) = \log p(\text{data} \mid \theta) + \log p(\theta)$, the second-order Taylor expansion of $\ell$ at the MAP estimate $\hat{\theta}$ gives

$$\ell(\theta) \approx \ell(\hat{\theta}) - \frac{1}{2}(\theta - \hat{\theta})^T H (\theta - \hat{\theta})$$

where $H = -\nabla^2 \ell(\hat{\theta})$. This yields the Gaussian approximation $p(\theta \mid \text{data}) \approx \mathcal{N}(\hat{\theta}, H^{-1})$ — replacing a complex posterior with a Gaussian centered at the mode. The quality of this approximation depends on how well the second-order Taylor expansion captures the log-posterior near the mode, a local approximation question of exactly the kind this topic addresses. (→ formalML: Information Geometry)
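A one-dimensional sketch (plain Python; the Gamma-shaped log-posterior and all names are our illustrative choice, not a standard API) compares the Laplace estimate of the normalizing constant against numerical quadrature:

```python
import math

# 1-D Laplace approximation for l(theta) = a*log(theta) - b*theta,
# an unnormalized Gamma log-density. MAP at a/b; H = -l'' = a/theta_hat^2.
a, b = 20.0, 4.0
theta_hat = a / b
H = a / theta_hat**2

def l(t):
    return a * math.log(t) - b * t

# Laplace estimate of Z = integral e^{l(t)} dt:
Z_laplace = math.exp(l(theta_hat)) * math.sqrt(2 * math.pi / H)

# Riemann-sum ground truth on [0.01, 20] (the mass lives near theta_hat = 5):
n = 20000
lo, hi = 0.01, 20.0
hstep = (hi - lo) / n
Z_num = sum(math.exp(l(lo + i * hstep)) for i in range(n + 1)) * hstep
print(Z_laplace, Z_num)
```

The two agree to about half a percent here; the small gap is exactly the non-Gaussian part of the log-posterior that the second-order Taylor expansion discards.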

9.3 GELU and activation function approximation

The Gaussian Error Linear Unit $\text{GELU}(x) = x \Phi(x)$, where $\Phi$ is the standard normal CDF, is one of the most widely used activation functions in modern transformers. Its practical implementation uses the approximation

$$\text{GELU}(x) \approx 0.5x\bigl(1 + \tanh(\sqrt{2/\pi}(x + 0.044715 x^3))\bigr),$$

derived from the Taylor expansion of the error function. The coefficient $0.044715$ comes from matching Taylor series terms — a direct application of power series truncation in production neural network code.
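How close is the tanh form to the exact $x\Phi(x)$? A plain-Python comparison (function names ours) over a grid of $[-5, 5]$:

```python
import math

# Exact GELU x * Phi(x) vs. the tanh-based approximation.
def gelu_exact(x):
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x**3)))

max_err = max(abs(gelu_exact(x) - gelu_tanh(x))
              for x in [i / 100.0 for i in range(-500, 501)])
print(max_err)  # well under 1e-2 across [-5, 5]
```

The maximum gap is far below typical activation magnitudes, which is why the approximation is acceptable in practice despite being a truncation.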

9.4 Matrix exponential in continuous-time models

For neural ODEs, state-space models (S4, Mamba), and continuous-time dynamical systems, the matrix exponential

$$e^{At} = I + At + \frac{(At)^2}{2!} + \frac{(At)^3}{3!} + \cdots = \sum_{n=0}^{\infty} \frac{A^n t^n}{n!}$$

is a power series in $t$ with matrix coefficients. It converges for all $t$: the norm bound $\|A^n t^n/n!\| \leq \|A\|^n |t|^n/n!$ gives a convergent comparison series, since the scalar exponential has $R = \infty$. Term-by-term differentiation gives $\frac{d}{dt} e^{At} = A e^{At}$, the fundamental solution to the linear ODE $\dot{x} = Ax$.
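The series can be summed directly for a small matrix and checked against a case with a known closed form: the rotation generator $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ has $e^{At} = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}$. A dependency-free Python sketch (helper names ours; production code would use scipy.linalg.expm):

```python
import math

# Truncated power series for e^{At}, 2x2 case.
def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm_series(A, t, N=30):
    S = [[1.0, 0.0], [0.0, 1.0]]       # running sum, starts at I
    term = [[1.0, 0.0], [0.0, 1.0]]    # current term A^n t^n / n!
    for n in range(1, N):
        term = mat_mul(term, A)
        term = [[term[i][j] * t / n for j in range(2)] for i in range(2)]
        S = [[S[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return S

A = [[0.0, -1.0], [1.0, 0.0]]
t = 1.2
E = expm_series(A, t)
err = max(abs(E[0][0] - math.cos(t)), abs(E[0][1] + math.sin(t)),
          abs(E[1][0] - math.sin(t)), abs(E[1][1] - math.cos(t)))
print(E, err)
```

Thirty terms already push the truncation error below machine precision for this $t$, in line with the factorial decay of the coefficients.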

ML connections — descent lemma, Laplace approximation, GELU, and matrix exponential

10. Computational Notes

  • Power series evaluation: Horner’s method. Evaluating $\sum_{k=0}^{n} a_k x^k$ naively requires $O(n^2)$ multiplications. Horner’s method rewrites $a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = a_0 + x(a_1 + x(a_2 + \cdots + x \cdot a_n))$ and evaluates from the inside out in $O(n)$ multiplications with minimal round-off. Use numpy.polynomial.polynomial.polyval for numerical stability.
  • Radius of convergence estimation. For tabulated coefficients, compute $|a_n|^{1/n}$ for large $n$ and look for convergence to $1/R$. The tail average of this sequence gives a reliable numerical estimate.
  • Endpoint testing is algorithmic. Compute $R$ via the ratio or Cauchy-Hadamard formula, then substitute $x = c \pm R$ into the series and apply the convergence-test flowchart from Series Convergence & Tests.
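Horner’s rewriting is a few lines of code. A dependency-free sketch of the idea (our own helper; numpy’s polyval implements the same scheme), applied to the truncated exponential series:

```python
import math

# Horner evaluation of sum a_k x^k: one multiply and one add per
# coefficient, evaluated from the inside out.
def horner(coeffs, x):
    """coeffs = [a_0, a_1, ..., a_n]."""
    acc = 0.0
    for a in reversed(coeffs):
        acc = acc * x + a
    return acc

# Truncated exponential series: a_k = 1/k!, evaluated at x = 1.
coeffs = [1.0 / math.factorial(k) for k in range(20)]
val = horner(coeffs, 1.0)
print(val, math.exp(1.0))
```

Twenty factorial-denominator terms reproduce $e$ to full double precision, since the truncation error is of order $1/20!$.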

Computational verification — Horner's method and radius estimation from finite coefficients

11. Connections & Further Reading

Prerequisites used in this topic:

  • Series Convergence & Tests — ratio test, root test, comparison test, alternating series test for endpoint analysis and the Cauchy-Hadamard formula
  • Mean Value Theorem & Taylor Expansion — Taylor polynomials, Lagrange remainder, smooth-but-not-analytic example
  • Uniform Convergence — Weierstrass M-test, interchange theorems for differentiation and integration

Also connected:

  • Riemann Integral — term-by-term integration produces antiderivatives as power series
  • Improper Integrals — entire functions ($R = \infty$) and term-by-term integration of improper integrals

Downstream within formalCalculus:

Forward links to formalML:

  • Gradient Descent — descent lemma via second-order Taylor, Newton’s method via quadratic Taylor model
  • Convex Analysis — convexity characterized by Taylor remainder sign
  • Information Geometry — Fisher information via Taylor expansion of KL divergence
  • Smooth Manifolds — smooth ($C^\infty$) vs. analytic ($C^\omega$) on manifolds

References

  1. book Rudin (1976). Principles of Mathematical Analysis, Chapter 8 — the definitive treatment of power series, uniform convergence on compact subsets, and the algebra of power series
  2. book Abbott (2015). Understanding Analysis, Chapter 6 — power series and Taylor series with an emphasis on the role of uniform convergence in justifying interchange
  3. book Spivak (2008). Calculus, Chapter 23 — Taylor series with complete proofs of term-by-term theorems and the analytic vs. smooth distinction
  4. book Folland (1999). Real Analysis: Modern Techniques and Their Applications, Section 0.6 — power series in the context of analysis prerequisites for measure theory
  5. book Bartle & Sherbert (2011). Introduction to Real Analysis, Chapter 9 — power series convergence with careful attention to endpoint behavior and applications
  6. paper Hendrycks & Gimpel (2016). “Gaussian Error Linear Units (GELUs)” — GELU(x) = x·Φ(x) is approximated by 0.5x(1 + tanh(√(2/π)(x + 0.044715x³))), a direct application of power series truncation in neural network activation design
  7. paper MacKay (1992). “A Practical Bayesian Framework for Backpropagation Networks” — the Laplace approximation replaces the posterior with a Gaussian centered at the MAP estimate using a second-order Taylor expansion of the log-posterior, a foundational Bayesian ML technique