Series & Approximation · foundational · 45 min read

Series Convergence & Tests

When do infinite sums make sense? The convergence tests that determine whether a series converges absolutely, conditionally, or not at all — and the learning rate conditions that make SGD work.

Abstract. An infinite series ∑ aₙ is defined as the limit of its partial sums Sₙ = a₁ + a₂ + ⋯ + aₙ — so series convergence is sequence convergence in disguise. The nth-term test provides a necessary condition (aₙ → 0), but the harmonic series ∑ 1/n shows it is not sufficient. The comparison test and limit comparison test relate unknown series to known benchmarks; the ratio test |aₙ₊₁/aₙ| → L and root test |aₙ|^(1/n) → L determine convergence when L < 1 and divergence when L > 1; the integral test connects ∑ f(n) to ∫ f(x)dx, bridging discrete sums and continuous integrals. For alternating series, the Leibniz test guarantees convergence when the terms decrease monotonically to zero. A series converges absolutely if ∑ |aₙ| converges, and absolute convergence implies convergence — but the converse fails: the alternating harmonic series ∑ (-1)ⁿ⁺¹/n converges conditionally but not absolutely. The Riemann rearrangement theorem reveals the fragility of conditional convergence: any conditionally convergent series can be rearranged to converge to any prescribed real number, or to diverge to ±∞. In machine learning, the Robbins-Monro conditions for SGD learning rates — ∑ αₜ = ∞ (the divergent series ensures the algorithm explores enough) and ∑ αₜ² < ∞ (the convergent series ensures the noise averages out) — are direct applications of series convergence theory. The p-series classification determines which polynomial-decay schedules αₜ = 1/tᵖ satisfy both conditions simultaneously: exactly those with p ∈ (1/2, 1]. The geometric series ∑ rⁿ = 1/(1-r) appears in discount factors for reinforcement learning, in the convergence analysis of momentum methods, and in the radius of convergence for power series that will be developed in the next topic.

Where this leads → formalML

  • formalML The Robbins-Monro conditions for SGD learning rates — ∑αₜ = ∞ and ∑αₜ² < ∞ — are series convergence conditions. The p-series classification ∑1/tᵖ determines which polynomial-decay schedules satisfy both constraints: exactly p ∈ (1/2, 1].
  • formalML The union bound converts uniform convergence over a hypothesis class into a series ∑P(bad event for h). Finite VC dimension ensures this series is controlled, yielding generalization bounds.
  • formalML Chernoff bounds produce geometric series in tail probabilities. The ratio test applied to moment-generating function expansions determines the strength of exponential concentration.
  • formalML The Borel-Cantelli lemma — if ∑P(Aₙ) < ∞ then P(limsup Aₙ) = 0 — is a direct application of series convergence to probability theory. Countable additivity of measures is defined via convergent series of set measures.

1. Overview & Motivation

You’re training a neural network with stochastic gradient descent. The learning rate schedule $\alpha_t = 1/t^p$ controls how aggressively the model updates at step $t$. The schedule must satisfy two competing demands: $\sum_{t=1}^\infty \alpha_t = \infty$ (the total step size must be infinite so the algorithm can reach any point in parameter space) and $\sum_{t=1}^\infty \alpha_t^2 < \infty$ (the total squared step size must be finite so the noise averages out). These are the Robbins-Monro conditions, introduced as forward references in Sequences, Limits & Convergence. The question is: which values of $p$ satisfy both conditions simultaneously? The answer — $p \in (1/2, 1]$ — requires knowing when the $p$-series $\sum 1/n^p$ converges and when it diverges. That is what this topic is about.

But learning rate schedules are just one instance of the fundamental question: when does an infinite sum $\sum_{n=1}^\infty a_n$ converge to a finite value? The answer occupies a central position in analysis and in machine learning. Loss series, gradient accumulation bounds, discount factors in reinforcement learning, tail probabilities in concentration inequalities, and Fourier coefficients — all are governed by the convergence theory we develop here.

The key insight: an infinite series is not a new concept. It is a sequence — the sequence of partial sums $S_n = a_1 + a_2 + \cdots + a_n$. Every tool from Sequences, Limits & Convergence applies directly. We are not learning new machinery. We are applying existing machinery to a new and important class of sequences.

2. From Sequences to Series

An infinite series is a sequence in disguise. Every question about the convergence of $\sum a_n$ is really a question about the convergence of the sequence $(S_n)$ where $S_n = \sum_{k=1}^n a_k$. This is not a metaphor — it is the definition.

📐 Definition 1 (Infinite Series)

Given a sequence $(a_n)_{n=1}^\infty$ in $\mathbb{R}$, the infinite series $\sum_{n=1}^\infty a_n$ is the sequence of partial sums $(S_n)_{n=1}^\infty$ defined by

$$S_n = \sum_{k=1}^n a_k = a_1 + a_2 + \cdots + a_n.$$

The numbers $a_n$ are the terms of the series and $S_n$ is the $n$th partial sum.

📐 Definition 2 (Convergence of a Series)

The series $\sum_{n=1}^\infty a_n$ converges if the sequence of partial sums $(S_n)$ converges — that is, if $\lim_{n \to \infty} S_n = S$ exists as a finite real number. We write $\sum_{n=1}^\infty a_n = S$ and call $S$ the sum of the series. If $(S_n)$ does not converge, the series diverges.

💡 Remark 1 (The reduction principle)

Definitions 1 and 2 convert every series problem into a sequence problem. The Monotone Convergence Theorem, the Cauchy criterion, and the Algebra of Limits — all from Sequences, Limits & Convergence — now apply to partial sums. In particular:

(a) If all $a_n \geq 0$, then $(S_n)$ is increasing, so $\sum a_n$ converges iff $(S_n)$ is bounded (Monotone Convergence).

(b) $\sum a_n$ converges iff for every $\varepsilon > 0$ there exists $N$ such that $|S_m - S_n| = \left|\sum_{k=n+1}^m a_k\right| < \varepsilon$ for all $m > n \geq N$ (Cauchy criterion for series).
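The reduction principle is easy to exercise numerically. A minimal stdlib-Python sketch: treat a series as its sequence of partial sums and check the $\varepsilon$–$N$ definition directly (the helper names `partial_sums` and `first_N_for_eps` are ours, not from any library):

```python
def partial_sums(terms, N):
    """Return [S_1, ..., S_N] for the series with term function terms(n)."""
    sums, s = [], 0.0
    for n in range(1, N + 1):
        s += terms(n)
        sums.append(s)
    return sums

def first_N_for_eps(terms, limit, eps, N_max=10_000):
    """Smallest N with |S_n - limit| < eps for all n >= N (checked up to N_max)."""
    S = partial_sums(terms, N_max)
    for N in range(N_max, 0, -1):
        if abs(S[N - 1] - limit) >= eps:
            return N + 1
    return 1

# Geometric series sum_{n>=1} (1/2)^n converges to 1.
print(partial_sums(lambda n: 0.5 ** n, 20)[-1])            # close to 1
print(first_N_for_eps(lambda n: 0.5 ** n, 1.0, 0.1))       # the N witnessing eps = 0.1
```

Here $S_n = 1 - 2^{-n}$, so the $\varepsilon$–$N$ check succeeds as soon as $2^{-n} < \varepsilon$.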

The first convergence test is an immediate consequence:

🔷 Theorem 1 (The Divergence Test (nth-Term Test))

If $\sum_{n=1}^\infty a_n$ converges, then $\lim_{n \to \infty} a_n = 0$.

Equivalently (contrapositive): if $a_n \not\to 0$, then $\sum a_n$ diverges.

Proof.

If the series converges with sum $S$, then $S_n \to S$ and $S_{n-1} \to S$. Since $a_n = S_n - S_{n-1}$, the Algebra of Limits gives $a_n \to S - S = 0$.

💡 Remark 2 (The divergence test is necessary but not sufficient)

The harmonic series $\sum 1/n$ has $a_n = 1/n \to 0$, but it diverges (we prove this in the next section). Having $a_n \to 0$ is necessary for convergence, but far from sufficient. The convergence tests in the following sections provide stronger criteria.

The interactive explorer below makes the “series = sequence of partial sums” reduction tangible. Select a series preset and watch the partial sums $(S_n)$ converge (or not) as $n$ increases. For convergent series, drag the $\varepsilon$ slider to see the $\varepsilon$–$N$ definition in action — the same definition from Sequences, Limits & Convergence, now applied to the partial-sum sequence.


3. Fundamental Series — Geometric & $p$-Series

Every convergence test works by comparing an unknown series to one of two benchmark families. These benchmarks are the reference points of series theory.

🔷 Theorem 2 (The Geometric Series)

For $r \in \mathbb{R}$:

$$\sum_{n=0}^\infty r^n = \frac{1}{1-r} \quad \text{if } |r| < 1, \qquad \text{diverges if } |r| \geq 1.$$

Proof.

The partial sum has the closed form $S_n = \sum_{k=0}^{n} r^k = \frac{1 - r^{n+1}}{1 - r}$ (verified by multiplying both sides by $1 - r$ and telescoping). If $|r| < 1$, then $r^{n+1} \to 0$ (from Sequences, Limits & Convergence: $|r|^n \to 0$ for $|r| < 1$), so $S_n \to \frac{1}{1-r}$. If $|r| \geq 1$, then $r^n \not\to 0$, so $S_n$ does not converge.

💡 Remark 3 (Why the geometric series matters)

The geometric series is the most important series in mathematics and in ML. In reinforcement learning, the discounted return $G_t = \sum_{k=0}^\infty \gamma^k R_{t+k+1}$ is a geometric series weighted by rewards — convergence requires $|\gamma| < 1$, which is why discount factors satisfy $\gamma \in [0, 1)$. In momentum-based optimization (e.g., Adam), the accumulated gradient weight $\sum_{k=0}^{t-1} \beta^k = \frac{1 - \beta^t}{1 - \beta}$ tends to the geometric series sum $\frac{1}{1 - \beta} = \sum_{k=0}^\infty \beta^k$, and Adam’s bias correction divides by $1 - \beta^t$ to compensate for the finite partial sum. In the ratio test (Theorem 6), convergence is established by showing the terms eventually decay faster than a geometric series.

📝 Example 1 (Geometric series computations)

(a) $\sum_{n=0}^\infty (1/2)^n = 1/(1 - 1/2) = 2$.

(b) $\sum_{n=0}^\infty (-1/3)^n = 1/(1 + 1/3) = 3/4$.

(c) $\sum_{n=1}^\infty 3 \cdot (0.9)^n = 3 \cdot \frac{0.9}{1 - 0.9} = 27$.
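These closed forms are easy to sanity-check numerically. A stdlib-Python sketch, truncating each series at 200 terms (`geom_partial` is our own helper name):

```python
def geom_partial(r, scale=1.0, start=0, N=200):
    """Partial sum scale * sum_{n=start}^{N-1} r^n."""
    return scale * sum(r ** n for n in range(start, N))

print(geom_partial(0.5))                       # Example 1(a): approaches 2
print(geom_partial(-1 / 3))                    # Example 1(b): approaches 3/4
print(geom_partial(0.9, scale=3.0, start=1))   # Example 1(c): approaches 27
```

At 200 terms the truncation error is $|r|^{200}/(1-|r|)$ times the scale — negligible for every $|r| < 1$ used here.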

📐 Definition 3 (The p-Series)

For $p > 0$, the $p$-series is $\sum_{n=1}^\infty \frac{1}{n^p}$.

🔷 Theorem 3 (p-Series Convergence)

$\sum_{n=1}^\infty \frac{1}{n^p}$ converges if and only if $p > 1$.

Proof.

We use the Cauchy condensation test. With $a_n = 1/n^p$, consider the “condensed” series $\sum_{k=0}^\infty 2^k a_{2^k} = \sum_{k=0}^\infty 2^k \cdot \frac{1}{2^{kp}} = \sum_{k=0}^\infty 2^{k(1-p)}$. This is a geometric series with ratio $2^{1-p}$, which converges iff $2^{1-p} < 1$, i.e., $1 - p < 0$, i.e., $p > 1$.

The Cauchy condensation theorem states: for a positive decreasing sequence $(a_n)$, $\sum a_n$ converges iff $\sum 2^k a_{2^k}$ converges. The key idea is that the block of terms $a_{2^k+1}, a_{2^k+2}, \ldots, a_{2^{k+1}}$ contains $2^k$ terms, each between $a_{2^{k+1}}$ and $a_{2^k}$ (since the sequence is decreasing). So the block sum is bounded between $2^k a_{2^{k+1}}$ and $2^k a_{2^k}$, and the condensed series captures the growth rate exactly.

📝 Example 2 (The harmonic series diverges)

Setting $p = 1$: $\sum 1/n = \infty$.

Alternative proof (Oresme’s grouping):

$$1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8}\right) + \cdots \geq 1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \cdots = \infty.$$

Each group of $2^k$ terms sums to at least $1/2$, so the partial sums grow without bound. This is perhaps the most important divergence result in analysis — it says that even though $1/n \to 0$, the terms don’t shrink fast enough to produce a finite sum.

📝 Example 3 (p-series classification)

  • $\sum 1/n^2 = \pi^2/6$ (Basel problem; converges — $p = 2 > 1$).
  • $\sum 1/n^{3/2}$ converges ($p = 3/2 > 1$).
  • $\sum 1/\sqrt{n}$ diverges ($p = 1/2 \leq 1$).
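The contrast between $p = 1$ (slow logarithmic growth) and $p = 2$ (convergence to $\pi^2/6$) shows up immediately in the partial sums. A stdlib sketch; the constant $0.5772\ldots$ below is the Euler–Mascheroni constant $\gamma$, since $H_N \approx \ln N + \gamma$:

```python
import math

def p_series_partial(p, N):
    """Partial sum sum_{n=1}^{N} 1 / n^p."""
    return sum(1.0 / n ** p for n in range(1, N + 1))

# p = 1: the harmonic partial sums track ln N + gamma -- divergent, but slowly.
for N in (10, 1_000, 100_000):
    print(N, p_series_partial(1, N), math.log(N) + 0.5772156649)

# p = 2: the partial sums approach pi^2 / 6 (Basel problem).
print(p_series_partial(2, 100_000), math.pi ** 2 / 6)
```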

Geometric and p-series convergence boundaries

4. Comparison Tests

The comparison tests let us determine convergence by bounding an unknown series against a known benchmark.

🔷 Theorem 4 (The Comparison Test (Direct))

Let $0 \leq a_n \leq b_n$ for all $n \geq N_0$. Then:

  1. If $\sum b_n$ converges, then $\sum a_n$ converges.
  2. If $\sum a_n$ diverges, then $\sum b_n$ diverges.

Proof.

(1) The partial sums $S_n^{(a)} = \sum_{k=1}^n a_k$ are increasing (since $a_k \geq 0$) and bounded above by $S_{N_0}^{(a)} + \sum_{k=N_0+1}^\infty b_k < \infty$. By the Monotone Convergence Theorem (Sequences, Limits & Convergence, Theorem 1), $S_n^{(a)}$ converges.

(2) Contrapositive of (1).

📝 Example 4 (Using the comparison test)

(a) $\sum \frac{1}{n^2 + n}$ converges: $\frac{1}{n^2+n} < \frac{1}{n^2}$ and $\sum 1/n^2$ converges.

(b) $\sum \frac{\ln n}{n}$ diverges: $\frac{\ln n}{n} \geq \frac{1}{n}$ for $n \geq 3$ and $\sum 1/n$ diverges.

🔷 Theorem 5 (The Limit Comparison Test)

Let $a_n > 0$ and $b_n > 0$ for all $n$. If $\lim_{n \to \infty} \frac{a_n}{b_n} = L$ where $0 < L < \infty$, then $\sum a_n$ and $\sum b_n$ either both converge or both diverge.

Proof.

Since $a_n/b_n \to L > 0$, for $\varepsilon = L/2$ there exists $N$ such that $n \geq N \Rightarrow L/2 < a_n/b_n < 3L/2$. Thus $(L/2) b_n < a_n < (3L/2) b_n$ for all $n \geq N$. If $\sum b_n$ converges, direct comparison gives $\sum_{n \geq N} a_n \leq (3L/2) \sum_{n \geq N} b_n < \infty$, so $\sum a_n$ converges. If $\sum b_n$ diverges, then $\sum_{n \geq N} a_n \geq (L/2) \sum_{n \geq N} b_n = \infty$.

📝 Example 5 (Limit comparison)

$\sum \frac{n}{n^3 + 1}$ converges: compare with $b_n = 1/n^2$. Since $\frac{n/(n^3+1)}{1/n^2} = \frac{n^3}{n^3+1} \to 1$ and $\sum 1/n^2$ converges, so does the original series.

💡 Remark 4 (The comparison hierarchy)

The direct comparison test requires an explicit inequality $a_n \leq b_n$ or $a_n \geq b_n$, which can be algebraically demanding. The limit comparison test requires only that $a_n \sim L\, b_n$ asymptotically, which is usually easier to verify. In practice, the limit comparison test with $p$-series benchmarks handles most polynomial-type series.

Comparison test visualizations

5. Ratio & Root Tests

The comparison tests require an external benchmark series. The ratio and root tests are self-contained: they use the series’s own terms to determine convergence by detecting whether the terms eventually decay faster than a geometric series.

🔷 Theorem 6 (The Ratio Test (d'Alembert))

Let $(a_n)$ be a sequence with $a_n \neq 0$ for all $n$. Define $L = \lim_{n \to \infty} \left|\frac{a_{n+1}}{a_n}\right|$ (if the limit exists; otherwise use $\limsup$).

  1. If $L < 1$, the series $\sum a_n$ converges absolutely.
  2. If $L > 1$ (or $L = \infty$), the series diverges.
  3. If $L = 1$, the test is inconclusive.

Proof.

(1) If $L < 1$, choose $r$ with $L < r < 1$. By definition of the limit, there exists $N$ such that $|a_{n+1}/a_n| < r$ for all $n \geq N$. Then $|a_n| < |a_N| \cdot r^{n-N}$ for $n > N$ (by induction). Since $\sum r^{n-N}$ is a convergent geometric series ($r < 1$), the comparison test gives $\sum |a_n| < \infty$.

(2) If $L > 1$, eventually $|a_{n+1}| > |a_n|$, so $|a_n|$ is eventually increasing and $a_n \not\to 0$. The series diverges by the divergence test.

📝 Example 6 (Ratio test applications)

(a) $\sum \frac{n!}{n^n}$: $\frac{a_{n+1}}{a_n} = \frac{(n+1)!/(n+1)^{n+1}}{n!/n^n} = \frac{n^n}{(n+1)^n} = \left(\frac{n}{n+1}\right)^n \to 1/e < 1$. Converges.

(b) $\sum \frac{2^n}{n!}$: $\frac{a_{n+1}}{a_n} = \frac{2}{n+1} \to 0 < 1$. Converges.

(c) $\sum \frac{1}{n}$: $\frac{a_{n+1}}{a_n} = \frac{n}{n+1} \to 1$. Inconclusive (we know it diverges by other means).
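The numerical ratio test from Section 10 applied to Example 6(a) — a stdlib sketch in which `ratio` is our own helper name:

```python
import math

def ratio(a, n):
    """|a(n+1) / a(n)| for a term function a."""
    return abs(a(n + 1) / a(n))

# a_n = n! / n^n: the ratios (n/(n+1))^n approach 1/e ~ 0.3679.
a = lambda n: math.factorial(n) / n ** n
for n in (5, 20, 80):
    print(n, ratio(a, n))
print(1 / math.e)
```

Since the ratios stabilize well below $1$, the series converges — matching the analytic limit $1/e$.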

🔷 Theorem 7 (The Root Test (Cauchy))

Let $L = \limsup_{n \to \infty} |a_n|^{1/n}$.

  1. If $L < 1$, the series $\sum a_n$ converges absolutely.
  2. If $L > 1$, the series diverges.
  3. If $L = 1$, the test is inconclusive.

Proof.

(1) If $L < 1$, choose $r$ with $L < r < 1$. By definition of $\limsup$, there exists $N$ such that $|a_n|^{1/n} < r$ for all $n \geq N$ (this is where $\limsup$ suffices rather than $\lim$ — there may be finitely many exceptions). Then $|a_n| < r^n$ for all $n \geq N$, and $\sum r^n < \infty$ (convergent geometric series).

(2) If $L > 1$, then $|a_n|^{1/n} > 1$ for infinitely many $n$, so $|a_n| > 1$ for infinitely many $n$, hence $a_n \not\to 0$.

📝 Example 7 (Root test applications)

(a) $\sum \left(\frac{n}{2n+1}\right)^n$: $|a_n|^{1/n} = \frac{n}{2n+1} \to 1/2 < 1$. Converges.

(b) $\sum \left(\frac{1}{1 + 1/n}\right)^{n^2}$: $|a_n|^{1/n} = \left(\frac{n}{n+1}\right)^n \to 1/e < 1$. Converges.
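A matching numerical root-test check for Example 7(a), stdlib only (`nth_root_term` is our own helper name):

```python
def nth_root_term(a, n):
    """|a(n)| ** (1/n)."""
    return abs(a(n)) ** (1.0 / n)

# a_n = (n / (2n+1))^n: the n-th roots are exactly n/(2n+1), approaching 1/2.
a = lambda n: (n / (2 * n + 1)) ** n
for n in (10, 100, 1000):
    print(n, nth_root_term(a, n))
```

Note that the terms themselves underflow quickly (at $n = 1000$, $a_n \approx 10^{-301}$); taking the $n$th root normalizes the growth rate, which is exactly why Section 10 calls the root test the more numerically stable diagnostic for $n$th-power series.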

💡 Remark 5 (Ratio vs. root — root is strictly stronger)

The root test is strictly stronger than the ratio test: whenever the ratio test gives a verdict, the root test gives the same verdict, but there exist series where the root test succeeds and the ratio test is inconclusive. This follows from the inequalities $\liminf |a_{n+1}/a_n| \leq \liminf |a_n|^{1/n} \leq \limsup |a_n|^{1/n} \leq \limsup |a_{n+1}/a_n|$. In practice, the ratio test is more convenient for factorials and exponentials, while the root test is better for $n$th powers.

The convergence test dashboard below applies multiple tests to the same series simultaneously, showing which tests are conclusive and which are not. This makes the relative power of the tests visible — the root test never fails when the ratio test succeeds, but sometimes succeeds when the ratio test gives up.


Ratio and root test diagnostics

6. The Integral Test

The integral test connects series convergence to improper integral convergence — linking the discrete world of $\sum f(n)$ to the continuous world of $\int f(x)\,dx$ from Improper Integrals & Special Functions.

The geometric idea is clean. If $f$ is positive, continuous, and decreasing on $[1, \infty)$, the rectangles of height $f(n)$ and width 1 sandwich the area under $f$. Specifically:

$$\int_1^{n+1} f(x)\,dx \leq \sum_{k=1}^n f(k) \leq f(1) + \int_1^n f(x)\,dx.$$

So the partial sums and the integrals are bounded by each other — they converge or diverge together.

🔷 Theorem 8 (The Integral Test)

Let $f: [1, \infty) \to \mathbb{R}$ be positive, continuous, and decreasing. Then $\sum_{n=1}^\infty f(n)$ converges if and only if $\int_1^\infty f(x)\,dx$ converges.

Proof.

Since $f$ is decreasing, for all $k \geq 1$: $f(k+1) \leq \int_k^{k+1} f(x)\,dx \leq f(k)$. Summing from $k = 1$ to $n$:

$$\sum_{k=2}^{n+1} f(k) \leq \int_1^{n+1} f(x)\,dx \leq \sum_{k=1}^n f(k).$$

If $\int_1^\infty f(x)\,dx$ converges, the left inequality gives $S_{n+1} = f(1) + \sum_{k=2}^{n+1} f(k) \leq f(1) + \int_1^\infty f(x)\,dx < \infty$, so $(S_n)$ is increasing and bounded, hence convergent by the Monotone Convergence Theorem. If $\int_1^\infty f(x)\,dx = \infty$, the right inequality gives $S_n \geq \int_1^{n+1} f(x)\,dx \to \infty$, so the partial sums diverge.

📝 Example 8 (The integral test recovers p-series)

For $f(x) = 1/x^p$ with $p \neq 1$: $\int_1^\infty x^{-p}\,dx = \frac{x^{1-p}}{1-p}\Big|_1^\infty$, which converges iff $1 - p < 0$, i.e., $p > 1$ (for $p = 1$ the antiderivative is $\ln x$, which diverges). This recovers Theorem 3 via the integral test.

📝 Example 9 (Series not amenable to other tests)

$\sum \frac{1}{n(\ln n)^2}$ for $n \geq 2$: the ratio and root tests both give $L = 1$ (inconclusive). Apply the integral test with $f(x) = 1/(x(\ln x)^2)$: substituting $u = \ln x$,

$$\int_2^\infty \frac{dx}{x(\ln x)^2} = \int_{\ln 2}^\infty u^{-2}\,du = \frac{1}{\ln 2} < \infty.$$

Converges. Among the tests in our toolkit, only the integral test handles this series.

💡 Remark 6 (Integral test remainder bounds)

The integral test also provides error bounds: if $R_n = \sum_{k=n+1}^\infty f(k)$ is the remainder after $n$ terms, then

$$\int_{n+1}^\infty f(x)\,dx \leq R_n \leq \int_n^\infty f(x)\,dx.$$

This tells you how many terms you need for a given accuracy — exactly the kind of bound used in numerical analysis and in bounding truncation error in series approximations for ML (e.g., truncating a Taylor expansion or a Fourier series).
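For $\sum 1/n^2$ the remainder bounds are explicit: $\frac{1}{n+1} \leq R_n \leq \frac{1}{n}$. A quick stdlib check, using the known exact sum $\pi^2/6$ to compute the true tail:

```python
import math

n = 100
# Exact tail R_n = pi^2/6 - S_n for the series sum 1/k^2.
S_n = sum(1.0 / k ** 2 for k in range(1, n + 1))
R_n = math.pi ** 2 / 6 - S_n
print(1 / (n + 1), R_n, 1 / n)
print(1 / (n + 1) <= R_n <= 1 / n)   # the integral-test bounds hold
```

So to guarantee an error below $10^{-6}$ you need roughly $n = 10^6$ terms — exactly the kind of truncation estimate the remark describes.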

Integral test visualization

7. Alternating Series & the Leibniz Test

In all the tests so far, we have mostly dealt with series of positive terms. Alternating series — where the signs alternate — introduce a fundamentally new phenomenon: cancellation between positive and negative terms can produce convergence even when the series of absolute values diverges.

📐 Definition 4 (Alternating Series)

A series $\sum_{n=1}^\infty (-1)^{n+1} b_n$ with $b_n > 0$ for all $n$ is an alternating series.

🔷 Theorem 9 (The Alternating Series Test (Leibniz Test))

If $(b_n)$ is decreasing ($b_{n+1} \leq b_n$) and $\lim_{n \to \infty} b_n = 0$, then the alternating series $\sum_{n=1}^\infty (-1)^{n+1} b_n$ converges.

Proof.

Consider the even partial sums:

$$S_{2n} = (b_1 - b_2) + (b_3 - b_4) + \cdots + (b_{2n-1} - b_{2n}).$$

Each parenthesized pair is $\geq 0$ (since $(b_k)$ is decreasing), so $(S_{2n})$ is increasing. Also,

$$S_{2n} = b_1 - (b_2 - b_3) - \cdots - (b_{2n-2} - b_{2n-1}) - b_{2n} \leq b_1,$$

so $(S_{2n})$ is bounded above by $b_1$.

By the Monotone Convergence Theorem, $S_{2n} \to S$ for some $S \leq b_1$. Now $S_{2n+1} = S_{2n} + b_{2n+1} \to S + 0 = S$. Since both the even and odd subsequences converge to $S$, the full sequence $S_n \to S$.

📝 Example 10 (The alternating harmonic series)

$\sum_{n=1}^\infty \frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots = \ln 2$. The terms $b_n = 1/n$ are decreasing and tend to $0$, so the Leibniz test applies. That the sum equals $\ln 2$ is provable via the Taylor series of $\ln(1+x)$ at $x = 1$, which we establish in Power Series & Taylor Series.

Note: $\sum 1/n = \infty$, so this series converges conditionally but not absolutely. We formalize this distinction in the next section.

💡 Remark 7 (Alternating series remainder)

If the alternating series converges to $S$, the error after $n$ terms satisfies $|S - S_n| \leq b_{n+1}$ — the error is bounded by the first omitted term. This is a remarkably tight bound, much better than what most convergence tests provide.
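The first-omitted-term bound is easy to verify for the alternating harmonic series, whose sum is $\ln 2$. A stdlib sketch (`alt_harmonic_partial` is our own helper name):

```python
import math

def alt_harmonic_partial(n):
    """S_n = sum_{k=1}^{n} (-1)^(k+1) / k."""
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))

S = math.log(2)  # the true sum
for n in (10, 100, 1000):
    err = abs(S - alt_harmonic_partial(n))
    print(n, err, 1 / (n + 1), err <= 1 / (n + 1))  # bound b_{n+1} always holds
```

In fact the observed error is close to $\frac{1}{2(n+1)}$ — roughly half the guaranteed bound.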

Alternating series behavior

8. Absolute vs. Conditional Convergence

Absolute convergence is the “safe” mode of convergence — it is robust under rearrangement and implies all forms of convergence. Conditional convergence is fragile: rearranging the terms can change the sum or destroy convergence entirely.

📐 Definition 5 (Absolute Convergence)

A series $\sum a_n$ converges absolutely if $\sum |a_n|$ converges.

📐 Definition 6 (Conditional Convergence)

A series $\sum a_n$ converges conditionally if it converges but does not converge absolutely — that is, $\sum a_n$ converges but $\sum |a_n| = \infty$.

🔷 Theorem 10 (Absolute Convergence Implies Convergence)

If $\sum |a_n|$ converges, then $\sum a_n$ converges, and $\left|\sum a_n\right| \leq \sum |a_n|$.

Proof.

We use the Cauchy criterion. Since $\sum |a_n|$ converges, for any $\varepsilon > 0$ there exists $N$ such that $\sum_{k=n+1}^m |a_k| < \varepsilon$ for all $m > n \geq N$. By the triangle inequality:

$$\left|\sum_{k=n+1}^m a_k\right| \leq \sum_{k=n+1}^m |a_k| < \varepsilon.$$

This is the Cauchy criterion for $\sum a_n$, so $\sum a_n$ converges.

The inequality $\left|\sum a_n\right| \leq \sum |a_n|$ follows by taking limits in $|S_n| \leq T_n$, where $T_n = \sum_{k=1}^n |a_k|$.

Note: This proof uses the completeness of $\mathbb{R}$ — the Cauchy criterion requires completeness. In an incomplete space (like $\mathbb{Q}$), absolute convergence need not imply convergence. See Completeness & Compactness for why completeness matters.

📝 Example 11 (Classification)

(a) $\sum (-1)^{n+1}/n^2$ converges absolutely ($\sum 1/n^2 < \infty$).

(b) $\sum (-1)^{n+1}/n$ converges conditionally ($\sum 1/n = \infty$, but the Leibniz test gives convergence).

(c) $\sum (-1)^n$ diverges (not even conditional convergence — the $n$th-term test kills it).

🔷 Theorem 11 (The Riemann Rearrangement Theorem)

Let $\sum a_n$ be a conditionally convergent series. Then for any $S \in \mathbb{R} \cup \{-\infty, +\infty\}$, there exists a rearrangement $\sum a_{\sigma(n)}$ (where $\sigma: \mathbb{N} \to \mathbb{N}$ is a bijection) that converges to $S$.

Proof.

Let $p_k$ be the positive terms (in their original order) and $q_k$ the negative terms. Since $\sum a_n$ is conditionally convergent, both $\sum p_k = +\infty$ and $\sum |q_k| = +\infty$ — if either were finite, the series would converge absolutely.

To reach a target $S$: add positive terms $p_1, p_2, \ldots$ until the partial sum first exceeds $S$; then add negative terms $q_1, q_2, \ldots$ until it drops below $S$; then add more positive terms until it exceeds $S$ again; continue alternating.

The key insight: because $p_k \to 0$ and $q_k \to 0$ (since $a_n \to 0$ by the divergence test applied to the original convergent series), each overshoot/undershoot gets smaller. The partial sums oscillate around $S$ with decreasing amplitude, converging to $S$ by the squeeze theorem.

💡 Remark 8 (Rearrangement and absolute convergence)

If $\sum a_n$ converges absolutely, then every rearrangement converges to the same sum. This is why absolute convergence is the “safe” mode: it is invariant under permutation. Conditional convergence is inherently order-dependent.

In computational settings, finite-precision arithmetic implicitly rearranges series (due to rounding and the order of evaluation), so absolute convergence guarantees reproducibility, whereas conditional convergence does not.

📝 Example 12 (A rearrangement of the alternating harmonic series)

The standard ordering gives $\sum (-1)^{n+1}/n = \ln 2 \approx 0.693$.

The rearrangement $1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \cdots$ (two positive terms, then one negative term) converges to $\frac{3}{2}\ln 2 \approx 1.040$.

The explorer below shows absolute vs. conditional convergence side by side, and lets you construct Riemann rearrangements that converge to different target values — demonstrating the theorem in real time.


Riemann rearrangement

9. A Test Selection Guide

With seven convergence tests in hand, we need a decision framework.

💡 Remark 9 (Test selection strategy)

  1. Always check the divergence test first. If $a_n \not\to 0$, the series diverges — done.
  2. Recognize geometric series ($a_n = c r^n$) and $p$-series ($a_n = 1/n^p$) immediately. These are the benchmarks.
  3. Factorials or $n$th powers of constants: ratio test.
  4. $n$th powers of $n$-dependent expressions: root test.
  5. Series resembling $1/n^p$ asymptotically: limit comparison with the $p$-series.
  6. $f(n)$ where $f$ has an elementary antiderivative: integral test.
  7. Alternating series: Leibniz test.
  8. If you need absolute convergence: apply tests 3–6 to $\sum |a_n|$.

Test selection flowchart

10. Computational Notes

In practice, we compute partial sums with NumPy and verify convergence tests numerically. The following patterns appear constantly in ML and scientific computing.

Computing partial sums. Given a term function a(n), the partial sums are np.cumsum([a(n) for n in range(1, N+1)]). For the geometric series with $r = 0.5$ and $N = 100$, the final partial sum is mathematically within about $10^{-30}$ of the exact sum $2$ — far below double precision, so NumPy reports exactly 2.0.

Numerical ratio test. Compute |a(n+1)/a(n)| for $n$ up to $1000$ and plot the sequence. If it stabilizes below $1$, the series converges; above $1$, it diverges; near $1$, look elsewhere.

Numerical root test. Compute |a(n)|^(1/n) and plot. This is more numerically stable for series with $n$th powers, since taking the $n$th root normalizes the growth rate.

Integral test with scipy.integrate.quad. For $f(x) = 1/(x \ln^2 x)$, quad(f, 2, np.inf) returns approximately $(1/\ln 2, \text{error estimate})$, confirming convergence.

Riemann rearrangement. Alternating between positive and negative terms of the harmonic series, targeting a value $S$: the algorithm described in Theorem 11’s proof is directly implementable, and after $n$ terms the partial sum is within roughly $1/n$ of the target $S$.
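The ratio-test pattern above can be packaged as a crude three-way diagnostic — a heuristic, not a proof, since any finite computation can only suggest the limit. A stdlib-only sketch (no NumPy required; the thresholds 0.99/1.01 and the name `converges_ratio` are our own choices):

```python
import math

def converges_ratio(a, n=150):
    """Crude numerical ratio test: estimate L = |a(n+1)/a(n)| at a large n."""
    L = abs(a(n + 1) / a(n))
    if L < 0.99:
        return "likely converges"
    if L > 1.01:
        return "diverges"
    return "inconclusive"

print(converges_ratio(lambda n: 2 ** n / math.factorial(n)))  # likely converges
print(converges_ratio(lambda n: 2 ** n / n ** 2))             # diverges
print(converges_ratio(lambda n: 1 / n))                       # inconclusive
```

The third case is the honest one: the harmonic series lands in the inconclusive band, exactly as Example 6(c) predicts — the numerical test inherits the analytic test’s blind spot.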

11. Connections to ML

Series convergence appears in ML through four distinct paths.

11.1 Learning Rate Schedules & the Robbins-Monro Conditions

The Robbins-Monro conditions for SGD: $\sum_{t=1}^\infty \alpha_t = \infty$ and $\sum_{t=1}^\infty \alpha_t^2 < \infty$. The first condition ensures the algorithm can reach any point (the total step size is unbounded). The second ensures that the accumulated noise variance, proportional to $\sum \alpha_t^2 \sigma^2$, remains finite (the iterates don’t oscillate forever).

For polynomial schedules $\alpha_t = c/t^p$: by Theorem 3, $\sum 1/t^p$ diverges iff $p \leq 1$, and $\sum 1/t^{2p}$ converges iff $2p > 1$, i.e., $p > 1/2$. Both conditions are satisfied iff $p \in (1/2, 1]$. The canonical choice $\alpha_t = c/t$ ($p = 1$) sits exactly at the boundary.

For exponential schedules $\alpha_t = \alpha_0 \gamma^t$ with $\gamma < 1$: $\sum_{t=0}^\infty \gamma^t = 1/(1-\gamma) < \infty$ (geometric series), so the first condition fails — the algorithm stops exploring too soon. This is why exponential decay schedules require warmup or restarts in practice.
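A numerical illustration of the $p \in (1/2, 1]$ window — with the caveat that partial sums can only suggest divergence, never prove it (helper name is ours):

```python
def schedule_sums(p, T=100_000):
    """Partial sums of alpha_t = 1/t^p and of alpha_t^2, up to step T."""
    s1 = sum(1.0 / t ** p for t in range(1, T + 1))
    s2 = sum(1.0 / t ** (2 * p) for t in range(1, T + 1))
    return s1, s2

for p in (0.4, 0.75, 1.0, 1.5):
    s1, s2 = schedule_sums(p)
    print(p, round(s1, 2), round(s2, 2))
# p = 0.4:  both sums keep growing (second condition fails).
# p = 0.75, 1.0: first sum keeps growing with T, second levels off -- valid.
# p = 1.5:  both level off (first condition fails: too little total step size).
```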

-> Gradient Descent -> formalML

11.2 Gradient Accumulation & Momentum

In momentum-based optimizers (SGD with momentum, Adam), the update at step $t$ is a weighted sum of all past gradients: $v_t = \sum_{k=0}^{t-1} \beta^k g_{t-k}$. The total weight is the partial sum of a geometric series: $\sum_{k=0}^{t-1} \beta^k = \frac{1 - \beta^t}{1 - \beta}$. As $t \to \infty$, this approaches $1/(1 - \beta)$. Adam’s bias correction divides by $1 - \beta^t$ precisely to account for the finite partial sum being smaller than the infinite-series limit.
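A sketch of this accounting with constant gradients $g_t = g$, so the only effect is the finite geometric sum. Note Adam tracks the $(1-\beta)$-normalized EMA rather than the raw sum $v_t$ above; the values of $\beta$ and $g$ are illustrative:

```python
beta, g = 0.9, 1.0
m = 0.0
for t in range(1, 21):
    m = beta * m + (1 - beta) * g      # exponential moving average of gradients
    m_hat = m / (1 - beta ** t)        # Adam's bias correction
    if t in (1, 5, 20):
        print(t, round(m, 4), round(m_hat, 4))
# m ramps up as the geometric partial sum (1 - beta^t) * g,
# while the corrected m_hat recovers g from the very first step.
```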

11.3 Discount Factors in Reinforcement Learning

The discounted return $G_t = \sum_{k=0}^\infty \gamma^k R_{t+k+1}$ is a geometric series in the discount factor $\gamma \in [0, 1)$. The series converges because $\gamma < 1$, and the total weight is $1/(1 - \gamma)$. If rewards are bounded by $R_{\max}$, then $|G_t| \leq R_{\max}/(1-\gamma)$.
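A sketch of the bound with random bounded rewards, truncating the infinite sum where $\gamma^k$ is negligible (the values of $\gamma$, $R_{\max}$, and the truncation point are illustrative):

```python
import random

gamma, R_max, K = 0.9, 1.0, 500   # gamma^500 ~ 1e-23: truncation error is negligible
random.seed(0)
rewards = [random.uniform(-R_max, R_max) for _ in range(K)]

G = sum(gamma ** k * r for k, r in enumerate(rewards))
print(G, R_max / (1 - gamma))
print(abs(G) <= R_max / (1 - gamma))   # True: |G| <= R_max / (1 - gamma) = 10
```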

11.4 The Borel-Cantelli Lemma in Online Learning

The first Borel-Cantelli lemma: if $\sum_{n=1}^\infty P(A_n) < \infty$, then $P(\limsup A_n) = 0$ — almost surely, only finitely many of the events $A_n$ occur. In online learning, if $A_n$ is the event “the algorithm makes an error at step $n$,” and the error probabilities $P(A_n)$ decrease fast enough that their series converges, then the total number of errors is finite almost surely. The convergence test used to verify $\sum P(A_n) < \infty$ is typically comparison with a $p$-series.

-> Measure-Theoretic Probability -> formalML

-> PAC Learning -> formalML


ML connections

12. Connections & Further Reading

Prerequisites (live links):

  • Sequences, Limits & Convergence — partial sums are sequences; every convergence test reduces to sequence convergence.
  • Uniform Convergence — the Weierstrass M-test is a series-convergence test; this topic provides the numerical tools to verify that $\sum M_k < \infty$.

Forward references (planned topics):

  • Power Series & Taylor Series — the ratio and root tests determine the radius of convergence; absolute convergence holds on the interior of the interval of convergence.
  • Fourier Series & Orthogonal Expansions — convergence of Fourier coefficients requires summability methods that generalize the convergence tests here.
  • Approximation Theory — Stone-Weierstrass and uniform convergence of approximating series.
  • Sigma-Algebras & Measures — countable additivity of measures is defined via convergent series; the Borel-Cantelli lemma is a series convergence test for probability.

References

  1. book Rudin (1976). Principles of Mathematical Analysis Chapter 3 — the definitive treatment of numerical series, convergence tests, and absolute vs. conditional convergence
  2. book Abbott (2015). Understanding Analysis Chapter 2 — an accessible treatment emphasizing the sequence-to-series reduction and the role of completeness
  3. book Folland (1999). Real Analysis: Modern Techniques and Their Applications Section 0.5 — series convergence in the context of measure theory prerequisites
  4. book Spivak (2008). Calculus Chapter 22 — series of real numbers with historically motivated exposition and complete proofs of all standard tests
  5. book Bartle & Sherbert (2011). Introduction to Real Analysis Chapter 3.7 and Chapter 9 — series convergence tests with careful comparison to improper integrals
  6. paper Robbins & Monro (1951). “A Stochastic Approximation Method” The original conditions ∑αₙ = ∞, ∑αₙ² < ∞ for stochastic approximation convergence — the foundational ML application of series convergence
  7. paper Kingma & Ba (2015). “Adam: A Method for Stochastic Optimization” Bias correction in Adam uses geometric series ∑βᵗ = 1/(1-β). Learning rate scheduling analysis requires the series tools from this topic.