ODEs · foundational · 50 min read

First-Order ODEs & Existence Theorems

Equations where the unknown is a function — direction fields, separation of variables, integrating factors, and the Picard-Lindelöf theorem that guarantees solutions exist and are unique when the right-hand side is Lipschitz.

Abstract. A first-order ordinary differential equation y' = f(t, y) asks: which function y(t) has the property that its derivative at every point equals f(t, y(t))? Direction fields visualize the answer geometrically — at every point (t, y), draw a short line segment with slope f(t, y), and the solution curves are the paths that follow these slopes. For separable equations y' = g(t)h(y), the variables can be separated and integrated independently. For linear equations y' + p(t)y = q(t), the integrating factor μ(t) = exp(∫p(t)dt) converts the left side into an exact derivative. For exact equations M dt + N dy = 0 with M_y = N_t, the solution is a level curve of a potential function — the same structure as conservative vector fields. The Picard-Lindelöf theorem guarantees that if f is continuous and Lipschitz in y, then the initial value problem y' = f(t, y), y(t₀) = y₀ has a unique local solution. The proof is constructive: the Picard iterates y_{n+1}(t) = y₀ + ∫f(s, yₙ(s))ds form a contraction mapping in the space of continuous functions, converging to the unique solution — the same fixed-point technique that proved the Inverse Function Theorem. When the Lipschitz condition fails, as in y' = y^{2/3}, existence (Peano) still holds, but uniqueness fails — solutions can branch. In machine learning, gradient descent is a discretized ODE: the continuous gradient flow θ̇ = −∇L(θ) is a first-order ODE whose solutions are guaranteed by Picard-Lindelöf when the loss landscape is smooth. Neural ODEs (Chen et al. 2018) replace discrete layers with continuous dynamics, using ODE solvers for the forward pass and the adjoint method for backpropagation.

Where this leads → formalML

  • formalML Gradient descent is the Euler discretization of gradient flow θ̇ = −∇L(θ), a first-order ODE. The Picard-Lindelöf theorem guarantees the existence and uniqueness of the gradient flow trajectory when ∇L is Lipschitz, the smoothness assumption underlying most convergence proofs. The learning rate η is the Euler step size, and a smaller η yields a better approximation of the continuous trajectory.
  • formalML A vector field X on a smooth manifold M defines a first-order ODE: the integral curves of X are the solutions. The existence and uniqueness theorem guarantees local integral curves exist, and the flow map Φ_t: M → M is a local diffeomorphism. This is the ODE-theoretic foundation of dynamical systems on manifolds.
  • formalML Stochastic differential equations dX_t = f(X_t)dt + σ(X_t)dW_t extend deterministic ODEs by adding Brownian noise. Itô's existence theorem is the stochastic analog of Picard-Lindelöf, using the same Lipschitz condition and contraction argument in a space of stochastic processes.

1. Direction Fields — The Geometry of an ODE

Until now, every equation we have solved has asked for a number: find x such that f(x) = 0, or find the value \int_a^b f(x)\,dx. An ordinary differential equation asks for something fundamentally different — a function. The equation y'(t) = f(t, y(t)) asks: which function y(t) has the property that its derivative at every t equals f(t, y(t))?

Before any algebra, we draw. A first-order ODE y' = f(t, y) assigns a slope to every point (t, y) in the plane. A direction field (also called a slope field) draws a short line segment at each point with the assigned slope. A solution y(t) is a curve that is tangent to the direction field at every point — an integral curve that threads through the field.

📐 Definition 1 (Ordinary Differential Equation (First-Order))

A first-order ordinary differential equation (ODE) is an equation of the form

y'(t) = f(t, y(t)),

where f: D \to \mathbb{R} is a given function on an open set D \subseteq \mathbb{R}^2, and y: I \to \mathbb{R} is the unknown function defined on an interval I \subseteq \mathbb{R}. The variable t is the independent variable (often representing time), y is the dependent variable (the state), and f is the right-hand side (the vector field, in 1D).

📐 Definition 2 (Initial Value Problem)

An initial value problem (IVP) is an ODE together with an initial condition:

y'(t) = f(t, y(t)), \qquad y(t_0) = y_0,

where (t_0, y_0) \in D is the specified starting point. A solution is a differentiable function y: I \to \mathbb{R} (with t_0 \in I) satisfying both the equation and the initial condition.

📐 Definition 3 (Direction Field and Integral Curve)

The direction field of y' = f(t, y) is the assignment (t, y) \mapsto (1, f(t, y)) — a line element of slope f(t, y) at each point. An integral curve is a curve \gamma(t) = (t, y(t)) that is tangent to the direction field at every point, i.e., y'(t) = f(t, y(t)).

📝 Example 1 (Exponential growth)

Consider y' = ky with k > 0. The direction field has slope ky at (t, y): horizontal along y = 0, steeper as |y| increases. Solutions are y(t) = Ce^{kt} — exponentials that grow away from the equilibrium y = 0. Each initial condition y(0) = C determines a unique integral curve.

💡 Remark 1 (Direction fields vs. gradient fields)

The direction field of y' = f(t, y) is a 1D analog of the gradient field \nabla L(\theta) from The Gradient & Directional Derivatives. Gradient flow \dot{\theta} = -\nabla L(\theta) is a first-order ODE whose direction field is -\nabla L. The integral curves are the gradient flow trajectories that gradient descent discretizes. This is not an analogy — it is literally the same mathematical structure.


Direction fields for three ODEs — exponential growth, logistic growth with equilibria, and oscillatory decay — each with solution curves threaded through the field

2. Separable Equations

The simplest class of first-order ODEs is the separable equation, where the right-hand side factors as a product of a function of t alone and a function of y alone. This factorization allows us to move all y-terms to one side and all t-terms to the other, then integrate each side independently.

📐 Definition 4 (Separable Equation)

An ODE y' = f(t, y) is separable if it can be written as y' = g(t)h(y) for continuous functions g and h. The general solution procedure is:

  1. Separate: \frac{1}{h(y)}\,dy = g(t)\,dt.
  2. Integrate both sides: H(y) = G(t) + C, where H and G are antiderivatives of 1/h and g.
  3. Solve for y if possible, obtaining the solution in explicit form y = H^{-1}(G(t) + C).
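As a quick sanity check of the three-step procedure on the hypothetical example y' = ty (not one of the worked examples below): separation gives \ln|y| = t^2/2 + C, i.e. y = y_0 e^{t^2/2}. A short numerical verification that this closed form satisfies the ODE:

```python
import math

# Hypothetical separable example: y' = t*y.
# Step 1: dy/y = t dt.  Step 2: ln|y| = t^2/2 + C.  Step 3: y = y0 * exp(t^2/2).
def y_exact(t, y0=1.0):
    return y0 * math.exp(t * t / 2.0)

def residual(t, h=1e-6):
    # |centered finite difference of y_exact minus t*y_exact(t)| should be ~0
    deriv = (y_exact(t + h) - y_exact(t - h)) / (2 * h)
    return abs(deriv - t * y_exact(t))

print(max(residual(t) for t in [0.0, 0.5, 1.0, 1.5]))  # tiny finite-difference noise
```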

📝 Example 2 (Logistic equation)

The logistic equation y' = ry(1 - y/K) is separable with g(t) = r and h(y) = y(1 - y/K). Partial fractions and integration give the logistic curve

y(t) = \frac{K}{1 + \left(\frac{K}{y_0} - 1\right)e^{-rt}},

which appears as the sigmoid activation function in ML when K = 1, r = 1. The solution interpolates between the unstable equilibrium y = 0 and the stable equilibrium y = K, approaching the carrying capacity from any positive initial condition.
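With the illustrative choice r = 1, K = 1, y_0 = 1/2 (so the formula reduces to the sigmoid), a quick numerical check that the closed form satisfies y' = ry(1 - y/K):

```python
import math

# Logistic solution with r = 1, K = 1, y0 = 0.5: y(t) = 1/(1 + e^{-t}), the sigmoid
def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def logistic_residual(t, h=1e-6):
    # The ODE says y' = y(1 - y); compare against a centered finite difference
    deriv = (sigmoid(t + h) - sigmoid(t - h)) / (2 * h)
    y = sigmoid(t)
    return abs(deriv - y * (1.0 - y))

print(max(logistic_residual(t) for t in [-3.0, 0.0, 1.0, 3.0]))  # ~0: the ODE holds
```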

📝 Example 3 (Newton's law of cooling)

The equation y' = -k(y - T_{\text{env}}) models a body at temperature y cooling toward ambient temperature T_{\text{env}}. This is separable: \frac{dy}{y - T_{\text{env}}} = -k\,dt. Integrating gives y(t) = T_{\text{env}} + (y_0 - T_{\text{env}})e^{-kt} — an exponential decay toward equilibrium.

💡 Remark 2 (Equilibrium solutions and singular points)

When h(y_0) = 0, the constant function y(t) \equiv y_0 is an equilibrium solution (the direction field is horizontal along y = y_0). Division by h(y) during separation is only valid away from these equilibria. The equilibria of the logistic equation are y = 0 (unstable) and y = K (stable) — solutions approach K from any positive initial condition.

Separable equations — logistic growth, Newton's cooling, and Gaussian solutions

3. Linear First-Order Equations

A linear first-order ODE y' + p(t)y = q(t) is not separable (unless q = 0), but it has a systematic solution method: multiply both sides by the integrating factor \mu(t) = e^{\int p(t)\,dt}, which converts the left side into an exact derivative via the product rule.

📐 Definition 5 (Linear First-Order ODE)

An ODE is linear of first order if it has the form

y' + p(t)y = q(t),

where p and q are continuous functions on an interval I. The equation is homogeneous if q(t) = 0 and inhomogeneous otherwise.

🔷 Theorem 1 (General Solution of Linear First-Order ODEs)

If p and q are continuous on an interval I containing t_0, then the IVP y' + p(t)y = q(t), y(t_0) = y_0 has the unique solution

y(t) = e^{-P(t)}\left(y_0 + \int_{t_0}^{t} q(s)\,e^{P(s)}\,ds\right),

where P(t) = \int_{t_0}^{t} p(s)\,ds. In particular, the solution exists on all of I — there is no finite-time blow-up for linear equations.

Proof.

Multiply y' + p(t)y = q(t) by \mu(t) = e^{P(t)}. The left side becomes \frac{d}{dt}[\mu(t) y(t)] by the product rule (Topic 5, Theorem 3): \mu y' + \mu p y = \mu y' + \mu' y = (\mu y)'. So

(\mu y)' = \mu q.

Integrate from t_0 to t:

\mu(t) y(t) - \mu(t_0) y(t_0) = \int_{t_0}^{t} \mu(s) q(s)\,ds.

Since \mu(t_0) = e^0 = 1, solve for y(t):

y(t) = e^{-P(t)}\left(y_0 + \int_{t_0}^{t} q(s)\,e^{P(s)}\,ds\right).

Uniqueness follows from the Picard-Lindelöf theorem (Theorem 3 below), since f(t, y) = q(t) - p(t)y is Lipschitz in y with L = \sup_I |p|. \square

📝 Example 4 (First-order linear decay with forcing)

Consider y' + 2y = e^{-t}, y(0) = 3. The integrating factor is \mu(t) = e^{2t}. Multiplying: (e^{2t} y)' = e^{2t} \cdot e^{-t} = e^{t}. Integrating: e^{2t} y = e^{t} + C, so y(t) = e^{-t} + Ce^{-2t}. From y(0) = 3: C = 2. The solution is y(t) = e^{-t} + 2e^{-2t} — the forced response e^{-t} plus a transient 2e^{-2t} that decays faster.
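The closed form can be cross-checked with any numerical integrator; the sketch below uses a hand-rolled classical RK4 stepper (step count chosen arbitrarily) on the IVP from this example:

```python
import math

# Example 4: y' = -2y + e^{-t}, y(0) = 3; closed form y(t) = e^{-t} + 2e^{-2t}
def f(t, y):
    return math.exp(-t) - 2.0 * y

def rk4(f, t0, y0, t1, n=1000):
    # Classical fourth-order Runge-Kutta with n fixed steps
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y

numeric = rk4(f, 0.0, 3.0, 2.0)
exact = math.exp(-2.0) + 2.0 * math.exp(-4.0)
print(abs(numeric - exact))  # agreement to high accuracy
```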

💡 Remark 3 (Linear equations always have global solutions)

Theorem 1 guarantees that the solution exists on the entire interval I where p and q are continuous. This is a stronger result than the general Picard-Lindelöf theorem, which only guarantees local existence. The linearity of the equation prevents finite-time blow-up — the growth rate of y is at most exponential, never faster. This global existence result breaks down for nonlinear equations (see Section 7).

Linear first-order ODE — direction field, solution family, integrating factor, and superposition

4. Exact Equations & Connection to Conservative Fields

An exact equation M(t, y)\,dt + N(t, y)\,dy = 0 is one where the left side is the total differential of some potential function \Psi(t, y): d\Psi = M\,dt + N\,dy. The solution curves are the level sets \Psi(t, y) = C — exactly the conservative field structure from Line Integrals & Conservative Fields.

📐 Definition 6 (Exact Equation)

The equation M(t, y)\,dt + N(t, y)\,dy = 0 is exact on a simply connected region D if there exists a function \Psi: D \to \mathbb{R} with \frac{\partial \Psi}{\partial t} = M and \frac{\partial \Psi}{\partial y} = N. The function \Psi is the potential function, and solutions are the level curves \Psi(t, y) = C.

🔷 Theorem 2 (Exactness Criterion)

Let M and N have continuous first partial derivatives on a simply connected open set D. Then M\,dt + N\,dy = 0 is exact if and only if

\frac{\partial M}{\partial y} = \frac{\partial N}{\partial t}

on D. This is the ODE analog of the conservative field criterion \frac{\partial F_1}{\partial y} = \frac{\partial F_2}{\partial x} from Line Integrals & Conservative Fields (Theorem 4).

📝 Example 5 (An exact equation)

Consider (2ty + 3)\,dt + (t^2 + 4y)\,dy = 0. Check: M_y = 2t = N_t. So \Psi_t = 2ty + 3 \Rightarrow \Psi = t^2 y + 3t + g(y). Then \Psi_y = t^2 + g'(y) = t^2 + 4y \Rightarrow g'(y) = 4y \Rightarrow g(y) = 2y^2. Solution: t^2 y + 3t + 2y^2 = C.

📝 Example 6 (Integrating factors for non-exact equations)

Consider 2y\,dt + t\,dy = 0. Check: M_y = 2 \neq 1 = N_t, so the equation is not exact. Try an integrating factor \mu(t): we need \frac{M_y - N_t}{N} = \frac{2 - 1}{t} = \frac{1}{t} to depend only on t. It does: \mu(t) = e^{\int dt/t} = t. Multiplying: 2ty\,dt + t^2\,dy = 0. Now M_y = 2t = N_t — exact. Potential: \Psi_t = 2ty \Rightarrow \Psi = t^2 y. Solution: t^2 y = C.
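A numerical spot-check of Example 5 (a sketch; the finite-difference step and Euler parameters are arbitrary): exactness M_y = N_t at sample points, and constancy of \Psi along a solution curve of y' = -M/N:

```python
# Example 5: (2ty + 3) dt + (t^2 + 4y) dy = 0, with potential Psi = t^2 y + 3t + 2y^2
def M(t, y): return 2 * t * y + 3
def N(t, y): return t * t + 4 * y
def Psi(t, y): return t * t * y + 3 * t + 2 * y * y

def exactness_gap(t, y, h=1e-6):
    # M_y and N_t by centered finite differences; both should equal 2t
    M_y = (M(t, y + h) - M(t, y - h)) / (2 * h)
    N_t = (N(t + h, y) - N(t - h, y)) / (2 * h)
    return abs(M_y - N_t)

def psi_drift(t=1.0, y=1.0, dt=1e-4, steps=5000):
    # Euler-integrate y' = -M/N and measure how far Psi drifts from its start value
    c0 = Psi(t, y)
    for _ in range(steps):
        y += dt * (-M(t, y) / N(t, y))
        t += dt
    return abs(Psi(t, y) - c0)

print(exactness_gap(1.0, -0.5), psi_drift())  # both small
```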

💡 Remark 4 (The exact equation ↔ conservative field dictionary)

Exact equations and conservative fields are the same mathematics in different notation. The dictionary:

  • M(t, y)\,dt + N(t, y)\,dy = 0  ↔  a field \mathbf{F} = (M, N) with \nabla \Psi = \mathbf{F}
  • Exactness M_y = N_t  ↔  irrotational: \nabla \times \mathbf{F} = 0
  • Solution \Psi(t, y) = C  ↔  flow along level curves of the potential

The simply connected domain requirement (Topic 15, Definition 5) ensures there are no topological obstructions.

Exact equations — level curves of the potential function and comparison with non-exact fields

5. The Picard-Lindelöf Theorem — Existence and Uniqueness

We now turn to the deepest question in ODE theory: when does an IVP have a solution, and is it unique? The Picard-Lindelöf theorem answers: if f is continuous and Lipschitz in y, then the IVP y' = f(t, y), y(t_0) = y_0 has a unique local solution. The proof uses the contraction mapping principle from Inverse & Implicit Function Theorems (§8) — the same technique that proved the Inverse Function Theorem, now applied in a function space.

📐 Definition 7 (Lipschitz Condition)

A function f: D \to \mathbb{R} (with D \subseteq \mathbb{R}^2) satisfies a Lipschitz condition in y on D if there exists a constant L \geq 0 such that

|f(t, y_1) - f(t, y_2)| \leq L|y_1 - y_2|

for all (t, y_1), (t, y_2) \in D. The constant L is the Lipschitz constant. If f is C^1 in y and |\partial f / \partial y| \leq L on D, then f is Lipschitz with constant L (by the Mean Value Theorem).

🔷 Theorem 3 (The Picard-Lindelöf Theorem)

Let f: D \to \mathbb{R} be continuous on an open set D \subseteq \mathbb{R}^2 and satisfy a Lipschitz condition in y on D with Lipschitz constant L. Let (t_0, y_0) \in D. Then there exists \delta > 0 such that the IVP

y' = f(t, y), \qquad y(t_0) = y_0

has a unique solution y: [t_0 - \delta, t_0 + \delta] \to \mathbb{R}.

💡 Remark 5 (Integral formulation)

The IVP y' = f(t, y), y(t_0) = y_0 is equivalent to the integral equation

y(t) = y_0 + \int_{t_0}^{t} f(s, y(s))\,ds.

This reformulation is crucial: it converts a differential equation (involving derivatives) into an integral equation (involving only continuous functions and Riemann integration). The Picard iteration is defined in integral form, avoiding the need to differentiate the iterates.

💡 Remark 6 (The Picard-Lindelöf theorem and the IFT — same proof, different spaces)

The structure of the proof is identical to the IFT proof (Topic 12, §8). Define an operator T, show it maps a closed set into itself, show it is a contraction, and invoke completeness. The only difference is the space: the IFT works in (\mathbb{R}^n, \|\cdot\|), while Picard-Lindelöf works in (C([t_0 - \delta, t_0 + \delta]), \|\cdot\|_\infty). The reader who followed the IFT proof already knows the playbook.

The Picard-Lindelöf setup — rectangle R with domain restriction and the contraction mapping argument

6. Picard Iteration — The Constructive Proof

The Picard-Lindelöf theorem is not merely an existence result — the proof is an algorithm. The Picard iterates

y_0(t) \equiv y_0, \qquad y_{n+1}(t) = y_0 + \int_{t_0}^{t} f(s, y_n(s))\,ds

converge uniformly to the unique solution. Each iterate refines the previous approximation, and the Lipschitz condition ensures the refinements shrink geometrically — exactly the contraction mapping mechanism.

📐 Definition 8 (Picard Iteration)

The Picard iterates for the IVP y' = f(t, y), y(t_0) = y_0 are defined recursively:

y_0(t) = y_0 \quad\text{(the constant initial guess)},

y_{n+1}(t) = y_0 + \int_{t_0}^{t} f(s, y_n(s))\,ds \quad\text{for } n \geq 0.

Proof (of Theorem 3).

We give the full proof via the contraction mapping principle.

Setup. Choose a closed rectangle R = \{(t, y) : |t - t_0| \leq a,\ |y - y_0| \leq b\} \subseteq D. Let M = \max_R |f| (which exists since f is continuous on the compact set R). Set \delta = \min(a, b/M). Let X = C([t_0 - \delta, t_0 + \delta]) with the sup-norm \|y\|_\infty = \max_{|t - t_0| \leq \delta} |y(t)|, and let S = \{y \in X : \|y - y_0\|_\infty \leq b\}. The set S is closed in the complete metric space (X, \|\cdot\|_\infty).

Step 1 — T maps S into S. Define the Picard operator T: S \to X by

(Ty)(t) = y_0 + \int_{t_0}^{t} f(s, y(s))\,ds.

For y \in S:

|(Ty)(t) - y_0| = \left|\int_{t_0}^{t} f(s, y(s))\,ds\right| \leq M|t - t_0| \leq M\delta \leq b.

So Ty \in S.

Step 2 — T is a contraction (for small \delta). For y, z \in S:

|(Ty)(t) - (Tz)(t)| = \left|\int_{t_0}^{t} [f(s, y(s)) - f(s, z(s))]\,ds\right| \leq \int_{t_0}^{t} L|y(s) - z(s)|\,ds \leq L\delta\,\|y - z\|_\infty.

If we further require \delta \leq 1/(2L), then \|Ty - Tz\|_\infty \leq \frac{1}{2}\|y - z\|_\infty. So T is a contraction with factor \lambda = L\delta \leq 1/2.

Step 3 — Fixed point. The space (S, \|\cdot\|_\infty) is a closed subset of a complete metric space, hence complete. By the contraction mapping theorem (Topic 12, Theorem 6), T has a unique fixed point y^* \in S:

y^*(t) = y_0 + \int_{t_0}^{t} f(s, y^*(s))\,ds.

This y^* is the unique solution to the IVP on [t_0 - \delta, t_0 + \delta].

Step 4 — Convergence of Picard iterates. The contraction mapping theorem also guarantees that y_n \to y^* uniformly:

\|y_n - y^*\|_\infty \leq \frac{\lambda^n}{1 - \lambda}\|y_1 - y_0\|_\infty \to 0.

The convergence is geometric with rate \lambda = L\delta. \square

📝 Example 7 (Picard iteration for y' = y, y(0) = 1)

Starting from y_0(t) = 1:

y_1(t) = 1 + \int_0^t 1\,ds = 1 + t,

y_2(t) = 1 + \int_0^t (1 + s)\,ds = 1 + t + \frac{t^2}{2},

y_3(t) = 1 + t + \frac{t^2}{2} + \frac{t^3}{6}.

The pattern is y_n(t) = \sum_{k=0}^{n} t^k/k! — the partial sums of e^t. The Picard iterates converge to the Taylor series of the exact solution y = e^t.
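The same iteration can be carried out numerically, replacing the symbolic integral with a trapezoid rule on a grid (a sketch; the grid size and iteration count are arbitrary):

```python
import math

# Picard iteration for y' = y, y(0) = 1 on [0, 1], discretized on a uniform grid
N = 1000
ts = [i / N for i in range(N + 1)]

def picard_step(y):
    # y_{n+1}(t) = 1 + integral_0^t y_n(s) ds, via the cumulative trapezoid rule
    out = [1.0]
    acc = 0.0
    for i in range(1, len(ts)):
        acc += (y[i - 1] + y[i]) / (2.0 * N)
        out.append(1.0 + acc)
    return out

y = [1.0] * (N + 1)      # y_0 = 1 (constant initial guess)
for _ in range(20):      # 20 Picard iterations
    y = picard_step(y)

print(abs(y[-1] - math.e))  # value at t = 1 approaches e
```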

📝 Example 8 (Picard iteration for y' = t + y², y(0) = 0)

Starting from y_0(t) = 0:

y_1(t) = \int_0^t s\,ds = \frac{t^2}{2},

y_2(t) = \int_0^t \left(s + \frac{s^4}{4}\right)ds = \frac{t^2}{2} + \frac{t^5}{20}.

The iterates grow in complexity — no closed-form solution exists for this Riccati equation, but the Picard iterates provide a convergent sequence of polynomial approximations.


Picard iterates converging to the exact solution — y' = y, y' = t + y², and error decay

7. Maximal Intervals & Blow-Up — When Solutions End

The Picard-Lindelöf theorem guarantees local existence — a solution on some interval [t_0 - \delta, t_0 + \delta]. Can the solution always be extended? The maximal interval of existence is the largest interval on which the solution exists. If this interval is bounded, the solution must blow up (escape to infinity) or leave the domain D at its endpoint.

🔷 Proposition 1 (Extension Theorem)

If y is a solution on (a, b) and \lim_{t \to b^-} (t, y(t)) exists and lies in D, then y can be extended past b. Equivalently, if y cannot be extended past b, then either |y(t)| \to \infty as t \to b^- or (t, y(t)) approaches the boundary of D.

📝 Example 9 (Finite-time blow-up)

Consider y' = y^2, y(0) = 1. This is separable: -1/y = t + C, so y(t) = 1/(1 - t). The solution blows up at t = 1: y(t) \to +\infty as t \to 1^-. The maximal interval of existence is (-\infty, 1), not (-\infty, +\infty).

The nonlinearity y^2 amplifies growth faster than exponential, causing the solution to reach infinity in finite time. Compare with the linear equation y' = y, whose solution y = e^t grows but never blows up.
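Numerically, an integrator can track the solution accurately up to just before the blow-up time; the sketch below (step size chosen by hand) follows y' = y^2 to t = 0.9, where the exact solution 1/(1 - t) has already reached 10:

```python
# y' = y^2, y(0) = 1: exact solution 1/(1 - t) blows up at t = 1.
# RK4 with a small fixed step, integrated to t = 0.9, safely before blow-up.
def f(t, y):
    return y * y

t, y, h = 0.0, 1.0, 1e-4
for _ in range(9000):          # 9000 steps of size 1e-4 reach t = 0.9
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    t += h

print(y)  # close to the exact value 1/(1 - 0.9) = 10
```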

💡 Remark 7 (Linear equations never blow up)

Theorem 1 guarantees that solutions to linear equations y' + p(t)y = q(t) exist on the entire interval where p and q are continuous. This is because the linear growth bound |f(t, y)| \leq C(1 + |y|) prevents finite-time blow-up, by Grönwall's inequality. Nonlinear right-hand sides (y^2, y^3, \tan y) can violate this bound and reach infinity in finite time.

Blow-up comparison — y' = y² solution diverging at t = 1 vs. y' = y growing exponentially

8. Uniqueness Failure — The Peano Theorem and Non-Lipschitz ODEs

What happens without the Lipschitz condition? The Peano existence theorem shows that continuity of f alone is enough for existence — no Lipschitz condition needed. But uniqueness can fail: multiple solutions may pass through the same initial point, producing branching trajectories.

🔷 Theorem 4 (Peano Existence Theorem)

If f is continuous on an open set D \subseteq \mathbb{R}^2 containing (t_0, y_0), then the IVP y' = f(t, y), y(t_0) = y_0 has at least one local solution. (No Lipschitz condition required.) The proof uses the Arzelà-Ascoli theorem (compactness in function spaces) rather than the contraction mapping principle.

📝 Example 10 (Uniqueness failure — y' = y^{2/3})

Consider y' = y^{2/3}, y(0) = 0. The function f(y) = y^{2/3} is continuous but not Lipschitz at y = 0 (the derivative f'(y) = \frac{2}{3}y^{-1/3} \to \infty as y \to 0^+).

Two solutions through (0, 0):

  1. y(t) \equiv 0 (the trivial solution).
  2. y(t) = (t/3)^3 = t^3/27 (verified: y' = t^2/9 = (t^3/27)^{2/3}).

In fact, for any c \geq 0, the function

y_c(t) = \begin{cases} 0, & t \leq c \\ ((t-c)/3)^3, & t > c \end{cases}

is also a solution — there are infinitely many solutions through the origin.
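The branching can be checked directly; the snippet below verifies that both the trivial solution and the cubic branch satisfy y' = y^{2/3}:

```python
# Uniqueness failure for y' = y^{2/3}, y(0) = 0
def f(y):
    return abs(y) ** (2.0 / 3.0)   # y^{2/3} for y >= 0

def cubic(t):
    return t ** 3 / 27.0           # the nontrivial branch y(t) = (t/3)^3

def branch_residual(t, h=1e-6):
    # Centered finite difference: cubic'(t) should equal f(cubic(t)) = t^2/9
    deriv = (cubic(t + h) - cubic(t - h)) / (2 * h)
    return abs(deriv - f(cubic(t)))

print(f(0.0))                                            # 0.0: y = 0 solves the ODE
print(max(branch_residual(t) for t in [0.5, 1.0, 2.0]))  # ~0: so does the cubic
```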

💡 Remark 8 (Why Lipschitz is the right condition)

The Lipschitz condition |f(t, y_1) - f(t, y_2)| \leq L|y_1 - y_2| prevents the right-hand side from changing "too fast" as y varies. When f is not Lipschitz (as with y^{2/3} near y = 0), the direction field near the initial point is too "flat" — nearby solutions cannot distinguish themselves, and multiple integral curves can merge or branch at the non-Lipschitz point. The Lipschitz condition is the minimal regularity that prevents this pathology.

Left: unique solutions through every point (Lipschitz). Right: infinitely many solutions through the origin (non-Lipschitz), branching at the marked point.

Uniqueness comparison — unique solutions for y' = y vs. branching solutions for y' = y^2/3

9. ML Connections — Gradient Flow, Neural ODEs, and the Adjoint Method

ODEs are not just classical analysis — they are the mathematical backbone of several central ideas in modern machine learning. Gradient descent is a discretized ODE. Neural ODEs replace discrete layers with continuous dynamics. The Picard iteration is a fixed-point computation analogous to training.

9.1 Gradient flow as an ODE

Gradient descent \theta_{k+1} = \theta_k - \eta \nabla L(\theta_k) is the Euler discretization (step size \eta) of the gradient flow ODE

\dot{\theta}(t) = -\nabla L(\theta(t)).

When \nabla L is Lipschitz (with constant L), the Picard-Lindelöf theorem guarantees the gradient flow has a unique trajectory from any initial \theta_0. Convergence of the continuous flow to a critical point (\nabla L(\theta^*) = 0) is analyzed by treating L(\theta(t)) as a Lyapunov function:

\frac{d}{dt}L(\theta(t)) = \nabla L \cdot \dot{\theta} = -\|\nabla L\|^2 \leq 0.

The loss decreases monotonically along the flow — the continuous-time version of the "gradient descent decreases the loss" guarantee. The learning rate \eta controls how well the discrete steps approximate the continuous trajectory.
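A one-dimensional illustration (hypothetical quadratic loss L(\theta) = \theta^2/2, so the flow is \dot{\theta} = -\theta with exact solution \theta_0 e^{-t}): gradient descent is the Euler discretization, and shrinking \eta tightens the match to the continuous trajectory:

```python
import math

# Gradient flow for L(theta) = theta^2/2: theta' = -theta, exact flow theta0 * e^{-t}
theta0, T = 1.0, 2.0

def gradient_descent(eta):
    # Euler discretization of the flow = gradient descent with learning rate eta
    theta = theta0
    for _ in range(round(T / eta)):
        theta -= eta * theta       # theta_{k+1} = theta_k - eta * grad L(theta_k)
    return theta

exact = theta0 * math.exp(-T)
err_big = abs(gradient_descent(0.1) - exact)
err_small = abs(gradient_descent(0.001) - exact)
print(err_big, err_small)  # smaller learning rate tracks the flow more closely
```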

Gradient Descent → formalML

9.2 Neural ODEs

Chen et al. (2018) observed that a residual network h_{t+1} = h_t + f_\theta(h_t, t) (where t indexes layers, not time) is the Euler discretization of \dot{h}(t) = f_\theta(h(t), t). Replacing discrete layers with the continuous ODE yields a neural ODE: the forward pass solves the IVP \dot{h} = f_\theta(h, t), h(0) = x using a black-box ODE solver (Runge-Kutta, adaptive stepping — see Numerical Methods for ODEs).

The Picard-Lindelöf theorem guarantees the forward pass has a unique solution when f_\theta is Lipschitz in h, which holds when the network weights are bounded. Memory cost is O(1) regardless of depth (the ODE solver stores only the current state), whereas a residual network with L layers incurs O(L) memory cost.

9.3 The adjoint method

Backpropagation through a neural ODE cannot store intermediate activations (there are infinitely many "layers"). Instead, the adjoint method computes gradients by solving a second ODE backward in time. Define the adjoint a(t) = \partial \mathcal{L} / \partial h(t). Then a satisfies the adjoint ODE

\dot{a}(t) = -a(t) \cdot \frac{\partial f_\theta}{\partial h}(h(t), t),

integrated backward from a(T) = \partial \mathcal{L} / \partial h(T). The gradient with respect to the parameters is

\frac{d\mathcal{L}}{d\theta} = -\int_0^T a(t) \cdot \frac{\partial f_\theta}{\partial \theta}(h(t), t)\,dt.

This is another first-order ODE — and its existence is guaranteed by Picard-Lindelöf under the same Lipschitz conditions.

9.4 Picard iteration as a training analog

The Picard iteration y_{n+1} = T(y_n) converges to a fixed point of the operator T. Training a neural network \theta_{k+1} = \theta_k - \eta \nabla L(\theta_k) is also a fixed-point iteration — it converges to a fixed point of the gradient descent operator. Deep equilibrium models (DEQs, Bai et al. 2019) make this analogy explicit: the forward pass finds a fixed point z^* = f_\theta(z^*) by iterating z_{k+1} = f_\theta(z_k) until convergence, and backpropagation through the fixed point uses the Implicit Function Theorem to compute \partial z^* / \partial \theta.

Smooth Manifolds → formalML → Measure-Theoretic Probability → formalML

ML connections — gradient flow, neural ODEs, the adjoint method, and Picard-as-training

10. Computational Notes

  • Direction field visualization: Compute f(t, y) on a grid and draw line segments with slope f at each gridpoint. Grid density should be at least 20×20 for visual clarity. Normalize segment lengths by \sqrt{1 + f^2} to keep the display uniform.
  • Numerical solution: scipy.integrate.solve_ivp solves IVPs with adaptive Runge-Kutta methods (default: RK45). For stiff equations, use method='Radau' or 'BDF'. The Picard iteration converges in theory, but is impractical for computation — the integrals become intractable after a few iterations for most equations.
  • Blow-up detection: Adaptive solvers detect blow-up when the step size shrinks below machine epsilon. Setting max_step and monitoring |y| can provide early warning.
  • Lipschitz constant estimation: For f(t, y) with bounded \partial f / \partial y, estimate L \approx \max_{(t,y) \in R} |\partial f / \partial y| on a grid; the Mean Value Theorem converts this bound into a Lipschitz constant.
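As a sketch of the last bullet (hypothetical right-hand side f(t, y) = \sin(t)\,y, for which \partial f/\partial y = \sin(t) and the true Lipschitz constant on any strip is 1):

```python
import math

# Grid estimate of the Lipschitz constant in y of f(t, y) = sin(t) * y
# on the rectangle |t| <= 3, |y| <= 2; here df/dy = sin(t), so L = 1.
def f(t, y):
    return math.sin(t) * y

h = 1e-6
L_est = 0.0
for i in range(51):
    t = -3.0 + 0.12 * i
    for j in range(41):
        y = -2.0 + 0.1 * j
        dfdy = (f(t, y + h) - f(t, y - h)) / (2 * h)  # centered difference in y
        L_est = max(L_est, abs(dfdy))

print(L_est)  # close to 1
```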

11. Connections & Further Reading

This is the first of four topics in the Ordinary Differential Equations track. The track progresses from first-order scalar equations (this topic) to systems and the matrix exponential (Linear Systems & Matrix Exponential), to qualitative analysis of equilibria and stability (Stability & Dynamical Systems), and finally to numerical methods (Numerical Methods for ODEs). The thread connecting them all is the interplay between existence (does a solution exist?), structure (what does the solution look like?), and computation (how do we find it?).

Prerequisites used in this topic:

Looking ahead in this track:

  • Linear Systems & Matrix Exponential — systems \mathbf{y}' = A\mathbf{y} generalize scalar linear equations; the matrix exponential e^{At} extends the scalar solution e^{at}
  • Stability & Dynamical Systems — phase portraits and Lyapunov stability classify the long-time behavior near equilibria
  • Numerical Methods for ODEs — Euler’s method is the simplest discretization; RK4 and adaptive methods are the workhorses
  • Metric Spaces & Topology — the Banach Contraction Mapping Theorem is proved in full, giving the abstract result that powers the Picard-Lindelöf argument from this topic and the Inverse Function Theorem from Topic 12

References

  1. book Arnold (1992). Ordinary Differential Equations Chapters 1–4 develop the geometric viewpoint: ODEs as vector fields, integral curves as trajectories, the phase plane. The primary reference for our geometric-first approach and the direction field visualizations
  2. book Teschl (2012). Ordinary Differential Equations Chapters 1–2 provide a rigorous treatment of existence and uniqueness via the contraction mapping principle. Available free online. Our model for the Picard-Lindelöf proof structure
  3. book Rudin (1976). Principles of Mathematical Analysis Chapter 9 (Theorem 9.12) proves the Picard-Lindelöf theorem as a contraction mapping application, connecting to the IFT proof in the same chapter. Useful for the condensed proof style
  4. book Boyce & DiPrima (2012). Elementary Differential Equations and Boundary Value Problems Chapters 2–3 provide the standard undergraduate treatment of first-order methods (separable, linear, exact) with extensive examples. Reference for the computational sections
  5. paper Chen, Rubanova, Bettencourt & Duvenaud (2018). “Neural Ordinary Differential Equations” The foundational neural ODE paper — replaces discrete residual layers with continuous dynamics, uses the adjoint method for backpropagation. The primary ML connection for this topic
  6. paper Massaroli, Poli, Park, Yamashita & Asama (2020). “Dissecting Neural ODEs” A comprehensive analysis of neural ODE architectures, training dynamics, and expressiveness. Complements the Chen et al. paper with practical insights
  7. paper Santurkar, Tsipras, Ilyas & Madry (2018). “How Does Batch Normalization Help Optimization?” Shows that batch normalization smooths the loss landscape, making ∇L Lipschitz — exactly the condition needed for Picard-Lindelöf to guarantee gradient flow solutions