ODEs · intermediate · 40 min read

Stability & Dynamical Systems

When eigenvalues predict the future — linearization, Lyapunov functions, bifurcations, and the qualitative theory that tells you whether trajectories converge, diverge, or orbit forever.

Abstract. Stability analysis is the qualitative theory of differential equations — the art of predicting long-term behavior without solving the ODE explicitly. The linearization stability theorem connects the eigenvalues of the Jacobian at an equilibrium to the local phase portrait, reducing nonlinear analysis to the linear classification from Topic 22. Lyapunov's direct method provides a complementary energy-based approach: if a scalar function V decreases along trajectories, the system is stable. Bifurcation theory asks what happens when parameters change — equilibria can appear, disappear, or exchange stability at critical thresholds. These tools are the mathematical backbone of gradient descent convergence analysis, GAN training dynamics, and neural ODE stability.

Where this leads → formalML

  • formalML Convergence of gradient descent to a local minimum is asymptotic stability of the gradient flow ODE. The Hessian eigenvalues at a critical point determine whether GD converges (stable) or diverges (unstable). Saddle-point escape is exit along the unstable manifold.
  • formalML Natural gradient descent stability depends on the eigenvalues of the preconditioned Hessian F⁻¹∇²L, where the Fisher metric reshapes the convergence landscape.
  • formalML Stable and unstable manifolds of equilibria are immersed submanifolds. The Poincaré-Bendixson theorem extends to compact 2-manifolds. Dynamical systems on manifolds generalize phase portraits to curved spaces.
  • formalML The Lyapunov equation A^T P + PA = -Q is a Sylvester equation whose solvability depends on the spectra of A and -A^T. The spectral theorem guarantees P is symmetric positive-definite when A is Hurwitz.

1. What Happens Next?

You train a neural network and the loss plateaus. Is it stuck at a local minimum — a stable equilibrium where gradient descent has settled in? Is it perched on a saddle point, about to escape along some direction you haven’t explored yet? Or is it approaching a slow-converging valley where the landscape is nearly flat?

The answer depends on the eigenvalues of the Hessian $\nabla^2 L(\theta^*)$ at the critical point. All eigenvalues positive means stable minimum. Any eigenvalue negative means saddle — the corresponding eigenvector points along the escape route. Near-zero eigenvalues mean near-degeneracy — convergence along those directions is glacially slow.

This is exactly the linearization stability theorem applied to the gradient flow ODE $\dot{\theta} = -\nabla L(\theta)$. The Jacobian of the right-hand side at a critical point is $-\nabla^2 L(\theta^*)$, and its eigenvalues — the negatives of the Hessian eigenvalues — determine local stability.
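This eigenvalue test is easy to run numerically. A minimal sketch — the helper name and the toy Hessians are illustrative, not from the text:

```python
import numpy as np

def classify_critical_point(hessian, tol=1e-8):
    """Classify a critical point of a loss from its Hessian eigenvalues.

    For gradient flow theta' = -grad L, the Jacobian at a critical point
    is -H, so positive Hessian eigenvalues mean asymptotic stability.
    """
    eigvals = np.linalg.eigvalsh(hessian)  # Hessian is symmetric
    if np.all(eigvals > tol):
        return "stable minimum"
    if np.any(eigvals < -tol):
        return "saddle (escape along negative-eigenvalue directions)"
    return "degenerate (near-zero eigenvalues: linearization inconclusive)"

# A saddle: L(x, y) = x^2 - y^2 has Hessian diag(2, -2)
H_saddle = np.diag([2.0, -2.0])
# A minimum: L(x, y) = x^2 + 0.01 y^2 — ill-conditioned but stable
H_min = np.diag([2.0, 0.02])

print(classify_critical_point(H_saddle))  # saddle (...)
print(classify_critical_point(H_min))     # stable minimum
```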

This topic develops the complete qualitative theory of differential equations. We build three tools:

  1. Linearization — reduce a nonlinear system to a linear one near an equilibrium by computing the Jacobian. The eigenvalue classification from Topic 22 tells us the local behavior.
  2. Lyapunov functions — find a scalar “energy” function that decreases along trajectories. If such a function exists, the system is stable — no need to solve the ODE.
  3. Bifurcation theory — study how qualitative behavior changes as parameters vary. Equilibria can appear, disappear, or exchange stability at critical thresholds.

Together, these tools explain why gradient descent converges, why GANs oscillate, and why learning rate schedules work.

2. Equilibria & Nullclines

We work with autonomous 2D systems: $\dot{y}_1 = f_1(y_1, y_2)$, $\dot{y}_2 = f_2(y_1, y_2)$, written compactly as $\dot{\mathbf{y}} = \mathbf{f}(\mathbf{y})$.

📐 Definition 1 (Equilibrium Point (Stationary Point, Fixed Point))

A point $\mathbf{y}^* \in \mathbb{R}^2$ is an equilibrium of $\dot{\mathbf{y}} = \mathbf{f}(\mathbf{y})$ if $\mathbf{f}(\mathbf{y}^*) = \mathbf{0}$. If $\mathbf{y}(0) = \mathbf{y}^*$, then $\mathbf{y}(t) = \mathbf{y}^*$ for all $t$ — the system stays put.

Finding equilibria means solving $f_1(y_1, y_2) = 0$ and $f_2(y_1, y_2) = 0$ simultaneously. For nonlinear systems, this can be difficult. Nullclines make the search geometric.

📐 Definition 2 (Nullcline)

The $y_1$-nullcline (or $\dot{y}_1$-nullcline) is the curve $\{ (y_1, y_2) : f_1(y_1, y_2) = 0 \}$. On this curve, $\dot{y}_1 = 0$ — trajectories move purely vertically. Similarly, the $y_2$-nullcline is $\{ (y_1, y_2) : f_2(y_1, y_2) = 0 \}$, where trajectories move purely horizontally. Equilibria occur where the nullclines intersect.

📝 Example 1 (Lotka-Volterra Equilibria and Nullclines)

The predator-prey system is: $$\dot{x} = \alpha x - \beta xy, \qquad \dot{y} = \delta xy - \gamma y$$ where $x$ is prey population, $y$ is predator population, and $\alpha, \beta, \delta, \gamma > 0$.

$x$-nullcline: $x(\alpha - \beta y) = 0$, so $x = 0$ or $y = \alpha/\beta$.

$y$-nullcline: $y(\delta x - \gamma) = 0$, so $y = 0$ or $x = \gamma/\delta$.

Equilibria are the intersections:

  • $(0, 0)$ — both species extinct (trivial equilibrium)
  • $(\gamma/\delta, \alpha/\beta)$ — coexistence equilibrium

The nullclines divide the positive quadrant into four regions, and in each region the direction of flow follows from the signs of $f_1$ and $f_2$. This gives a coarse picture of the dynamics without solving anything.
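The sign check can be done in a few lines. A sketch with illustrative parameter values (any positive $\alpha, \beta, \delta, \gamma$ work):

```python
import numpy as np

# Lotka-Volterra with illustrative parameters (these specific values are
# assumptions for the demo, not from the text)
alpha, beta, delta, gamma = 1.0, 0.5, 0.25, 0.75

def f(x, y):
    return np.array([alpha * x - beta * x * y, delta * x * y - gamma * y])

# Coexistence equilibrium, where the two nontrivial nullclines cross
x_star, y_star = gamma / delta, alpha / beta   # (3.0, 2.0)
assert np.allclose(f(x_star, y_star), 0.0)

# Sample one point in each of the four regions around (x*, y*) and read off
# the flow direction from the signs of (xdot, ydot)
for x, y, region in [(1, 1, "x<x*, y<y*"), (5, 1, "x>x*, y<y*"),
                     (5, 4, "x>x*, y>y*"), (1, 4, "x<x*, y>y*")]:
    xdot, ydot = f(x, y)
    print(f"{region}: xdot {'+' if xdot > 0 else '-'}, ydot {'+' if ydot > 0 else '-'}")
```

The signs rotate counterclockwise around the coexistence equilibrium — the coarse picture of cycling predator-prey dynamics, obtained without integrating anything.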

📝 Example 2 (Damped Pendulum Equilibria)

The damped pendulum $\ddot{\theta} + b\dot{\theta} + \sin\theta = 0$ becomes a first-order system: $$\dot{\theta} = \omega, \qquad \dot{\omega} = -\sin\theta - b\omega$$

$\theta$-nullcline: $\omega = 0$ (the $\theta$-axis).

$\omega$-nullcline: $\sin\theta = -b\omega$ (an S-shaped curve through the origin).

Equilibria: $\omega = 0$ and $\sin\theta = 0$, giving $(\theta^*, \omega^*) = (n\pi, 0)$ for all integers $n$. The pendulum has infinitely many equilibria — the downward rest positions $(2n\pi, 0)$ and the upward balance points $((2n+1)\pi, 0)$.

💡 Remark 1 (Nullclines Reduce a 2D Search to Curve Intersections)

Without nullclines, finding equilibria requires solving two nonlinear equations simultaneously — a 2D root-finding problem. Nullclines reduce this to a 1D problem: trace each nullcline as a curve, then find where they cross. The flow directions between nullclines then give a rough sketch of the phase portrait for free.

Nullclines and equilibria for the Lotka-Volterra predator-prey system (left) and the damped pendulum (right). Dashed curves are nullclines; filled circles mark equilibria. Arrows indicate flow direction in each region.

3. The Linearization Stability Theorem

We now develop the fundamental tool for classifying equilibria of nonlinear systems. The idea: near an equilibrium, a nonlinear system looks like a linear system, and we already know how to classify linear systems from Topic 22.

📐 Definition 3 (Stability, Asymptotic Stability, and Instability (Lyapunov))

Let $\mathbf{y}^*$ be an equilibrium of $\dot{\mathbf{y}} = \mathbf{f}(\mathbf{y})$.

  • $\mathbf{y}^*$ is stable (in the sense of Lyapunov) if for every $\varepsilon > 0$ there exists $\delta > 0$ such that $\|\mathbf{y}(0) - \mathbf{y}^*\| < \delta$ implies $\|\mathbf{y}(t) - \mathbf{y}^*\| < \varepsilon$ for all $t \geq 0$. Trajectories starting close stay close.

  • $\mathbf{y}^*$ is asymptotically stable if it is stable and additionally $\|\mathbf{y}(t) - \mathbf{y}^*\| \to 0$ as $t \to \infty$ for all $\mathbf{y}(0)$ sufficiently close to $\mathbf{y}^*$. Trajectories starting close converge to the equilibrium.

  • $\mathbf{y}^*$ is unstable if it is not stable — some trajectories starting arbitrarily close eventually leave a neighborhood of $\mathbf{y}^*$.

📐 Definition 4 (Hyperbolic Equilibrium)

An equilibrium $\mathbf{y}^*$ is hyperbolic if every eigenvalue of the Jacobian $J_{\mathbf{f}}(\mathbf{y}^*)$ has nonzero real part: $\operatorname{Re}(\lambda_i) \neq 0$ for all $i$. In the language of Topic 22, the linearized system has no center component — it is a node, saddle, or spiral, never a center.

Here is the main theorem. It says that at a hyperbolic equilibrium, linearization tells the whole story.

🔷 Theorem 1 (Linearization Stability Theorem (Hartman-Grobman, Stability Version))

Let $\mathbf{y}^*$ be an equilibrium of $\dot{\mathbf{y}} = \mathbf{f}(\mathbf{y})$ with $\mathbf{f}$ continuously differentiable, and let $J = J_{\mathbf{f}}(\mathbf{y}^*)$ be the Jacobian at $\mathbf{y}^*$.

  1. If all eigenvalues of $J$ satisfy $\operatorname{Re}(\lambda_i) < 0$, then $\mathbf{y}^*$ is asymptotically stable.
  2. If any eigenvalue of $J$ satisfies $\operatorname{Re}(\lambda_i) > 0$, then $\mathbf{y}^*$ is unstable.
  3. If all eigenvalues have $\operatorname{Re}(\lambda_i) \neq 0$ (the equilibrium is hyperbolic), the phase portrait near $\mathbf{y}^*$ is qualitatively equivalent to that of the linear system $\dot{\mathbf{z}} = J\mathbf{z}$.

Proof.

We prove statement (1); statement (2) follows by running the same estimate along the unstable directions (when every eigenvalue has positive real part, it is simply the time-reversal of (1)).

Set $\mathbf{z} = \mathbf{y} - \mathbf{y}^*$. Taylor expansion gives: $$\dot{\mathbf{z}} = \mathbf{f}(\mathbf{y}^* + \mathbf{z}) = \underbrace{\mathbf{f}(\mathbf{y}^*)}_{=\,\mathbf{0}} + J\mathbf{z} + \mathbf{r}(\mathbf{z})$$ where the remainder satisfies $\|\mathbf{r}(\mathbf{z})\| \leq C\|\mathbf{z}\|^2$ for $\|\mathbf{z}\|$ sufficiently small (this quadratic bound holds when $\mathbf{f}$ is $C^2$; for merely $C^1$ vector fields the weaker bound $\mathbf{r}(\mathbf{z}) = o(\|\mathbf{z}\|)$ supports the same argument).

Step 1: The linear part decays exponentially. Since all eigenvalues of $J$ have $\operatorname{Re}(\lambda_i) < 0$, there exist constants $M > 0$ and $\alpha > 0$ such that $\|e^{Jt}\| \leq Me^{-\alpha t}$ for all $t \geq 0$. (This uses the Jordan normal form; $\alpha$ can be taken as any number smaller than $\min_i |\operatorname{Re}(\lambda_i)|$.)

Step 2: Variation of constants formula. The solution of $\dot{\mathbf{z}} = J\mathbf{z} + \mathbf{r}(\mathbf{z})$ satisfies the integral equation: $$\mathbf{z}(t) = e^{Jt}\mathbf{z}(0) + \int_0^t e^{J(t-s)}\mathbf{r}(\mathbf{z}(s))\,ds$$

Step 3: Gronwall estimate. Let $u(t) = \|\mathbf{z}(t)\|$. Then: $$u(t) \leq Me^{-\alpha t}\,u(0) + \int_0^t Me^{-\alpha(t-s)}\,C\,u(s)^2\,ds$$

For $u(0)$ sufficiently small (say $u(0) \leq \varepsilon$ with $\varepsilon$ small relative to $\alpha/(MC)$), a continuation argument shows that $u(t) \leq 2M\varepsilon\,e^{-\alpha t/2}$ for all $t \geq 0$.

Specifically: suppose $u(t) \leq 2M\varepsilon$ on $[0, T]$. Then on this interval: $$u(t) \leq Me^{-\alpha t}\varepsilon + \int_0^t Me^{-\alpha(t-s)}\,C\,(2M\varepsilon)\,u(s)\,ds$$

By Gronwall's inequality, if $\varepsilon$ is small enough that the factor $2M C\varepsilon$ is dominated by $\alpha$, the integral term cannot overcome the exponential decay of the first term. The bound $u(t) \leq 2M\varepsilon\,e^{-\alpha t/2}$ holds on $[0, T]$, and since the bound does not saturate, it extends beyond $T$ — by continuity, the bound holds for all $t \geq 0$.

Conclusion: $\|\mathbf{z}(t)\| \to 0$ exponentially, so $\mathbf{y}^*$ is asymptotically stable. $\blacksquare$

📝 Example 3 (Classify Lotka-Volterra Equilibria via Linearization)

From Example 1, the Lotka-Volterra system $\dot{x} = \alpha x - \beta xy$, $\dot{y} = \delta xy - \gamma y$ has equilibria at $(0, 0)$ and $(\gamma/\delta, \alpha/\beta)$.

The Jacobian is: $$J = \begin{pmatrix} \alpha - \beta y & -\beta x \\ \delta y & \delta x - \gamma \end{pmatrix}$$

At $(0, 0)$: $J = \begin{pmatrix} \alpha & 0 \\ 0 & -\gamma \end{pmatrix}$. Eigenvalues: $\lambda_1 = \alpha > 0$, $\lambda_2 = -\gamma < 0$. This is a saddle point — unstable. The prey axis is the unstable manifold (prey grow exponentially without predators); the predator axis is the stable manifold (predators die without prey).

At $(\gamma/\delta, \alpha/\beta)$: $J = \begin{pmatrix} 0 & -\beta\gamma/\delta \\ \delta\alpha/\beta & 0 \end{pmatrix}$. The eigenvalues are $\lambda = \pm i\sqrt{\alpha\gamma}$ — pure imaginary. This is a non-hyperbolic center in the linearization. Linearization is inconclusive! (In fact, the nonlinear system has a conserved quantity, so the orbits are genuinely closed — but the linearization cannot tell us this.)
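Both classifications can be verified numerically from the Jacobian. A sketch, with the same illustrative parameter values assumed earlier:

```python
import numpy as np

alpha, beta, delta, gamma = 1.0, 0.5, 0.25, 0.75  # illustrative parameters

def jacobian(x, y):
    """Jacobian of the Lotka-Volterra vector field at (x, y)."""
    return np.array([[alpha - beta * y, -beta * x],
                     [delta * y,        delta * x - gamma]])

# Origin: expect a saddle (one positive, one negative real eigenvalue)
lam_origin = np.linalg.eigvals(jacobian(0.0, 0.0))
print(sorted(lam_origin.real))

# Coexistence equilibrium: expect pure-imaginary +/- i*sqrt(alpha*gamma)
lam_coex = np.linalg.eigvals(jacobian(gamma / delta, alpha / beta))
print(lam_coex)
```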

📝 Example 4 (Non-Hyperbolic Case — Center vs. Spiral)

Consider the family: $$\dot{y}_1 = \mu y_1 - y_2 - y_1(y_1^2 + y_2^2), \qquad \dot{y}_2 = y_1 + \mu y_2 - y_2(y_1^2 + y_2^2)$$

The Jacobian at the origin is $J = \begin{pmatrix} \mu & -1 \\ 1 & \mu \end{pmatrix}$ with eigenvalues $\lambda = \mu \pm i$.

  • For $\mu < 0$: $\operatorname{Re}(\lambda) < 0$ → stable spiral (linearization is conclusive).
  • For $\mu = 0$: $\operatorname{Re}(\lambda) = 0$ → linearization gives a center, but the nonlinear cubic terms create a stable spiral (trajectories decay).
  • For $\mu > 0$: $\operatorname{Re}(\lambda) > 0$ → unstable spiral (linearization is conclusive), and a stable limit cycle of radius $\sqrt{\mu}$ appears.

The non-hyperbolic case $\mu = 0$ shows that linearization alone cannot distinguish a true center from a spiral when eigenvalues are pure imaginary. We need Lyapunov functions (Section 4) or the Hopf bifurcation theorem (Section 8).
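The $\mu = 0$ case can be checked by direct integration: in polar coordinates the radius obeys $\dot{r} = \mu r - r^3$, so at $\mu = 0$ it decays as $r(t) = r_0/\sqrt{1 + 2r_0^2 t}$ even though the linearization predicts closed orbits. A sketch (assumes scipy is available):

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y, mu):
    r2 = y[0]**2 + y[1]**2
    return [mu * y[0] - y[1] - y[0] * r2,
            y[0] + mu * y[1] - y[1] * r2]

# mu = 0: linearization says "center", but the cubic terms pull
# trajectories inward (in polar form, r' = -r^3)
sol = solve_ivp(rhs, (0.0, 50.0), [1.0, 0.0], args=(0.0,),
                rtol=1e-9, atol=1e-12)
r0 = 1.0
rT = np.hypot(sol.y[0, -1], sol.y[1, -1])

# Exact polar solution r(t) = r0 / sqrt(1 + 2 r0^2 t)
r_exact = r0 / np.sqrt(1.0 + 2.0 * r0**2 * 50.0)
print(f"numerical radius at t=50: {rT:.4f}, exact: {r_exact:.4f}")
```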

💡 Remark 2 (Hyperbolicity as a Genericity Condition)

Non-hyperbolic equilibria are “exceptional” — they require eigenvalues to land exactly on the imaginary axis, which is a codimension-one condition in parameter space. Generically (for “almost all” parameter values), all equilibria are hyperbolic. Non-hyperbolic equilibria typically occur at bifurcation points — the boundaries between qualitatively different behaviors (Section 8).

Linearization of a nonlinear system near a stable equilibrium. Left: the nonlinear phase portrait. Center: the linearized phase portrait (a stable spiral). Right: the eigenvalue plane, showing eigenvalues in the left half-plane (stable region).

Classic predator-prey model. The nonzero equilibrium $(\gamma/\delta, \alpha/\beta)$ is a center in the linearization — closed orbits in the nonlinear system.

4. Lyapunov’s Direct Method

Linearization classifies equilibria using the Jacobian’s eigenvalues — a local, algebraic test. Lyapunov’s direct method takes a completely different approach: find a scalar “energy” function that decreases along trajectories. If such a function exists, the system is stable, and we never need to solve the ODE or compute eigenvalues.

The key is the orbital derivative: if $V(\mathbf{y})$ is a smooth scalar function and $\mathbf{y}(t)$ is a solution of $\dot{\mathbf{y}} = \mathbf{f}(\mathbf{y})$, then by the chain rule: $$\dot{V} = \frac{d}{dt}V(\mathbf{y}(t)) = \nabla V(\mathbf{y}) \cdot \mathbf{f}(\mathbf{y})$$

This is computable directly from $V$ and $\mathbf{f}$ — we don’t need the solution $\mathbf{y}(t)$.

📐 Definition 5 (Lyapunov Function)

A continuously differentiable function $V: U \to \mathbb{R}$ defined on a neighborhood $U$ of an equilibrium $\mathbf{y}^*$ is a Lyapunov function if:

  1. $V(\mathbf{y}^*) = 0$ and $V(\mathbf{y}) > 0$ for $\mathbf{y} \neq \mathbf{y}^*$ in $U$ (positive-definite)
  2. $\dot{V}(\mathbf{y}) = \nabla V(\mathbf{y}) \cdot \mathbf{f}(\mathbf{y}) \leq 0$ in $U$ (negative-semidefinite orbital derivative)

If additionally $\dot{V}(\mathbf{y}) < 0$ for $\mathbf{y} \neq \mathbf{y}^*$ (negative-definite), then $V$ is a strict Lyapunov function.

🔷 Theorem 2 (Lyapunov Stability Theorem)

Let $\mathbf{y}^*$ be an equilibrium of $\dot{\mathbf{y}} = \mathbf{f}(\mathbf{y})$.

  1. If there exists a Lyapunov function $V$ with $\dot{V} \leq 0$, then $\mathbf{y}^*$ is stable.
  2. If there exists a strict Lyapunov function $V$ with $\dot{V} < 0$ for $\mathbf{y} \neq \mathbf{y}^*$, then $\mathbf{y}^*$ is asymptotically stable.

Proof.

We prove statement (1). Statement (2) follows by a similar argument with the additional conclusion that $V(\mathbf{y}(t)) \to 0$ forces $\mathbf{y}(t) \to \mathbf{y}^*$.

Goal: For every $\varepsilon > 0$, find $\delta > 0$ such that $\|\mathbf{y}(0) - \mathbf{y}^*\| < \delta$ implies $\|\mathbf{y}(t) - \mathbf{y}^*\| < \varepsilon$ for all $t \geq 0$.

Step 1: Sublevel set confinement. Fix $\varepsilon > 0$ and consider the compact set $S_\varepsilon = \{\mathbf{y} : \|\mathbf{y} - \mathbf{y}^*\| = \varepsilon\}$ (the sphere of radius $\varepsilon$). Since $V$ is continuous and positive on $S_\varepsilon$, it attains a minimum: $$c = \min_{\mathbf{y} \in S_\varepsilon} V(\mathbf{y}) > 0$$

Step 2: Choose $\delta$. Since $V(\mathbf{y}^*) = 0$ and $V$ is continuous, there exists $\delta > 0$ (with $\delta < \varepsilon$) such that $\|\mathbf{y} - \mathbf{y}^*\| < \delta$ implies $V(\mathbf{y}) < c$.

Step 3: Trajectories stay inside. Suppose $\|\mathbf{y}(0) - \mathbf{y}^*\| < \delta$, so $V(\mathbf{y}(0)) < c$. Since $\dot{V} \leq 0$ along trajectories: $$V(\mathbf{y}(t)) \leq V(\mathbf{y}(0)) < c \quad \text{for all } t \geq 0$$

If the trajectory were to reach $S_\varepsilon$, we would have $V(\mathbf{y}(t)) \geq c$ — a contradiction. Therefore $\|\mathbf{y}(t) - \mathbf{y}^*\| < \varepsilon$ for all $t \geq 0$. $\blacksquare$

🔷 Theorem 3 (LaSalle's Invariance Principle)

Suppose $V$ is a Lyapunov function with $\dot{V} \leq 0$ on a compact positively invariant set $\Omega$. Let $E = \{\mathbf{y} \in \Omega : \dot{V}(\mathbf{y}) = 0\}$, and let $M$ be the largest invariant set contained in $E$. Then every trajectory starting in $\Omega$ converges to $M$ as $t \to \infty$.

In particular, if the only invariant subset of $E$ is $\{\mathbf{y}^*\}$, then $\mathbf{y}^*$ is asymptotically stable — even though $\dot{V}$ may vanish on a larger set.

📝 Example 5 (Damped Pendulum Energy as a Lyapunov Function)

For the damped pendulum $\dot{\theta} = \omega$, $\dot{\omega} = -\sin\theta - b\omega$ with $b > 0$, the total energy is: $$V(\theta, \omega) = \frac{1}{2}\omega^2 + (1 - \cos\theta)$$

This is positive-definite near $(\theta, \omega) = (0, 0)$: $V(0, 0) = 0$ and $V > 0$ for $(\theta, \omega) \neq (0, 0)$ when $|\theta| < \pi$.

The orbital derivative is: $$\dot{V} = \omega \cdot (-\sin\theta - b\omega) + \sin\theta \cdot \omega = -b\omega^2 \leq 0$$

So $V$ is a Lyapunov function and the origin is stable. But $\dot{V} = 0$ on the set $E = \{\omega = 0\}$ — not just at the origin. Is it asymptotically stable?

LaSalle’s principle to the rescue: The largest invariant set in $E = \{\omega = 0\}$ is just $\{(0, 0)\}$, because if $\omega(t) = 0$ for all $t$ then $\dot{\omega}(t) = 0$, forcing $\sin\theta(t) = 0$, hence $\theta(t) = 0$ (near the origin). By LaSalle’s principle, $(0, 0)$ is asymptotically stable.
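Both conclusions — the energy never increases, yet the state still converges to the origin — can be observed numerically. A sketch (the damping value is illustrative; assumes scipy):

```python
import numpy as np
from scipy.integrate import solve_ivp

b = 0.5  # damping coefficient (illustrative choice)

def pendulum(t, s):
    theta, omega = s
    return [omega, -np.sin(theta) - b * omega]

def energy(theta, omega):
    return 0.5 * omega**2 + (1.0 - np.cos(theta))

# Start near the downward equilibrium, well inside |theta| < pi
sol = solve_ivp(pendulum, (0.0, 60.0), [2.0, 0.0],
                max_step=0.05, rtol=1e-10, atol=1e-12)
V = energy(sol.y[0], sol.y[1])
print(f"V(0) = {V[0]:.4f}, V(T) = {V[-1]:.2e}")

# Energy is monotonically nonincreasing (up to solver tolerance), and the
# state converges to (0, 0) even though Vdot = 0 whenever omega = 0 —
# exactly what LaSalle's principle predicts
assert np.all(np.diff(V) <= 1e-8)
assert abs(sol.y[0, -1]) < 1e-3 and abs(sol.y[1, -1]) < 1e-3
```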

📝 Example 6 (Computing the Orbital Derivative for a Quadratic Lyapunov Function)

Consider the system $\dot{y}_1 = -y_1 + y_2$, $\dot{y}_2 = -y_1 - y_2$ with candidate $V(y_1, y_2) = y_1^2 + y_2^2$.

The gradient is $\nabla V = (2y_1, 2y_2)$, and the orbital derivative is: $$\dot{V} = 2y_1(-y_1 + y_2) + 2y_2(-y_1 - y_2) = -2y_1^2 + 2y_1 y_2 - 2y_1 y_2 - 2y_2^2 = -2(y_1^2 + y_2^2)$$

Since $\dot{V} = -2\|\mathbf{y}\|^2 < 0$ for $\mathbf{y} \neq \mathbf{0}$, $V$ is a strict Lyapunov function and the origin is asymptotically stable. The decay rate is $\dot{V} = -2V$, so $V(t) = V(0)e^{-2t}$ — exponential convergence.
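Because this system is linear ($\dot{\mathbf{y}} = A\mathbf{y}$ with $A = \begin{pmatrix} -1 & 1 \\ -1 & -1 \end{pmatrix}$), the predicted decay $V(t) = V(0)e^{-2t}$ can be checked against the exact matrix-exponential solution:

```python
import numpy as np
from scipy.linalg import expm

# System of Example 6: a stable spiral
A = np.array([[-1.0, 1.0],
              [-1.0, -1.0]])
y0 = np.array([1.0, 0.5])

# V(y) = ||y||^2 satisfies Vdot = -2V, so V(t) = V(0) exp(-2t) exactly
for t in [0.5, 1.0, 2.0]:
    y_t = expm(A * t) @ y0               # exact solution of y' = Ay
    V_t = float(y_t @ y_t)
    V_pred = float(y0 @ y0) * np.exp(-2.0 * t)
    print(f"t={t}: V={V_t:.6f}, predicted={V_pred:.6f}")
    assert abs(V_t - V_pred) < 1e-10
```

The match is exact (to floating point) because $A = -I + R$ with $R$ a rotation generator: the flow is a pure rotation scaled by $e^{-t}$, so $\|\mathbf{y}(t)\|^2 = e^{-2t}\|\mathbf{y}(0)\|^2$.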

💡 Remark 3 (Finding Lyapunov Functions is an Art)

There is no systematic method for constructing Lyapunov functions for general nonlinear systems. Common strategies include:

  • Physical energy (kinetic + potential) for mechanical systems
  • Quadratic forms $V = \mathbf{y}^T P \mathbf{y}$ for systems near a linear regime (see Section 5)
  • Sum-of-squares (SOS) programming — a computational approach using semidefinite optimization
  • Trial and error guided by the structure of $\mathbf{f}$

The art is in finding a $V$ whose level curves “match” the dynamics well enough that $\dot{V}$ comes out negative-definite. This is one place where understanding the physics or geometry of a system genuinely helps.

Lyapunov function as a 3D surface (left) with a trajectory spiraling down toward the minimum, and V(t) decreasing monotonically along the trajectory (right).

Level curves of $V$ for a stable spiral with $V = y_1^2 + y_2^2$. The orbital derivative $\dot{V} = -2y_1^2 - 2y_2^2 < 0$ confirms asymptotic stability.

5. The Lyapunov Equation & Quadratic Lyapunov Functions

For linear systems $\dot{\mathbf{y}} = A\mathbf{y}$, Lyapunov functions can be constructed systematically. The key is the quadratic form $V(\mathbf{y}) = \mathbf{y}^T P \mathbf{y}$, where $P$ is a symmetric positive-definite matrix.

The orbital derivative is: $$\dot{V} = \dot{\mathbf{y}}^T P \mathbf{y} + \mathbf{y}^T P \dot{\mathbf{y}} = \mathbf{y}^T A^T P \mathbf{y} + \mathbf{y}^T P A \mathbf{y} = \mathbf{y}^T (A^T P + PA)\mathbf{y}$$

If we can choose $P$ so that $A^T P + PA = -Q$ for some positive-definite $Q$, then $\dot{V} = -\mathbf{y}^T Q \mathbf{y} < 0$ for $\mathbf{y} \neq \mathbf{0}$ — a strict Lyapunov function.

📐 Definition 6 (Hurwitz Matrix)

A matrix $A$ is Hurwitz (or stable) if every eigenvalue has strictly negative real part: $\operatorname{Re}(\lambda_i(A)) < 0$ for all $i$. Equivalently, $A$ is Hurwitz if and only if $e^{At} \to 0$ as $t \to \infty$.

🔷 Theorem 4 (The Lyapunov Equation)

Let $A$ be a real $n \times n$ matrix and $Q$ a symmetric positive-definite matrix. The matrix equation $$A^T P + PA = -Q \qquad \text{(the Lyapunov equation)}$$ has a unique symmetric positive-definite solution $P$ if and only if $A$ is Hurwitz.

When $A$ is Hurwitz, $V(\mathbf{y}) = \mathbf{y}^T P \mathbf{y}$ is a strict Lyapunov function for $\dot{\mathbf{y}} = A\mathbf{y}$, confirming asymptotic stability.

📝 Example 7 (Solving the Lyapunov Equation for a 2×2 Hurwitz Matrix)

Let $A = \begin{pmatrix} -1 & 2 \\ -3 & -2 \end{pmatrix}$. The eigenvalues are $\lambda = -\tfrac{3}{2} \pm \tfrac{\sqrt{23}}{2}i \approx -1.5 \pm 2.40i$ — both have negative real part, so $A$ is Hurwitz.

Choose $Q = I$ (the simplest positive-definite choice). The Lyapunov equation $A^T P + PA = -I$ is a linear system in the entries of $P = \begin{pmatrix} p_{11} & p_{12} \\ p_{12} & p_{22} \end{pmatrix}$: $$-2p_{11} - 6p_{12} = -1, \qquad 2p_{11} - 3p_{12} - 3p_{22} = 0, \qquad 4p_{12} - 4p_{22} = -1$$

Solving: $p_{12} = \tfrac{1}{48}$, $p_{11} = \tfrac{7}{16}$, $p_{22} = \tfrac{13}{48}$, so $P \approx \begin{pmatrix} 0.438 & 0.021 \\ 0.021 & 0.271 \end{pmatrix}$. The eigenvalues of $P$ are approximately $0.44$ and $0.27$, both positive — confirming $P \succ 0$.

The Lyapunov function $V(\mathbf{y}) = \tfrac{7}{16}\,y_1^2 + \tfrac{1}{24}\,y_1 y_2 + \tfrac{13}{48}\,y_2^2$ has elliptical level curves, and $\dot{V} = -(y_1^2 + y_2^2) < 0$.

📝 Example 8 (Quadratic Lyapunov Function for Training Dynamics)

Gradient descent on the quadratic loss $L(\theta) = \frac{1}{2}\theta^T H \theta$ yields the linear system $\dot{\theta} = -H\theta$. If $H$ is positive-definite (a local minimum), then $A = -H$ is Hurwitz.

The loss function itself is a Lyapunov function: $V(\theta) = L(\theta) = \frac{1}{2}\theta^T H \theta$ with $\dot{V} = -\theta^T H^2 \theta < 0$. This proves gradient descent converges — the loss decreases monotonically along the gradient flow.

More precisely, $P = \frac{1}{2}H$ solves $A^T P + PA = -H^2$ (the Lyapunov equation with $Q = H^2$). The condition number $\kappa(H) = \lambda_{\max}/\lambda_{\min}$ controls the eccentricity of the level ellipses — ill-conditioned Hessians produce elongated ellipses where gradient descent zig-zags.

💡 Remark 4 (The Lyapunov Equation as a Linear System)

The Lyapunov equation $A^T P + PA = -Q$ can be rewritten using the Kronecker product: $$(I \otimes A^T + A^T \otimes I)\operatorname{vec}(P) = -\operatorname{vec}(Q)$$

This is a linear system of size $n^2 \times n^2$, solvable by standard methods. In Python: scipy.linalg.solve_continuous_lyapunov(A.T, -Q). The condition for unique solvability is that $A^T$ and $-A$ share no eigenvalues — equivalently, no two eigenvalues of $A$ sum to zero — which holds in particular when $A$ is Hurwitz (all eigenvalues then have strictly negative real part, so no pair can sum to zero).
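Applied to the $2 \times 2$ matrix from Example 7, the scipy call recovers the hand-computed solution:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Matrix from the worked 2x2 example, with Q = I
A = np.array([[-1.0, 2.0],
              [-3.0, -2.0]])
Q = np.eye(2)

# solve_continuous_lyapunov(M, C) solves M X + X M^T = C,
# so passing (A.T, -Q) solves A^T P + P A = -Q
P = solve_continuous_lyapunov(A.T, -Q)
print(P)

# Residual of the Lyapunov equation vanishes
assert np.allclose(A.T @ P + P @ A, -Q)
# P is symmetric positive-definite, confirming A is Hurwitz
assert np.all(np.linalg.eigvalsh(P) > 0)
```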

Level curves of the quadratic Lyapunov function V = yᵀPy (left) with trajectories confined inside shrinking ellipses. Eigenvalues of P and A shown side by side (right).

6. Invariant Manifolds

At a saddle-type equilibrium, some directions attract trajectories while others repel them. The invariant manifolds organize these directions into global geometric structures.

📐 Definition 7 (Stable and Unstable Manifolds)

Let $\mathbf{y}^*$ be a hyperbolic equilibrium of $\dot{\mathbf{y}} = \mathbf{f}(\mathbf{y})$.

  • The stable manifold is $W^s(\mathbf{y}^*) = \{\mathbf{y}_0 : \mathbf{y}(t; \mathbf{y}_0) \to \mathbf{y}^* \text{ as } t \to +\infty\}$ — the set of initial conditions that converge to $\mathbf{y}^*$ in forward time.

  • The unstable manifold is $W^u(\mathbf{y}^*) = \{\mathbf{y}_0 : \mathbf{y}(t; \mathbf{y}_0) \to \mathbf{y}^* \text{ as } t \to -\infty\}$ — the set of initial conditions that converge to $\mathbf{y}^*$ in backward time (equivalently, trajectories that depart from $\mathbf{y}^*$ in forward time).

🔷 Theorem 5 (Stable Manifold Theorem)

Let $\mathbf{y}^*$ be a hyperbolic equilibrium of $\dot{\mathbf{y}} = \mathbf{f}(\mathbf{y})$ with $\mathbf{f} \in C^1$. Let $J = J_{\mathbf{f}}(\mathbf{y}^*)$ have $k$ eigenvalues with negative real part and $n - k$ eigenvalues with positive real part. Then:

  1. $W^s(\mathbf{y}^*)$ is a smooth manifold of dimension $k$, tangent to the stable eigenspace $E^s$ of $J$ at $\mathbf{y}^*$.
  2. $W^u(\mathbf{y}^*)$ is a smooth manifold of dimension $n - k$, tangent to the unstable eigenspace $E^u$ of $J$ at $\mathbf{y}^*$.

📝 Example 9 (Stable and Unstable Manifolds of a Lotka-Volterra Saddle)

At the origin of the Lotka-Volterra system, the Jacobian has eigenvalues $\lambda_1 = \alpha > 0$ (unstable, prey direction) and $\lambda_2 = -\gamma < 0$ (stable, predator direction).

The stable manifold $W^s(0,0)$ is the positive $y$-axis: $\{(0, y) : y > 0\}$. Predators with no prey die exponentially — trajectories along this axis converge to the origin.

The unstable manifold $W^u(0,0)$ is the positive $x$-axis: $\{(x, 0) : x > 0\}$. Prey with no predators grow exponentially — trajectories along this axis depart from the origin.

Both manifolds are straight lines — they coincide with the eigenspaces of the Jacobian. For nonlinear systems in general, the manifolds are curves (not lines) that are tangent to the eigenspaces at the equilibrium but curve away from them further out.

💡 Remark 5 (Center Manifolds and Dimensional Reduction)

When the Jacobian has eigenvalues on the imaginary axis ($\operatorname{Re}(\lambda) = 0$), the center manifold $W^c$ captures the “slow” dynamics that linearization cannot resolve. The center manifold theorem states that $W^c$ exists, is tangent to the center eigenspace $E^c$ at the equilibrium, and is locally invariant. The dynamics restricted to $W^c$ determine the stability of the full system — this is the principle of dimensional reduction: reduce the analysis to the center manifold, where the essential nonlinearity lives.

Stable and unstable manifolds at a saddle equilibrium. The manifolds curve away from the linear eigenspaces but remain tangent at the equilibrium (zoom on right).

7. Limit Cycles & the Poincaré-Bendixson Theorem

Linear systems can have periodic orbits — the centers of Topic 22, where trajectories form closed ellipses. But these orbits are not isolated: every nearby initial condition also produces a periodic orbit. Limit cycles are a fundamentally nonlinear phenomenon: isolated periodic orbits that attract (or repel) neighboring trajectories.

📐 Definition 8 (Limit Cycle)

A limit cycle is an isolated closed orbit $\Gamma$ of $\dot{\mathbf{y}} = \mathbf{f}(\mathbf{y})$.

  • Stable limit cycle: Nearby trajectories spiral toward $\Gamma$ from both inside and outside. $\Gamma$ is an attractor.
  • Unstable limit cycle: Nearby trajectories spiral away from $\Gamma$.
  • Semi-stable limit cycle: Trajectories approach from one side and recede on the other.

🔷 Theorem 6 (Poincaré-Bendixson Theorem)

Let $\dot{\mathbf{y}} = \mathbf{f}(\mathbf{y})$ be a $C^1$ planar system. If a trajectory remains in a bounded region $R \subset \mathbb{R}^2$ for all $t \geq 0$, and $R$ contains no equilibria, then the $\omega$-limit set of the trajectory is a periodic orbit.

More generally: if the $\omega$-limit set is nonempty, compact, and contains no equilibria, it is a periodic orbit.

Proof.

The full proof is long and uses delicate topological arguments about planar flows. We outline the key ideas.

Step 1: $\omega$-limit set structure. The $\omega$-limit set $$\omega(\mathbf{y}_0) = \bigcap_{T \geq 0} \overline{\{\mathbf{y}(t) : t \geq T\}}$$ is nonempty (by boundedness), compact, connected, and invariant under the flow.

Step 2: No equilibria in $\omega$. By hypothesis, $\omega$ contains no equilibria. Therefore every point of $\omega$ lies on a regular orbit arc.

Step 3: Flow box and transversal. At any point $p \in \omega$, the flow is locally a parallel flow (by the Flow Box Theorem). Take a small transversal segment $\Sigma$ through $p$. The trajectory must cross $\Sigma$ repeatedly (since $p$ is an $\omega$-limit point).

Step 4: Monotonicity on $\Sigma$. The Jordan Curve Theorem constrains how the trajectory can cross $\Sigma$: successive crossings must be monotone (each crossing is “closer” to $p$ than the last). This is where 2D topology is essential — and why the theorem fails in 3D.

Step 5: Convergence. The monotone sequence of crossings converges to a fixed point on $\Sigma$, which lies on a periodic orbit in $\omega$. Since $\omega$ is connected and contains no equilibria, the entire $\omega$-limit set equals this periodic orbit. $\blacksquare$

📝 Example 10 (Van der Pol Limit Cycle — Trapping Region)

The Van der Pol oscillator $\ddot{x} - \mu(1 - x^2)\dot{x} + x = 0$ (with $\mu > 0$) becomes: $$\dot{x} = y, \qquad \dot{y} = \mu(1 - x^2)y - x$$

Linearization at origin: The Jacobian has eigenvalues $\lambda = \frac{\mu}{2} \pm \frac{1}{2}\sqrt{\mu^2 - 4}$. For $\mu > 0$, $\operatorname{Re}(\lambda) > 0$ — the origin is an unstable spiral (or unstable node for $\mu \geq 2$).

Trapping region: Construct a bounded, positively invariant annular region $R$ surrounding the origin that contains no equilibria (the only equilibrium is the origin, which is excluded). By the Poincaré-Bendixson theorem, $R$ must contain a periodic orbit — the Van der Pol limit cycle.

For $\mu$ small, the limit cycle is nearly circular with radius $\approx 2$. As $\mu$ increases, it develops sharp corners — the characteristic relaxation oscillation shape.
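The attracting character of the cycle can be seen by integrating past the transient from both inside and outside. A sketch for small $\mu$ (assumes scipy):

```python
import numpy as np
from scipy.integrate import solve_ivp

mu = 0.1  # small mu: the limit cycle is nearly circular with radius ~ 2

def vdp(t, s):
    x, y = s
    return [y, mu * (1.0 - x**2) * y - x]

# Integrate well past the transient, starting inside and outside the cycle
for label, s0 in [("inside", [0.1, 0.0]), ("outside", [4.0, 0.0])]:
    sol = solve_ivp(vdp, (0.0, 200.0), s0, rtol=1e-8, atol=1e-10)
    r_final = np.hypot(sol.y[0, -1], sol.y[1, -1])
    print(f"start {label}: final radius ~ {r_final:.2f}")
    assert 1.7 < r_final < 2.3  # both trajectories end up near the cycle
```

The trajectory launched near the unstable origin spirals outward, the one launched far away spirals inward, and both settle onto the same closed orbit of radius roughly 2 — the stable limit cycle guaranteed by the trapping-region argument.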

💡 Remark 6 (Poincaré-Bendixson is Special to 2D)

The Poincaré-Bendixson theorem critically uses 2D topology (the Jordan Curve Theorem), which has no analog in higher dimensions. In 3D, bounded trajectories that avoid equilibria need not converge to periodic orbits; they can instead be chaotic. The Lorenz system (\dot{x} = \sigma(y - x), \dot{y} = x(\rho - z) - y, \dot{z} = xy - \beta z) has bounded trajectories that never settle onto a periodic orbit: its attractor is a strange attractor with sensitive dependence on initial conditions, densely threaded by infinitely many unstable periodic orbits, none of which attracts. Chaos is impossible in 2D autonomous systems, a fundamental constraint of planar topology.

Van der Pol oscillator: phase portrait with the stable limit cycle (left) and the time series x(t) showing convergence to periodic oscillation from both inside and outside the limit cycle (right).

8. Bifurcation Theory

So far we’ve asked: given a system, what is the long-term behavior? Bifurcation theory asks a deeper question: how does the behavior change as a parameter varies? A bifurcation occurs when the qualitative structure of the phase portrait changes — equilibria appear or disappear, or exchange stability.

📐 Definition 9 (Bifurcation and Bifurcation Point)

A bifurcation of y˙=f(y;μ)\dot{\mathbf{y}} = \mathbf{f}(\mathbf{y}; \mu) occurs at a parameter value μ=μ\mu = \mu^* if the phase portrait for μ\mu near μ\mu^* is not qualitatively equivalent to the phase portrait at μ\mu^*. The value μ\mu^* is a bifurcation point.

Typical signatures: the number of equilibria changes, or an equilibrium changes stability (eigenvalues cross the imaginary axis).

📐 Definition 10 (Saddle-Node Bifurcation (Normal Form))

The saddle-node bifurcation has normal form x˙=μx2\dot{x} = \mu - x^2 (in 1D) or x˙1=μx12\dot{x}_1 = \mu - x_1^2, x˙2=x2\dot{x}_2 = -x_2 (in 2D).

  • For μ>0\mu > 0: two equilibria at x=±μx = \pm\sqrt{\mu} — one stable node, one saddle.
  • At μ=0\mu = 0: the equilibria merge into a single half-stable equilibrium at x=0x = 0.
  • For μ<0\mu < 0: no equilibria — all trajectories escape.

Two equilibria collide, annihilate, and disappear as μ\mu decreases through zero.

📐 Definition 11 (Hopf Bifurcation (Supercritical and Subcritical))

A Hopf bifurcation occurs when a pair of complex conjugate eigenvalues λ(μ)=α(μ)±iβ(μ)\lambda(\mu) = \alpha(\mu) \pm i\beta(\mu) of the Jacobian crosses the imaginary axis: α(μ)=0\alpha(\mu^*) = 0 with α(μ)0\alpha'(\mu^*) \neq 0 (transversality).

  • Supercritical Hopf: For μ<μ\mu < \mu^*, a stable equilibrium. At μ=μ\mu = \mu^*, the equilibrium loses stability and a stable limit cycle is born. The limit cycle grows as μμ\sqrt{\mu - \mu^*}.
  • Subcritical Hopf: For μ<μ\mu < \mu^*, a stable equilibrium coexists with an unstable limit cycle. At μ=μ\mu = \mu^*, the limit cycle shrinks to zero and the equilibrium becomes unstable — often accompanied by a sudden jump to a distant attractor.

🔷 Theorem 7 (Hopf Bifurcation Theorem)

Let y˙=f(y;μ)\dot{\mathbf{y}} = \mathbf{f}(\mathbf{y}; \mu) have an equilibrium y(μ)\mathbf{y}^*(\mu) with Jacobian eigenvalues λ(μ)=α(μ)±iβ(μ)\lambda(\mu) = \alpha(\mu) \pm i\beta(\mu). Suppose:

  1. α(μ)=0\alpha(\mu^*) = 0 and β(μ)0\beta(\mu^*) \neq 0 (eigenvalues on imaginary axis)
  2. α(μ)0\alpha'(\mu^*) \neq 0 (transversal crossing)

Then a branch of periodic orbits bifurcates from y\mathbf{y}^* at μ=μ\mu = \mu^*. The period is approximately 2π/β(μ)2\pi / \beta(\mu^*) and the amplitude scales as O(μμ)O(\sqrt{|\mu - \mu^*|}).

📝 Example 11 (Saddle-Node Bifurcation — Complete Analysis)

Consider x˙=μx2\dot{x} = \mu - x^2 on R\mathbb{R}.

Equilibria: x=±μx^* = \pm\sqrt{\mu} (exist only for μ0\mu \geq 0).

Stability: f(x)=2xf'(x) = -2x, so f(μ)=2μ<0f'(\sqrt{\mu}) = -2\sqrt{\mu} < 0 (stable) and f(μ)=2μ>0f'(-\sqrt{\mu}) = 2\sqrt{\mu} > 0 (unstable).

Bifurcation diagram: Plot xx^* vs. μ\mu. The curve x=μx = \sqrt{\mu} (upper branch, stable, solid line) and x=μx = -\sqrt{\mu} (lower branch, unstable, dashed line) meet at the origin (μ,x)=(0,0)(\mu, x) = (0, 0) — the bifurcation point. The branches form a parabola opening to the right.

At the bifurcation (μ=0\mu = 0): The linearization is f(0)=0f'(0) = 0, a non-hyperbolic equilibrium. The system slows down critically: trajectories with x(0) > 0 approach x = 0 algebraically (x(t) \sim t^{-1}) rather than exponentially, while trajectories with x(0) < 0 escape to -\infty in finite time.
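The critical slowing down is easy to observe numerically (an illustrative check): at μ = 0 the exact solution of \dot{x} = -x^2 with x(0) = 1 is x(t) = 1/(1 + t), so t \cdot x(t) \to 1, in contrast with the exponential decay present for μ > 0.

```python
# Critical slowing down at the saddle-node point mu = 0: decay is
# algebraic, x(t) ~ 1/t, instead of exponential.
import numpy as np
from scipy.integrate import solve_ivp

sol = solve_ivp(lambda t, x: -x**2, (0, 1000), [1.0], rtol=1e-10, atol=1e-12)
x_final = sol.y[0, -1]
t_final = sol.t[-1]

print(f"t*x(t) at t={t_final:.0f}: {t_final * x_final:.4f}")  # close to 1
# Exact solution: x(t) = 1/(1 + t), so t*x(t) = t/(1 + t) -> 1.
```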

📝 Example 12 (Supercritical Hopf — Birth of a Limit Cycle)

The Hopf normal form: x˙1=μx1x2x1(x12+x22),x˙2=x1+μx2x2(x12+x22)\dot{x}_1 = \mu x_1 - x_2 - x_1(x_1^2 + x_2^2), \qquad \dot{x}_2 = x_1 + \mu x_2 - x_2(x_1^2 + x_2^2)

In polar coordinates (r,θ)(r, \theta): r˙=μrr3\dot{r} = \mu r - r^3, θ˙=1\dot{\theta} = 1.

Equilibria of the radial equation: r=0r = 0 (always) and r=μr = \sqrt{\mu} (for μ>0\mu > 0).

  • μ<0\mu < 0: Only r=0r = 0 exists. Since r˙=r(μr2)<0\dot{r} = r(\mu - r^2) < 0 for r>0r > 0, the origin is asymptotically stable (stable spiral).
  • μ=0\mu = 0: The origin is non-hyperbolic. The cubic term stabilizes it weakly.
  • μ>0\mu > 0: r=0r = 0 is unstable, r=μr = \sqrt{\mu} is a stable limit cycle. Every trajectory (except the origin itself) spirals toward the circle of radius μ\sqrt{\mu}.

The limit cycle amplitude grows as μ\sqrt{\mu} — a square-root scaling that is universal for supercritical Hopf bifurcations.
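The square-root law can be verified directly on the normal form (an illustrative sketch): integrating the 2D system for a few values of μ, the late-time orbit radius matches √μ.

```python
# Supercritical Hopf normal form: late-time orbit radius equals sqrt(mu).
import numpy as np
from scipy.integrate import solve_ivp

def hopf(t, s, mu):
    x1, x2 = s
    r2 = x1**2 + x2**2
    return [mu * x1 - x2 - x1 * r2, x1 + mu * x2 - x2 * r2]

radii = {}
for mu in (0.04, 0.16, 0.64):
    sol = solve_ivp(hopf, (0, 200), [0.5, 0.0], args=(mu,),
                    rtol=1e-9, atol=1e-9)
    radii[mu] = np.hypot(sol.y[0, -1], sol.y[1, -1])  # final radius

for mu, r in radii.items():
    print(f"mu={mu}: r={r:.4f}, sqrt(mu)={np.sqrt(mu):.4f}")
```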

💡 Remark 7 (Transcritical and Pitchfork Bifurcations)

Two other common bifurcation types are:

  • Transcritical: x˙=μxx2\dot{x} = \mu x - x^2. Two equilibria (x=0x = 0 and x=μx = \mu) exist for all μ\mu but exchange stability at μ=0\mu = 0.
  • Pitchfork: x˙=μxx3\dot{x} = \mu x - x^3 (supercritical). For μ<0\mu < 0: one stable equilibrium at x=0x = 0. For μ>0\mu > 0: x=0x = 0 becomes unstable and two symmetric stable equilibria x=±μx = \pm\sqrt{\mu} appear. Common in systems with symmetry.

We do not develop these in full here. The saddle-node and Hopf bifurcations are the generic ones: they occur without any symmetry assumption and are the most important for applications, whereas transcritical and pitchfork bifurcations rely on special structure (a persistent equilibrium or a symmetry) to arise robustly.

Bifurcation diagrams. Left: saddle-node (two equilibria merge and disappear). Center: Hopf (equilibrium loses stability and a limit cycle is born). Right: phase portraits at three values of μ for the Hopf system.

Canonical saddle-node bifurcation. For μ > 0: stable node at x = √μ, saddle at x = −√μ. At μ = 0: half-stable equilibrium. For μ < 0: no equilibria.

9. A Gallery of Nonlinear Systems

We now bring together all the tools (nullclines, linearization, Lyapunov functions, invariant manifolds, limit cycles) in a gallery of canonical nonlinear 2D systems.

📝 Example 13 (Lotka-Volterra — Full Analysis)

The predator-prey system x˙=αxβxy\dot{x} = \alpha x - \beta xy, y˙=δxyγy\dot{y} = \delta xy - \gamma y with standard parameters α=1,β=0.5,δ=0.25,γ=0.5\alpha = 1, \beta = 0.5, \delta = 0.25, \gamma = 0.5:

  • Equilibria: (0,0)(0, 0) (saddle) and (2,2)(2, 2) (center in linearization).
  • Nullclines: x=0x = 0 or y=2y = 2 (x-nullcline); y=0y = 0 or x=2x = 2 (y-nullcline).
  • Conserved quantity: H(x,y)=δxγlnx+βyαlnyH(x, y) = \delta x - \gamma \ln x + \beta y - \alpha \ln y is constant along trajectories. The orbits are closed curves — genuine periodic oscillations of predator and prey populations.
  • Key feature: The nonzero equilibrium is a center, not merely in the linearization but in the full nonlinear system. This is exceptional — it relies on the special structure (Hamiltonian system). Generic perturbations break the closed orbits into spirals.
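The conserved quantity can be checked numerically (an illustrative verification with the parameters above): H stays constant to within solver tolerance along a trajectory, confirming that the orbits are closed.

```python
# Verify that H = delta*x - gamma*ln(x) + beta*y - alpha*ln(y) is constant
# along Lotka-Volterra trajectories (alpha=1, beta=0.5, delta=0.25, gamma=0.5).
import numpy as np
from scipy.integrate import solve_ivp

alpha, beta, delta, gamma = 1.0, 0.5, 0.25, 0.5

def lotka_volterra(t, s):
    x, y = s
    return [alpha * x - beta * x * y, delta * x * y - gamma * y]

sol = solve_ivp(lotka_volterra, (0, 50), [1.0, 1.0],
                t_eval=np.linspace(0, 50, 2000), rtol=1e-11, atol=1e-11)
x, y = sol.y
H = delta * x - gamma * np.log(x) + beta * y - alpha * np.log(y)

print(f"H drift over the run: {H.max() - H.min():.2e}")  # tiny: closed orbits
```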

📝 Example 14 (Damped Pendulum — Separatrices)

The damped pendulum θ˙=ω\dot{\theta} = \omega, ω˙=sinθbω\dot{\omega} = -\sin\theta - b\omega with b=0.3b = 0.3:

  • Equilibria: (θ,ω)=(nπ,0)(\theta, \omega) = (n\pi, 0). Even multiples of π\pi are stable spirals (downward rest positions); odd multiples are saddle points (upward balance points).
  • Separatrices: The stable and unstable manifolds of the saddle points form heteroclinic connections — curves connecting one saddle to the next. These separatrices divide the phase plane into basins of attraction for the different stable equilibria.
  • Physical interpretation: A trajectory starting with just enough energy to pass over the top of the pendulum follows a separatrix. Slightly less energy and it oscillates back; slightly more and it whips over the top and settles into the next well.

📝 Example 15 (SIR Epidemic Model)

The SIR model \dot{S} = -\beta S I, \dot{I} = \beta S I - \gamma I (the recovered fraction R = 1 - S - I decouples and is omitted) describes an epidemic:

  • Equilibria: The line I=0I = 0 consists entirely of equilibria (disease-free states). The II-nullcline S=γ/βS = \gamma/\beta divides the positive quadrant.
  • Basic reproduction number: R0=βS0/γR_0 = \beta S_0 / \gamma where S0S_0 is the initial susceptible fraction. If R0>1R_0 > 1, the infected population initially grows (epidemic). If R0<1R_0 < 1, it decays (no epidemic).
  • Threshold behavior: This is a transcritical-type bifurcation in disguise. As SS decreases below γ/β\gamma/\beta during the epidemic, I˙\dot{I} changes sign — the epidemic peaks and begins to decline. The parameter R0R_0 plays the role of the bifurcation parameter.
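The threshold mechanism is easy to check numerically (an illustrative sketch with assumed parameters β = 0.5, γ = 0.2): the epidemic peaks exactly when S(t) crosses γ/β.

```python
# SIR threshold check: I(t) peaks when S(t) crosses gamma/beta.
import numpy as np
from scipy.integrate import solve_ivp

beta, gamma = 0.5, 0.2   # illustrative parameters (assumed for this demo)
S0, I0 = 0.99, 0.01

def sir(t, s):
    S, I = s
    return [-beta * S * I, beta * S * I - gamma * I]

sol = solve_ivp(sir, (0, 100), [S0, I0],
                t_eval=np.linspace(0, 100, 5000), rtol=1e-10, atol=1e-12)
S, I = sol.y

peak = np.argmax(I)  # index of the epidemic peak
print(f"R0 = {beta * S0 / gamma:.2f}")
print(f"S at peak of I: {S[peak]:.4f}  (gamma/beta = {gamma / beta:.4f})")
```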

💡 Remark 8 (Gradient Systems — Always Stable, No Limit Cycles)

A gradient system y˙=V(y)\dot{\mathbf{y}} = -\nabla V(\mathbf{y}) uses VV itself as a Lyapunov function: V˙=V20\dot{V} = -\|\nabla V\|^2 \leq 0. Consequences:

  1. Every bounded trajectory converges to the set of equilibria (and to a single equilibrium when the equilibria are isolated); periodic orbits are impossible, since VV would have to return to its starting value after strictly decreasing along a nonconstant orbit.
  2. Every local minimum of VV is an asymptotically stable equilibrium.
  3. Saddle points of VV have unstable manifolds of dimension equal to the Morse index (number of negative eigenvalues of 2V\nabla^2 V).

Gradient descent in ML is precisely a gradient system — which is why understanding loss landscape critical points is equivalent to stability analysis.
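A minimal numerical illustration (using a double-well potential chosen for this example): along the gradient flow, V never increases and the trajectory lands at a local minimum.

```python
# Gradient flow on V(x, y) = (x^2 - 1)^2 / 4 + y^2 / 2: V is monotonically
# non-increasing and the trajectory converges to the minimum at (1, 0).
import numpy as np
from scipy.integrate import solve_ivp

def V(x, y):
    return (x**2 - 1)**2 / 4 + y**2 / 2

def grad_flow(t, s):
    x, y = s
    return [-x * (x**2 - 1), -y]   # -grad V

sol = solve_ivp(grad_flow, (0, 50), [2.0, 1.5],
                t_eval=np.linspace(0, 50, 1000), rtol=1e-10, atol=1e-12)
values = V(sol.y[0], sol.y[1])

print(f"final point: ({sol.y[0, -1]:.4f}, {sol.y[1, -1]:.4f})")  # near (1, 0)
print("max increase in V along the flow:", np.diff(values).max())
# <= 0 up to solver tolerance: V is a Lyapunov function for its own flow.
```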

Phase portrait gallery: Lotka-Volterra (top-left), Van der Pol (top-right), damped pendulum (bottom-left), SIR model (bottom-right). Each panel shows nullclines (dashed), equilibria, and representative trajectories.

10. Computational Notes

Stability analysis in practice relies on numerical tools. Here are the key computational techniques.

Computing Jacobians. For an analytically given f\mathbf{f}, symbolic differentiation is exact. For black-box systems (e.g., neural ODEs), use central finite differences: fiyj(y)fi(y+hej)fi(yhej)2h\frac{\partial f_i}{\partial y_j}(\mathbf{y}^*) \approx \frac{f_i(\mathbf{y}^* + h\mathbf{e}_j) - f_i(\mathbf{y}^* - h\mathbf{e}_j)}{2h} with h106h \approx 10^{-6} (balancing truncation and roundoff error).
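A minimal implementation sketch of this central-difference Jacobian (the helper name `numerical_jacobian` is ours), checked against the analytic Van der Pol Jacobian at the origin:

```python
# Central-difference Jacobian for a black-box vector field f: R^n -> R^n.
import numpy as np

def numerical_jacobian(f, y, h=1e-6):
    y = np.asarray(y, dtype=float)
    n = y.size
    J = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        # Central difference: O(h^2) truncation error, O(eps/h) roundoff.
        J[:, j] = (np.asarray(f(y + e)) - np.asarray(f(y - e))) / (2 * h)
    return J

# Check against the analytic Van der Pol Jacobian at the origin.
mu = 1.0
f = lambda s: np.array([s[1], mu * (1 - s[0]**2) * s[1] - s[0]])
J_num = numerical_jacobian(f, [0.0, 0.0])
J_exact = np.array([[0.0, 1.0], [-1.0, mu]])
print("max error:", np.abs(J_num - J_exact).max())
```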

Solving the Lyapunov equation. In Python:

import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[-1, 2], [-3, -2]])
Q = np.eye(2)
P = solve_continuous_lyapunov(A.T, -Q)  # Solves A^T P + P A = -Q
print("P =", P)
print("Eigenvalues of P:", np.linalg.eigvalsh(P))  # Should be positive
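As a quick sanity check (not part of the original snippet), one can confirm that the returned P actually solves the equation and certifies stability: the residual of the Lyapunov equation vanishes, P is symmetric positive-definite, and A is Hurwitz.

```python
# Verify the Lyapunov solution: the residual of A^T P + P A + Q should
# vanish, P should be positive-definite, and A should be Hurwitz.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[-1.0, 2.0], [-3.0, -2.0]])
Q = np.eye(2)
P = solve_continuous_lyapunov(A.T, -Q)

residual = A.T @ P + P @ A + Q
print("residual norm:", np.linalg.norm(residual))   # ~ machine precision
print("P eigenvalues:", np.linalg.eigvalsh(P))      # both positive
print("A eigenvalues:", np.linalg.eigvals(A))       # real parts < 0
```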

Continuation methods. Bifurcation diagrams are computed by continuation: start at a known equilibrium, then incrementally change μ\mu while tracking the equilibrium position using Newton’s method. When the Jacobian becomes singular (a saddle-node) or eigenvalues cross the imaginary axis (a Hopf), the continuation algorithm detects the bifurcation. Software: AUTO, PyDSTool, MATCONT.

Lyapunov function construction via SDP. For polynomial systems, the search for a polynomial Lyapunov function can be cast as a semidefinite program (SDP): find P0P \succeq 0 such that V(y)=p(y)TPp(y)V(\mathbf{y}) = \mathbf{p}(\mathbf{y})^T P \,\mathbf{p}(\mathbf{y}) satisfies V˙0\dot{V} \leq 0, where p(y)\mathbf{p}(\mathbf{y}) is a vector of monomials. Tools like SOSTOOLS (MATLAB) and DSOS/SDSOS (Julia) automate this.

11. Connections to ML — Stability of Learning

The stability theory developed in this topic is the mathematical language of optimization dynamics. Here we make the connections explicit.

Loss landscape stability. A critical point θ\theta^* of L(θ)L(\theta) (where L(θ)=0\nabla L(\theta^*) = 0) is the equilibrium of the gradient flow θ˙=L(θ)\dot{\theta} = -\nabla L(\theta). The Jacobian of the RHS at θ\theta^* is 2L(θ)-\nabla^2 L(\theta^*), so:

  • Local minimum (2L0\nabla^2 L \succ 0): asymptotically stable. Gradient descent converges.
  • Saddle point (2L\nabla^2 L indefinite): unstable. The unstable manifold directions are the eigenvectors of 2L\nabla^2 L with negative eigenvalues — these are the escape routes.
  • Local maximum (2L0\nabla^2 L \prec 0): unstable in all directions.

In high dimensions (thousands of parameters), saddle points vastly outnumber local minima. The empirical observation that SGD reliably finds good minima is explained in part by noise-driven escape from saddle points along the unstable manifold.

📝 Example 16 (Gradient Flow on a Loss Landscape with a Saddle Point)

Consider L(x,y)=12(x2y2)L(x, y) = \frac{1}{2}(x^2 - y^2) with critical point at the origin.

The gradient flow is x˙=x\dot{x} = -x, y˙=y\dot{y} = y. The Hessian is 2L=(1001)\nabla^2 L = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} — indefinite.

  • Stable manifold W^s: the x-axis. Starting on W^s, the x-component decays and y stays at zero; the trajectory approaches the saddle.
  • Unstable manifold W^u: the y-axis. Starting on W^u, the y-component grows exponentially; the trajectory escapes the saddle.

In practice, SGD adds noise that has a component along WuW^u, causing the iterate to escape the saddle point — this is the mechanism behind saddle-point escape in deep learning optimization.
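A tiny simulation (illustrative, with an assumed step size and noise scale) shows the mechanism: deterministic gradient descent started exactly on the stable manifold would stall at the saddle, but a small random perturbation added to each step drives the iterate out along the y-axis.

```python
# Noisy gradient descent on L(x, y) = (x^2 - y^2)/2 escapes the saddle at 0.
import numpy as np

rng = np.random.default_rng(0)
eta, noise = 0.01, 1e-3   # step size and noise scale (assumed for the demo)

x, y = 1.0, 0.0           # start exactly on the stable manifold (x-axis)
for _ in range(2000):
    gx, gy = x, -y                        # grad L = (x, -y)
    x -= eta * gx                         # contracts toward the saddle
    y -= eta * gy - noise * rng.normal()  # expands, kicked by SGD-style noise

print(f"after 2000 steps: x = {x:.2e}, |y| = {abs(y):.2e}")
# x has collapsed to ~0 while |y| has grown large: escape along W^u.
```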

GAN training dynamics. A GAN trains a generator GG and discriminator DD simultaneously. In simplified models, the training dynamics form a 2D system: θ˙G=θGV(θG,θD),θ˙D=θDV(θG,θD)\dot{\theta}_G = -\nabla_{\theta_G} V(\theta_G, \theta_D), \qquad \dot{\theta}_D = \nabla_{\theta_D} V(\theta_G, \theta_D)

This is not a gradient system — it’s a saddle-point optimization (min-max). The Jacobian at equilibrium has the form J=(ABBTC)J = \begin{pmatrix} -A & -B \\ B^T & -C \end{pmatrix}, and its eigenvalues can be complex, leading to oscillatory behavior. Mode collapse corresponds to a bifurcation: as the generator becomes too confident, the equilibrium loses stability (a Hopf-like bifurcation) and training oscillates.

📝 Example 17 (Simplified GAN Dynamics)

Consider the simplified GAN: x˙=xy\dot{x} = -xy, y˙=x21\dot{y} = x^2 - 1 where xx represents the generator and yy the discriminator.

The equilibrium at (1, 0) has Jacobian J = \begin{pmatrix} 0 & -1 \\ 2 & 0 \end{pmatrix} with eigenvalues \lambda = \pm i\sqrt{2}, a center. The linearization predicts perpetual oscillation, and indeed the full nonlinear system has a conserved quantity H(x, y) = \frac{1}{2}y^2 + \frac{1}{2}x^2 - \ln x (defined for x > 0), so nearby orbits are closed curves.

Adding regularization (spectral normalization of DD, gradient penalties) modifies the Jacobian to have Re(λ)<0\operatorname{Re}(\lambda) < 0 — turning the center into a stable spiral and stabilizing training.
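The cycling is visible numerically (an illustrative check): the quantity H(x, y) = ½y² + ½x² − ln x (one convenient form of the invariant, valid for x > 0) stays constant along an orbit of the unregularized dynamics, so training neither converges nor diverges.

```python
# Unregularized simplified GAN dynamics cycle forever: the invariant
# H = y^2/2 + x^2/2 - ln(x) is conserved along trajectories with x > 0.
import numpy as np
from scipy.integrate import solve_ivp

def gan(t, s):
    x, y = s
    return [-x * y, x**2 - 1]

sol = solve_ivp(gan, (0, 50), [1.5, 0.0],
                t_eval=np.linspace(0, 50, 2000), rtol=1e-11, atol=1e-11)
x, y = sol.y
H = 0.5 * y**2 + 0.5 * x**2 - np.log(x)

print(f"H drift over the run: {H.max() - H.min():.2e}")  # ~ 0: closed orbit
```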

💡 Remark 9 (Learning Rate Schedules and Stability)

Stability theory explains why learning rate schedules work. Discrete gradient descent θk+1=θkηL(θk)\theta_{k+1} = \theta_k - \eta \nabla L(\theta_k) is a discretization of the gradient flow with step size η\eta. The discrete system is stable (converges) only if η<2/λmax(2L)\eta < 2/\lambda_{\max}(\nabla^2 L) — the CFL-like condition for gradient descent.

A constant learning rate must satisfy η<2/λmax\eta < 2/\lambda_{\max} everywhere along the trajectory. A learning rate schedule that decreases η\eta over time allows initially large steps (fast exploration, saddle escape via effective noise) that later shrink to ensure convergence (stability near the minimum). The schedule effectively transitions from the unstable regime (escape saddles) to the stable regime (converge to minimum).
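The threshold η < 2/λmax is sharp, as a two-line experiment on a quadratic shows (an illustrative sketch):

```python
# Discrete gradient descent on L(x) = x^2, whose Hessian is the constant
# lambda_max = 2: the iteration is stable iff eta < 2/lambda_max = 1.
import numpy as np

def run_gd(eta, steps=100, x0=1.0):
    x = x0
    for _ in range(steps):
        x -= eta * 2 * x   # grad L = 2x, so x <- (1 - 2*eta) * x
    return x

x_stable = run_gd(eta=0.9)    # |1 - 2*eta| = 0.8 < 1: converges
x_unstable = run_gd(eta=1.1)  # |1 - 2*eta| = 1.2 > 1: diverges
print(f"eta=0.9: |x| = {abs(x_stable):.2e}")
print(f"eta=1.1: |x| = {abs(x_unstable):.2e}")
```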

ML connections: gradient flow on a loss landscape with saddle point and manifolds (top-left), Hessian eigenvalue spectrum at saddle vs. minimum (top-right), GAN training dynamics as a 2D phase portrait (bottom-left), neural ODE stability (bottom-right).

Positive-definite Hessian: both eigenvalues > 0 (λ₁ = 2, λ₂ = 0.5). Gradient descent converges — this is an asymptotically stable equilibrium of the gradient flow.

12. Connections & Further Reading

This topic is the theoretical peak of the ODE track — the point where linear algebra (eigenvalues, matrix exponential) becomes predictive for nonlinear systems through linearization, and where energy methods (Lyapunov functions) provide an alternative path that doesn’t require solving the ODE.

Forward to formalml.com:

  • Gradient Descent — convergence analysis is stability of the gradient flow ODE
  • Information Geometry — natural gradient stability via the Fisher-preconditioned Hessian
  • Smooth Manifolds — stable/unstable manifolds as immersed submanifolds; Poincaré-Bendixson on compact 2-manifolds
  • Spectral Theorem — Lyapunov equation solvability via spectral conditions on AA and AT-A^T

References

  1. book Strogatz (2015). Nonlinear Dynamics and Chaos. Chapters 5–8: stability, phase portraits, bifurcations, limit cycles. The primary reference for exposition style.
  2. book Perko (2001). Differential Equations and Dynamical Systems. Chapters 2–3: linearization, Lyapunov stability, stable manifold theorem. More rigorous proofs.
  3. book Teschl (2012). Ordinary Differential Equations and Dynamical Systems. Chapters 6–7: stability theory, Poincaré-Bendixson theorem.
  4. paper Dauphin, Pascanu, Gulcehre, Cho, Ganguli & Bengio (2014). "Identifying and attacking the saddle point problem in high-dimensional non-convex optimization". Connects Hessian eigenvalue analysis to saddle point escape in deep learning, the stability theory of loss landscapes.