
Linear Multistep Methods for Discovery (LMM-DMD)

Updated 15 February 2026
  • LMM-DMD is a data-driven framework that uses classical linear multistep methods and neural approximators to recover unknown vector fields in ODE models from sampled state data.
  • It rigorously decomposes the overall error into discretization and network approximation components, with convergence guarantees based on consistency and stability criteria.
  • The method has been validated on a range of systems—from linear oscillators to chaotic and high-dimensional biochemical networks—demonstrating broad applicability.

Linear Multistep Methods for Discovery (LMM-DMD) are a class of algorithms for data-driven inference of continuous-time dynamical systems from observed state data. LMM-DMD leverages classical linear multistep methods (LMMs) for discretizing time derivatives, combining them with expressive neural approximators—such as feedforward neural networks or Kolmogorov–Arnold networks (KANs)—to recover the vector fields governing unknown ordinary differential equations (ODEs). Rigorous error analyses underpin the approach, decomposing the discovery error into contributions from both numerical discretization and function approximation.

1. Mathematical Framework

The central objective of LMM-DMD is the recovery of an unknown vector field $f : \mathbb{R}^d \to \mathbb{R}^d$ in the autonomous ODE

$$\dot x(t) = f(x(t)), \qquad x(0) = x_0 \in \mathbb{R}^d,$$

given a sequence of uniformly sampled state observations $x_n = x(t_n)$, with $t_n = n h$, $n = 0, \dots, N$, and step size $h$.

A general $s$-step LMM for approximating time derivatives has the form

$$\sum_{j=0}^s \alpha_j x_{n+j} = h \sum_{j=0}^s \beta_j f(x_{n+j}),$$

where the coefficients $\{\alpha_j\}$ and $\{\beta_j\}$ encode the specific numerical scheme: e.g., Adams–Bashforth (AB), Adams–Moulton (AM), or Backward Differentiation Formula (BDF).

The inverse problem is formulated by treating the sampled $\{x_n\}$ as fixed and seeking to identify $f$ (or its parametric approximation $f_\theta$) so that the LMM residuals are minimized or vanish:

$$J_h(\theta) = \frac{1}{N-s+1} \sum_{n=0}^{N-s} \left\| \sum_{j=0}^s \alpha_j x_{n+j} - h \sum_{j=0}^{s} \beta_j f_\theta(x_{n+j}) \right\|^2.$$

Parametric representations using neural architectures such as deep feedforward networks or B-spline-based Kolmogorov–Arnold networks (KANs) are adopted for $f_\theta$. Auxiliary conditions, such as finite-difference initializations for $f(x_j)$ at early time steps, may be imposed to guarantee a unique minimizer when the resulting system is under- or over-determined (Hu et al., 25 Jan 2025; Du et al., 2021; Keller et al., 2019).
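As a concrete illustration, the residual loss $J_h(\theta)$ can be evaluated directly from sampled data. The sketch below (function names and the AB-2 coefficient choice are illustrative, not from the cited papers) checks that the true vector field of $\dot x = -x$ leaves only a tiny discretization-level residual, while a wrong candidate field does not.

```python
import numpy as np

# Illustrative sketch: evaluate the LMM residual loss J_h for a candidate
# vector field on uniformly sampled trajectory data. The coefficients encode
# the 2-step Adams-Bashforth scheme (AB-2):
#   x_{n+2} - x_{n+1} = h * (3/2 f(x_{n+1}) - 1/2 f(x_n))
alpha = np.array([0.0, -1.0, 1.0])   # alpha_0, alpha_1, alpha_2
beta = np.array([-0.5, 1.5, 0.0])    # beta_0,  beta_1,  beta_2

def lmm_residual_loss(x, h, f):
    """Mean squared LMM residual J_h over a trajectory x of shape (N+1, d)."""
    s = len(alpha) - 1
    N = x.shape[0] - 1
    fx = np.array([f(xn) for xn in x])       # f evaluated at every sample
    total = 0.0
    for n in range(N - s + 1):
        lhs = sum(alpha[j] * x[n + j] for j in range(s + 1))
        rhs = h * sum(beta[j] * fx[n + j] for j in range(s + 1))
        total += np.sum((lhs - rhs) ** 2)
    return total / (N - s + 1)

# Exact samples of x' = -x, x(0) = 1: the true field nearly annihilates the
# residual; a sign-flipped field produces a residual of order h^2 per term.
h = 1e-3
x = np.exp(-np.arange(0.0, 1.0, h)).reshape(-1, 1)
loss_true = lmm_residual_loss(x, h, lambda z: -z)
loss_wrong = lmm_residual_loss(x, h, lambda z: z)
```

In practice the minimization runs over network parameters $\theta$ rather than comparing fixed candidate fields, but the residual being minimized is exactly this quantity.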

2. Neural Network and Spline-Based Approximators

The LMM-DMD framework accommodates various function classes for $f_\theta$:

  • Feedforward Neural Networks: Deep fully connected networks (e.g., ReLU or tanh activations) parameterize $f_\theta$. A typical architecture employs depth $L$, uniform width $W$, and nonlinearity $\sigma$:

$$f_\theta(x) = W_{L+1}\,\sigma( W_L\,\sigma( \cdots \sigma(W_1 x + b_1) \cdots ) + b_L ) + b_{L+1}.$$

The parameter count is $O(L W^2)$ (Du et al., 2021; Zhu et al., 2022).
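A minimal numpy sketch of such a parameterization (the dimensions and the Gaussian initialization are illustrative assumptions, not the papers' settings) makes the $O(L W^2)$ parameter count concrete:

```python
import numpy as np

# Minimal sketch (assumed architecture): a depth-L, width-W fully connected
# network f_theta: R^d -> R^d with tanh activation, matching the form
# f_theta(x) = W_{L+1} sigma(... sigma(W_1 x + b_1) ...) + b_{L+1}.
rng = np.random.default_rng(0)
d, L, W = 3, 4, 32

# Hidden layers d -> W -> ... -> W (L weight matrices), then linear readout W -> d.
dims = [d] + [W] * L + [d]
weights = [rng.standard_normal((dims[i + 1], dims[i])) / np.sqrt(dims[i])
           for i in range(len(dims) - 1)]
biases = [np.zeros(dims[i + 1]) for i in range(len(dims) - 1)]

def f_theta(x):
    """Forward pass: tanh on hidden layers, linear output layer."""
    for Wm, bm in zip(weights[:-1], biases[:-1]):
        x = np.tanh(Wm @ x + bm)
    return weights[-1] @ x + biases[-1]

# The count is dominated by the L - 1 hidden W-by-W blocks, i.e. O(L W^2).
n_params = sum(Wm.size for Wm in weights) + sum(bm.size for bm in biases)
```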

  • Kolmogorov–Arnold Networks (KANs): For increased approximation capacity and explicit error bounds, two-layer KANs with B-spline activation functions are constructed. Each scalar component $f_i(x)$ is approximated as

$$\phi(x) = \sum_{j=1}^N \phi_{1,j} \left( \sum_{i=1}^d \phi_{0,j,i}(x_i) \right),$$

where $\phi_{l,\ast}(\xi) = \sum_{q=0}^{G+k-1} c^{(l)}_q B_q^{(k)}(\xi)$ employs B-splines of degree $k$ and $G$ interior knots.

The approximation error of KANs is quantified in the uniform norm:

$$\|f-\phi\|_{L^\infty([0,1]^d)} \leq N (L_G d + 1)\, \omega_f\!\left( \min\{ (2k-2)^{-1/2}, \sqrt{k/12G} \} \right),$$

where $L_G$ denotes a spline-induced Lipschitz constant and $\omega_f(r)$ is the modulus of continuity of $f$ (Hu et al., 25 Jan 2025).
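To make the B-spline building block concrete, here is one common open-uniform construction on $[0,1]$ evaluated by the Cox–de Boor recursion. This is a sketch under our own conventions, not the papers' implementation; with these clamped knots one obtains $G + k + 1$ basis functions, and indexing conventions vary across references.

```python
import numpy as np

# Cox-de Boor evaluation of the degree-k B-spline basis on [0, 1] with G
# interior knots (open uniform knot vector: endpoints repeated k+1 times).
def bspline_basis(xi, k, G):
    """Values of all degree-k basis functions at a point xi in [0, 1]."""
    interior = np.linspace(0.0, 1.0, G + 2)[1:-1]
    knots = np.concatenate([np.zeros(k + 1), interior, np.ones(k + 1)])
    m = len(knots) - 1
    # Degree 0: indicators of [knots[i], knots[i+1]); close the last interval at 1.
    B = np.array([
        1.0 if (knots[i] <= xi < knots[i + 1])
        or (xi == 1.0 and knots[i] < knots[i + 1] == 1.0) else 0.0
        for i in range(m)
    ])
    for d in range(1, k + 1):          # raise the degree one step at a time
        Bn = np.zeros(m - d)
        for i in range(m - d):
            left = 0.0 if knots[i + d] == knots[i] else \
                (xi - knots[i]) / (knots[i + d] - knots[i]) * B[i]
            right = 0.0 if knots[i + d + 1] == knots[i + 1] else \
                (knots[i + d + 1] - xi) / (knots[i + d + 1] - knots[i + 1]) * B[i + 1]
            Bn[i] = left + right
        B = Bn
    return B

# The basis is a nonnegative partition of unity on [0, 1].
b = bspline_basis(0.37, k=3, G=8)
```

Each KAN activation $\phi_{l,\ast}(\xi)$ is then a learned linear combination of these basis values.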

3. Discretization, Consistency, and Stability in Discovery

The convergence theory for LMM-DMD diverges from that of forward integration. The defining LMM system for the inverse problem is

$$\sum_{m=0}^{M} \alpha_m y_{n-m} - h \sum_{m=0}^{M} \beta_m f(y_{n-m}) = 0,$$

with data $\{y_n\}$. Consistency and stability criteria for convergence in the discovery context differ as follows (Keller et al., 2019):

  • Consistency: The local defect $\tau_h$ on the trajectory must satisfy $\|\tau_h\|_\infty = O(h^p)$, where $p$ is the LMM order.
  • Stability: The root condition for the second characteristic polynomial $\sigma(\zeta) = \sum_{m=0}^M \beta_m \zeta^{M-m}$ governs the stability and conditioning of the recovery of $f$. The strong root condition (all roots strictly inside the unit disc) ensures bounded error growth with $N$.
  • Convergence Theorem: If the second polynomial satisfies the strong root condition and the method is consistent, then

$$\max_{M \leq n \leq N} \| \hat f(y_n) - f(y_n) \| = O(h^p).$$

Analysis for specific schemes yields:

| Scheme | Orders with Discovery Convergence | Root Condition |
| --- | --- | --- |
| Adams–Bashforth | $1 \leq M \leq 6$ | Strong root condition holds |
| Adams–Moulton | $M = 0, 1$ only | $M \geq 2$ unstable |
| BDF | $M \geq 1$ | Always stable |

(Keller et al., 2019; Du et al., 2021)
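The table's stability claims can be checked numerically: the strong root condition asks that every root of $\sigma(\zeta)$ lie strictly inside the unit disc. The snippet below (the scheme coefficients are standard; the helper name is ours) confirms that AB-2 and BDF-2 pass while AM-2 fails.

```python
import numpy as np

# Strong root condition on the second characteristic polynomial
# sigma(zeta) = sum_m beta_m zeta^(M-m) for a few classical 2-step schemes.
def strong_root_condition(beta, tol=1e-12):
    """True iff all roots of sigma lie strictly inside the unit disc."""
    roots = np.roots(beta)   # beta listed from beta_0 (highest power) downward
    return bool(np.all(np.abs(roots) < 1 - tol))

schemes = {
    "AB-2":  [0.0, 1.5, -0.5],       # explicit: beta_0 = 0
    "AM-2":  [5/12, 8/12, -1/12],    # implicit 2-step Adams-Moulton
    "BDF-2": [2/3, 0.0, 0.0],        # only beta_0 nonzero
}
stable = {name: strong_root_condition(b) for name, b in schemes.items()}
```

For AM-2 the roots of $\sigma$ are approximately $0.12$ and $-1.72$; the root outside the unit disc is what makes higher-order Adams–Moulton recovery unstable for discovery.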

4. Error Bounds: Approximation and Discretization Splitting

The LMM-DMD error admits an explicit decomposition into discretization and learning (approximation) terms:

$$\| f_\theta(x_n) - f(x_n) \|_{\ell^2_h} \leq C\,\kappa_2(A_h) \left( h^p + \varepsilon_{\rm net} \right),$$

where $\varepsilon_{\rm net}$ is the best-approximation error achievable by $f_\theta$, $\kappa_2(A_h)$ is the condition number of the LMM matrix (uniformly bounded for zero-stable schemes), and $C$ is independent of $h$ and the network size (Du et al., 2021; Hu et al., 25 Jan 2025).

For KANs, the uniform-norm error can be further bounded via the Kolmogorov superposition theorem and explicit B-spline regularity. Provided that $f$ is Hölder continuous with exponent $\alpha$,

$$\| f - \phi \|_\infty \leq \lambda N (L_G d + 1) \left( \min\{ (2k-2)^{-1/2}, \sqrt{k/12G} \} \right)^\alpha.$$

The solution-trajectory error for the learned $f_\theta$, by Grönwall's inequality, satisfies

$$\max_n \| x(t_n) - \tilde x(t_n) \| \leq C( h^p + \varepsilon_{\rm net} ).$$

Error analyses using inverse modified differential equations (IMDE) further show that the total approximation error on compact domains can be expressed as

$$\int_K \| f_{\theta^*} - f \| \leq c_2 h^p + c_0 \sqrt{\mathcal{L}},$$

where $\mathcal{L}$ is the learning (training) loss (Zhu et al., 2022).

5. Representative Numerical Experiments

LMM-DMD has been systematically validated across a range of ODE benchmarks:

  • Linear and Nonlinear Oscillators: Recovery of simple, damped, and cubic oscillators using both explicit (AB) and implicit (BDF, AM) LMMs. Empirically, fitted-field errors decay at $O(h^p)$ with decreasing step size until limited by approximation errors, precisely as predicted by theory.
  • Lorenz '63 (Chaotic): On canonical chaotic trajectories, both Adams–Bashforth and BDF methods of orders 1–4 yield convergence rates matching the multistep order for sufficiently small $h$. For long prediction horizons, errors grow exponentially due to positive Lyapunov exponents.
  • Biochemical Networks (e.g., Glycolytic Oscillator, $d = 7$): Using AM-1 with a KAN ($k = 3$, $G = 64$, $h = 10^{-3}$), the method accurately recovers all species time series in both training and extrapolation regions, retaining small sup-norm errors.

In all cases, the observed grid errors mirror theoretical predictions, exhibiting polynomial decay in $h$ and saturation at the neural approximation threshold. BDF and low-order AB schemes are especially robust in high-dimensional or stiff regimes (Hu et al., 25 Jan 2025; Du et al., 2021; Keller et al., 2019; Zhu et al., 2022).
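The reported $O(h^p)$ decay is easy to reproduce in the simplest setting: with AB-1 (forward Euler), the pointwise recovery $\hat f(x_n) = (x_{n+1} - x_n)/h$ for $\dot x = -x$ should lose roughly half its error when $h$ is halved. This is an illustrative check of the convergence theory, not one of the cited experiments.

```python
import numpy as np

# Empirical order check: recover f on the grid for x' = -x, x(0) = 1, via the
# 1-step Adams-Bashforth (forward Euler) residual, and measure the O(h) decay.
def discovery_error(h):
    t = np.arange(0.0, 1.0 + h / 2, h)
    x = np.exp(-t)                        # exact trajectory samples
    f_hat = (x[1:] - x[:-1]) / h          # AB-1 pointwise recovery of f(x_n)
    return np.max(np.abs(f_hat - (-x[:-1])))

e1, e2 = discovery_error(1e-2), discovery_error(5e-3)
rate = np.log2(e1 / e2)                   # observed order, should be close to p = 1
```

With a neural parameterization the same curve flattens once $h^p$ drops below the network's approximation floor $\varepsilon_{\rm net}$, which is the saturation behavior described above.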

6. Practical Considerations, Limitations, and Recommendations

Robust application of LMM-DMD mandates attention to several practical aspects:

  • Scheme Selection: Only LMMs whose second characteristic polynomial satisfies the strong root condition ensure stability for discovery. BDF methods (any order) and AB up to sixth order are safe. AM is limited to AM-0 (implicit Euler) and AM-1 (trapezoid rule).
  • Auxiliary Initial Conditions: Over- or under-determined linear systems may require finite-difference approximations for initialization, especially for explicit methods such as AB; BDF methods are less dependent on such conditions.
  • Grid Step Size and Network Capacity: To ensure the error is dominated by discretization rather than network approximation, one should choose $h$ so that $h^p$ exceeds the learning floor, with networks sufficiently wide and deep that $\varepsilon_{\rm net} \ll h^p$.
  • Analyticity and High-Frequency Modes: Theoretical guarantees rely on sufficient regularity (analyticity) of both the true and learned vector fields. Practical implementations benefit from the implicit spectral bias of SGD, but explicit regularization may be required for stiff or highly oscillatory systems.
  • Limitations: High-dimensional systems and stiff or multiscale dynamics increase the required data and network size. Large $N$ or $G$ in KAN-based approaches may be computationally expensive. Theoretical results assume noiseless, densely sampled trajectories, and extension to noisy or incomplete data remains an open area.

LMM-DMD provides a sharp analytic and algorithmic foundation for data-driven ODE discovery, integrating high-order numerical schemes and expressive neural approximators with rigorous a priori error control (Hu et al., 25 Jan 2025; Du et al., 2021; Keller et al., 2019; Zhu et al., 2022).

7. Research Directions and Applications

The LMM-DMD framework is broadly applicable across domains where the governing dynamical law is unknown, but state trajectories can be sampled:

  • System Identification: Physics, biology (e.g., cellular signaling networks, gene regulatory dynamics), engineering (control systems), and neuroscience (population dynamics).
  • Model Discovery with Noisy Data: While the canonical theory addresses noiseless data, extensions incorporating regularization or robust statistics are becoming increasingly relevant.
  • PDE Discovery and Multi-Scale Systems: Extensions that incorporate spatial discretization, run multi-stage or multirate schemes, or recover partial differential equations are active research areas.
  • Comparison and Integration with Other Methods: LMM-DMD complements approaches such as Runge–Kutta discovery or sparse regression (e.g., SINDy), providing high-order accuracy and error bounds conditioned on neural-network representational power.

LMM-DMD thus constitutes a rigorous, versatile, and high-accuracy paradigm for the inverse modeling of continuous-time dynamics from time-series data, with proven convergence theory and broad relevance across scientific domains.
