Linear Decay-to-Zero (D2Z) Systems

Updated 30 October 2025

Linear Decay-to-Zero (D2Z) is a class of methods that ensure system variables or forecasts decay to zero at a rate linearly bounded in time or iterations.
It spans control, forecasting, deep learning, PDEs, and kinetic equations, where it governs how quickly errors vanish and how resources are allocated over time.
D2Z implementations rely on constructions such as triangular stability and linear learning rate schedules, achieving decay that is provably optimal in some settings and outperforms exponential and hyperbolic counterparts in others.

Linear Decay-to-Zero (D2Z) designates a class of solution behaviors, algorithmic schedules, and control mechanisms in which a system variable, state, or forecast is provably driven to zero along a linearly decreasing envelope in time or iteration or, in the broader usage adopted here, at a polynomial rate. D2Z arises in nonlinear control systems, time series forecasting, learning rate schedules in deep learning, dissipative PDEs and kinetic equations, and sequence modeling architectures, each with distinct mathematical, algorithmic, and practical properties. D2Z mechanisms govern both error convergence and resource usage; their optimality is proven in some domains, and their advantage over exponential and hyperbolic decay is demonstrated empirically in others. This article surveys D2Z's theoretical foundations, governing equations, optimality conditions, algorithmic constructions, and cross-domain relevance.

1. Mathematical Characterization and Occurrence of D2Z

Linear decay-to-zero covers two kinds of convergence: a linearly decreasing envelope that reaches zero in finite time, $x(t)\leq C(1-t/\tau)^+$ with prescribed time horizon $\tau>0$, and polynomial decay, $x(t)\leq C/(1+t)^k$ for suitable $k>0$, with $C>0$ in both cases and $(z)^+=\max\{z,0\}$. D2Z arises in distinct mathematical settings:

Stability of Dynamical Systems: "Triangular stability", as proposed in (Shakouri et al., 2021), requires the global triangular bound $\|x(t)\|\leq \sigma\|x_0\|\,\Lambda(t/\tau)$ with $\Lambda(t/\tau)=\max\{1-t/\tau,0\}$ and a constant $\sigma$, so every trajectory reaches exactly zero at $t=\tau$ and stays there.
Semi-linear Dissipative PDEs: For abstract hyperbolic equations $u''(t)+u'(t)+Au(t)+f(u(t))=0$ in which the operator $A$ has a nontrivial kernel, all solutions satisfy the polynomial D2Z bound $|u(t)|^2\leq M_2/(1+t)^{1/p}$, and a nonempty open set of initial data decays exactly at this rate (Ghisi et al., 2013).
Learning Rate Schedules: In neural network training, the linear D2Z schedule $\eta_t=\eta_0(1-t/T)$ for step $t\in[0,T]$ is reported to outperform step and cosine decay, which is attributed to a better balance, in AdamW, between moving away from the initial weights (bias) and averaging out gradient noise (variance) (Bergsma et al., 21 Feb 2025).
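Forecasting in Time Series: Croston-style intermittent demand methods such as linear-exponential smoothing (LES) can be constructed so that forecasts decay linearly to zero after obsolescence, $f_t=f_{\mathrm{init}}\max\{1-\beta\tau_t/(2\hat{\tau}_t),0\}$, where $\beta$ is a smoothing parameter, $\tau_t$ the number of periods since the last nonzero demand, and $\hat{\tau}_t$ the smoothed inter-demand interval. This removes the persistent bias of forecasts that never reach zero once an item becomes obsolete (Prestwich et al., 2014).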
Kinetic Equations: Linear kinetic equations in a half-space with absorbing boundary conditions decay like $t^{-1-d/2}$ in weighted norms, matching the decay of the heat equation with Dirichlet boundary conditions (Bouin et al., 14 Jul 2025).
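To make the triangular bound concrete, here is a minimal sketch for a hypothetical scalar system $\dot x=u$ with time-varying feedback $u=-x/(\tau-t)$. This is not the companion-matrix controller of Shakouri et al., but it produces the exact profile $x(t)=x_0(1-t/\tau)$, i.e. triangular decay with $\sigma=1$. The function names and the forward-Euler discretization are assumptions for illustration.

```python
import numpy as np


def triangular_envelope(t: np.ndarray, x0: float, tau: float, sigma: float = 1.0) -> np.ndarray:
    """Triangular bound sigma * |x0| * max(1 - t/tau, 0)."""
    return sigma * abs(x0) * np.maximum(1.0 - t / tau, 0.0)


def simulate_triangular(x0: float, tau: float, steps: int) -> tuple[np.ndarray, np.ndarray]:
    """Forward-Euler simulation of the hypothetical loop x' = u, u = -x / (tau - t).

    The exact solution is x0 * (1 - t/tau): the triangular profile with sigma = 1.
    The gain 1 / (tau - t) is unbounded as t -> tau, so the control is evaluated
    only at grid points t_n < tau; the state is reported at t = tau.
    """
    t = np.linspace(0.0, tau, steps + 1)
    h = t[1] - t[0]
    x = np.empty(steps + 1)
    x[0] = x0
    for n in range(steps):
        u = -x[n] / (tau - t[n])  # time-varying feedback, finite for t_n < tau
        x[n + 1] = x[n] + h * u   # forward-Euler step
    return t, x


if __name__ == "__main__":
    t, x = simulate_triangular(x0=2.0, tau=1.0, steps=1000)
    print(np.max(np.abs(x - triangular_envelope(t, 2.0, 1.0))))  # rounding-level error
    print(x[-1])  # essentially zero at t = tau
```

For this particular system, forward Euler reproduces the linear profile exactly on the grid up to rounding, because splitting $[0,\tau]$ into $N$ steps makes step $n$ multiply the state by $(N-n-1)/(N-n)$. The growing gain near $t=\tau$ is the terminal singularity that, in practice, forces actuation to stop just before $\tau$.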

2. Optimality, Kernel Structure, and Lower Bounds

Optimality of D2Z rates is made precise in operator-theoretic and control contexts:

In semi-linear dissipative hyperbolic equations, a nontrivial kernel of $A$ forces polynomial rather than exponential decay for generic initial data with nonzero projection on $\ker(A)$. The rate $(1+t)^{-1/p}$ for $|u(t)|^2$ is not only an upper bound: it is also attained as a lower bound on a nonempty open set of initial data, so slow solutions exist and are characterized explicitly (Ghisi et al., 2013).
In prescribed-time control, triangular stability implies exact convergence to zero at the user-specified time $\tau$ for every initial state. Conventional finite-time designs have a settling time that depends on the initial state, and fixed-time designs only bound it from above; Shakouri et al. present their D2Z controller as guaranteeing the commanded timing without chattering (Shakouri et al., 2021).
In forecasting, the advantage of D2Z over exponential and hyperbolic decay is established empirically: under obsolescence, cumulative error, mean squared error, and percent-best metrics all favor linear decay over the other decay regimes (Prestwich et al., 2014).
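The polynomial rate in the first item above can be motivated by a toy scalar reduction, an assumption for illustration rather than the argument of Ghisi et al.: along $\ker A$ the linear restoring term vanishes, and at large times damping dominates inertia, leaving $y'\approx-|y|^q y$. The exponent $q$ is a hypothetical stand-in; how it relates to the paper's $p$ depends on the exact form of $f$. The sketch contrasts the exact solution of this toy ODE with the exponential bound available when a linear term $\mu y$, $\mu>0$, is present.

```python
import math


def kernel_mode(t: float, y0: float, q: float) -> float:
    """Exact solution of the toy ODE y' = -|y|^q * y, y(0) = y0 > 0, q > 0.

    No linear restoring term (the ker A direction), so decay is driven only by
    the nonlinearity: d/dt y^(-q) = q, hence y(t) = (y0^(-q) + q*t)^(-1/q).
    """
    return (y0 ** (-q) + q * t) ** (-1.0 / q)


def coercive_mode_bound(t: float, y0: float, mu: float) -> float:
    """Bound y(t) <= y0 * exp(-mu*t) for y' = -mu*y - |y|^q * y with mu > 0, y0 > 0."""
    return y0 * math.exp(-mu * t)


if __name__ == "__main__":
    y0, q, mu = 1.0, 2.0, 1.0
    for t in (10.0, 100.0, 1000.0):
        y = kernel_mode(t, y0, q)
        # t^(1/q) * y(t) approaches q^(-1/q): the decay is polynomial, like t^(-1/q)
        print(t, y, t ** (1.0 / q) * y, coercive_mode_bound(t, y0, mu))
```

The product $t^{1/q}y(t)$ tends to $q^{-1/q}$, so the kernel mode decays only polynomially, while the coercive mode is bounded by $y_0e^{-\mu t}$.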

System/Equation	Decay Rate	Optimality / Scope
Dissipative PDE ($\ker A\neq 0$)	$|u(t)|^2\sim t^{-1/p}$	Optimal on an open set of data
Control (triangular stability)	$\max\{1-t/\tau,0\}$	Exact, all initial states
Intermittent demand (LES)	Linear (D2Z)	Asymptotically best, 100% PBt

3. Algorithmic Realization and Design Principles

Implementation of D2Z techniques varies by discipline:

Control: Prescribed-time controllers use explicit companion-matrix-based formulas and a time-varying gain normalization, $u=\pi(x,t,\tau)/(\gamma_{\min}g(t))$. Parameters are selected via a Lyapunov equation that guarantees Hurwitz stability and hence the prescribed decay (Shakouri et al., 2021).
Forecasting: LES updates its forecast by multiplying the last positive estimate by a linearly decreasing factor. The decay rate is tuned via the smoothing parameter $\beta$; once $\tau_t\geq 2\hat{\tau}_t/\beta$, the forecast is exactly zero (Prestwich et al., 2014).
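Deep Learning: After warmup, the learning rate decays linearly to zero, $\eta_t=\eta_{\max}(1-t/T)$, where $\eta_{\max}$ is the peak rate. Bergsma et al. rewrite the AdamW update as an exponential moving average, i.e. a convex combination of the initial weights and past updates whose coefficients are set by the LR schedule shape (together with the weight decay), so the schedule directly governs how fast the initial bias decays and how much gradient noise is averaged. D2Z spreads this weighting over the full run, which the authors argue reduces both early bias and late noise (Bergsma et al., 21 Feb 2025).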
Sequence Models: In linear attention, a per-step decay factor multiplies the recurrent state, parameterized for instance as $\lambda_t^j=\mathrm{sigmoid}(\mathbf{f}_t^j+\Delta_t^j)$, with the offset $\Delta_t^j$ chosen so that the median decay lies in $[0.8,0.99]$; vector (per-channel) decay outperforms scalar decay when properly calibrated (Qin et al., 5 Sep 2025).
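The LES decay factor from the forecasting item can be sketched as a pure function. This minimal sketch assumes that $\tau_t$ counts periods since the last nonzero demand and $\hat{\tau}_t$ is the smoothed inter-demand interval, and it takes $f_{\mathrm{init}}$ and $\hat{\tau}_t$ as given; how LES updates those estimates on nonzero demand is not modeled, and the function names are hypothetical.

```python
def les_forecast(f_init: float, tau_t: int, tau_hat: float, beta: float) -> float:
    """Linearly decayed forecast f_init * max(1 - beta * tau_t / (2 * tau_hat), 0).

    Assumed (hypothetical) meanings: f_init is the last positive Croston-style
    estimate, tau_t counts periods since the last nonzero demand, tau_hat is the
    smoothed inter-demand interval, and beta > 0 sets the decay rate.
    """
    if tau_hat <= 0 or beta <= 0:
        raise ValueError("tau_hat and beta must be positive")
    return f_init * max(1.0 - beta * tau_t / (2.0 * tau_hat), 0.0)


def periods_to_zero(tau_hat: float, beta: float) -> float:
    """Smallest tau_t with a zero forecast: 1 - beta * tau_t / (2 * tau_hat) <= 0."""
    return 2.0 * tau_hat / beta


# Usage: estimate of 2.0 units/period, tau_hat = 3 periods, beta = 0.5
print([les_forecast(2.0, k, 3.0, 0.5) for k in range(0, 15, 3)])  # [2.0, 1.5, 1.0, 0.5, 0.0]
print(periods_to_zero(3.0, 0.5))  # 12.0
```

Larger $\beta$ shortens the time to a zero forecast; with $\beta=0.5$ and $\hat{\tau}_t=3$, the forecast reaches zero after 12 demand-free periods.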

Domain	D2Z Mechanism	Parameterization/Implementation
Control	Triangular function	Companion matrix, Lyapunov eqn
Forecasting	Linear factor	Smoothing parameter $\beta$
Deep Learning	Linear LR schedule	$\eta_t=\eta_0(1-t/T)$
Attention	Decay parameter $\lambda_t$	Sigmoid transform, vector preferred
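A linear warmup followed by linear decay to zero, matching the $\eta_t=\eta_0(1-t/T)$ row of the table above, can be written as a small step-to-rate function. This is a hypothetical helper: its name and warmup convention are assumptions, and during the decay phase $t$ is counted from the end of warmup.

```python
def d2z_lr(step: int, total_steps: int, peak_lr: float, warmup_steps: int = 0) -> float:
    """Linear warmup to peak_lr, then linear decay to exactly zero at total_steps.

    During decay, lr = peak_lr * (1 - s / T_decay), where s counts steps since
    warmup ended and T_decay = total_steps - warmup_steps.
    """
    if not 0 <= warmup_steps < total_steps:
        raise ValueError("need 0 <= warmup_steps < total_steps")
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * max(1.0 - progress, 0.0)


# Usage: 10 warmup steps, 110 total steps, peak 1.0
print([d2z_lr(s, 110, 1.0, warmup_steps=10) for s in (0, 5, 10, 60, 110)])
# [0.0, 0.5, 1.0, 0.5, 0.0]
```

In PyTorch, the same shape can be obtained by passing `lambda s: d2z_lr(s, T, 1.0, W)` to `torch.optim.lr_scheduler.LambdaLR`, which multiplies the optimizer's base learning rate by the returned factor; Hugging Face's `get_linear_schedule_with_warmup` produces the same warmup-then-linear-to-zero shape.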

4. Applications, Theoretical Extensions, and Regime Comparisons

D2Z applies across disciplines:

Partial Differential Equations: Applied to damped wave equations with Neumann or Dirichlet boundary conditions and to nonlocal equations, these results establish generic polynomial decay of the energy and of the solution norm under mild assumptions on the nonlinearity, in contrast with the exponential decay obtained when the kernel is trivial (Ghisi et al., 2013).
Inventory and Demand Forecasting: After obsolescence, LES stops forecasting persistent demand and drives its forecast to zero within finitely many periods; it is preferred under the percent-best and cumulative-error metrics (Prestwich et al., 2014).
Training LLMs: At scale, linear D2Z schedules save up to 60% of compute while producing models of equal or lower loss at high tokens-per-parameter (TPP) ratios, compared with step and cosine decay schedules (Bergsma et al., 21 Feb 2025).
Kinetic Theory and Statistical Physics: Action decay in plasma (Landau damping) and Fisher information decay in Boltzmann equations admit precise D2Z or exponential convergence, with rates determined by collision, boundary, and potential properties (Bénisti, 2015, Monmarché, 2017, Bouin et al., 14 Jul 2025).

Regime	Standard Decay	D2Z Decay	Advantage
PDE (trivial kernel)	Exponential	--	Exponential decay attainable
PDE (nontrivial kernel)	--	Polynomial (D2Z)	Optimal when a kernel is present
LLM Training	Cosine/Step	Linear (D2Z)	Robust, compute savings
Demand Forecast	Exponential/Hyperbolic	Linear (D2Z)	No persistent error

5. Mechanistic Insights and Trade-offs

For neural training, the AdamW EMA interpretation illustrates how D2Z "recenters" on recent updates while controlling variance: linear LR schedules distribute the final weights' dependence across many late-training updates instead of concentrating it on the most recent ones, which Bergsma et al. argue minimizes the combined bias and variance (Bergsma et al., 21 Feb 2025). In sequence models, decay mechanisms supply a long-context locality prior without explicit positional encodings; a median decay in the range $\sim$0.8–0.99 is critical for best performance (Qin et al., 5 Sep 2025). In control, enforcing strict linear decay keeps the control input finite for $t<\tau$, but it becomes singular at $t=\tau$, so in practice actuation is stopped just before the terminal time (Shakouri et al., 2021). For kinetic and wave equations, D2Z rates depend on boundary conditions, kernel structure, and operator properties, with faster decay under absorbing conditions (e.g. Dirichlet boundaries or the half-space kinetic setting) (Bouin et al., 14 Jul 2025).
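The EMA view of AdamW can be checked numerically. The sketch below assumes PyTorch-style decoupled weight decay, where each step scales the weights by $1-\eta_t\lambda$, and treats the Adam-normalized updates $u_t$ as given; it only computes the coefficients that an LR schedule assigns to $\theta_0$ and to each update. The function name and run settings are hypothetical.

```python
import numpy as np


def adamw_ema_weights(lrs: np.ndarray, weight_decay: float) -> tuple[float, np.ndarray]:
    """EMA coefficients of AdamW's final weights under decoupled weight decay.

    Each step is theta_t = (1 - a_t) * theta_{t-1} + a_t * (-u_t / weight_decay)
    with a_t = lr_t * weight_decay and u_t the (given) Adam-normalized update.
    Unrolling: theta_T = c0 * theta_0 + sum_t c[t] * (-u_t / weight_decay),
    where c0 + sum(c) == 1 (a convex combination when 0 <= a_t <= 1).
    """
    a = np.asarray(lrs, dtype=float) * weight_decay
    if np.any((a < 0.0) | (a > 1.0)):
        raise ValueError("need 0 <= lr_t * weight_decay <= 1")
    keep = 1.0 - a
    # survival[t] = prod_{s > t} (1 - a_s): fraction of step t's contribution left at the end
    survival = np.append(np.cumprod(keep[::-1])[::-1][1:], 1.0)
    return float(np.prod(keep)), a * survival


if __name__ == "__main__":
    T, peak_lr, wd = 5000, 1e-2, 0.1  # hypothetical run settings
    steps = np.arange(1, T + 1)
    for name, lrs in [("constant", np.full(T, peak_lr)),
                      ("linear D2Z", peak_lr * (1.0 - steps / T))]:
        c0, c = adamw_ema_weights(lrs, wd)
        last10 = c[-T // 10:].sum() / c.sum()  # share of update weight in the last 10% of steps
        print(f"{name}: weight on theta_0={c0:.3f}, last-10% share={last10:.3f}, final coeff={c[-1]:.2e}")
```

Under the linear schedule the final coefficient is exactly zero, whereas under a constant rate the most recent update carries the largest coefficient; the printed shares show how each schedule distributes weight over the run.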

6. Limitations, Contingencies, and Domain-Specific Considerations

D2Z regimes require precise calibration and mild regularity assumptions. In PDEs, exponential decay is achievable for generic data only when the kernel is trivial; for nontrivial kernels, the polynomial rate cannot be improved on an open set of initial data (Ghisi et al., 2013). In LLM training, D2Z may not be optimal for small datasets (low TPP), and overly aggressive LR schedules may induce instability unless paired with robust initialization and optimizer settings (Bergsma et al., 21 Feb 2025). In demand forecasting, rapid linear decay may occasionally introduce a slight bias if the smoothing parameters are too large, although such events are reported to be empirically negligible (Prestwich et al., 2014). For kinetic equations, optimal rates depend on the structure of the Nash-type inequality and on the absorbing boundary formulation (Bouin et al., 14 Jul 2025). In sequence modeling, improper parameter sharing can miscalibrate decay values and degrade performance; vector decay is preferred but must be carefully initialized (Qin et al., 5 Sep 2025).
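Calibrating a decay parameterization $\lambda=\mathrm{sigmoid}(f+\Delta)$ to a target median is one way to respect the $[0.8,0.99]$ range cited for linear attention. The recipe below, a single scalar offset obtained from the logit of the target median, is an assumption for illustration rather than the procedure of Qin et al.; the pre-activations are random stand-ins, and the state update shows vector (per-key-channel) decay in a linear-attention recurrence $S_t=\mathrm{diag}(\lambda)S_{t-1}+k_tv_t^\top$. All function names are hypothetical.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def calibrate_offset(f: np.ndarray, target_median: float) -> float:
    """Offset Delta such that median(sigmoid(f + Delta)) == target_median.

    sigmoid is strictly increasing, so median(sigmoid(f + D)) == sigmoid(median(f) + D)
    (exactly for an odd number of samples); invert with the logit.
    """
    if not 0.0 < target_median < 1.0:
        raise ValueError("target_median must lie in (0, 1)")
    logit = np.log(target_median) - np.log1p(-target_median)
    return float(logit - np.median(f))


def decayed_state_update(S: np.ndarray, lam: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """One recurrence step with vector decay: S_t = diag(lam) @ S_{t-1} + outer(k, v)."""
    return lam[:, None] * S + np.outer(k, v)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f = rng.normal(size=1001)                  # stand-in pre-activations (odd count)
    delta = calibrate_offset(f, target_median=0.9)
    print(np.median(sigmoid(f + delta)))       # ~0.9 by construction
    d_k, d_v = 4, 2
    lam = sigmoid(f[:d_k] + delta)             # one decay value per key channel
    S = np.zeros((d_k, d_v))
    for _ in range(3):                         # three tokens of the recurrence
        S = decayed_state_update(S, lam, rng.normal(size=d_k), rng.normal(size=d_v))
    print(S.shape)                             # (4, 2)
```

Replacing `lam` by a single scalar per head would give scalar decay; with a fixed $\lambda$, a token's contribution after $L$ further steps is scaled by $\lambda^L$.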

7. Summary Table: Key D2Z Frameworks Across Domains

Domain / System	D2Z Expression	Optimality & Robustness
Semi-linear hyperbolic PDEs	$|u(t)|^2 \sim t^{-1/p}$	Proven optimal, generic when $\ker A\neq 0$
Prescribed-time control	$\|x(t)\|\leq\sigma\|x_0\|\max\{1-t/\tau,0\}$	Exact for given $\tau$, robust
LLM learning rate	$\eta_t = \eta_0(1-t/T)$	Best at scale, stable across params
Intermittent demand forecast	$f_t=(\hat{y}_t/\hat{\tau}_t)\max\{1-\beta\tau_t/(2\hat{\tau}_t),0\}$	Asymptotically best under obsolescence
Linear kinetic equations	$\|f(t)\|\lesssim t^{-1-d/2}$ (weighted norms)	Optimal for absorbing boundary

Linear Decay-to-Zero (D2Z) demarcates a mathematically and algorithmically rigorous class of decay behaviors, schedules, and control strategies that guarantee or robustly approximate linear or polynomial convergence to zero under operator constraints, prescribed timing, or design choices in forecasting and training. Across these applications, D2Z delivers provable optimality in some settings (dissipative PDEs, prescribed-time control), empirical advantages in others (forecasting, LLM training), and favorable asymptotic error characteristics under well-understood conditions, making it a recurring design principle in mathematical modeling and computational engineering.