The Power of Second Order Methods for Sequence Preconditioning

Published 8 May 2026 in cs.LG | (2605.08390v1)

Abstract: Sequence prediction methods for dynamical systems with long memory, i.e. marginally stable systems, typically achieve regret that grows polynomially with the hidden dimension of the underlying generative model. Universal Sequence Preconditioning (USP) is a method that compresses any sequence which comes from a linear dynamical system into a "preconditioned" sequence which requires exponentially shorter memory for accurate prediction. However, the preconditioned sequence yields exponentially larger diameters and gradients, hindering USP from unlocking optimal regret bounds. Inspired by the minimum description length principle, we show that the Vovk-Azoury-Warmuth (VAW) algorithm is naturally matched to the USP regime. Indeed, it takes advantage of the memory compression while remaining robust to the exponential explosion of the diameter. We prove that combining USP with VAW achieves astoundingly strong results: for any marginally-stable linear dynamical system, this algorithm achieves polylogarithmic regret $O \left( \log^3 T \right)$ even in the presence of asymmetric hidden transition matrices. Finally, we extend the applicability of USP beyond bounded-spectrum systems by providing new complex-analytic bounds on Chebyshev polynomials, allowing for systems with constant complex arguments.


Authors (2)

Summary

The paper demonstrates that pairing Chebyshev-based universal sequence preconditioning with the VAW forecaster reduces prediction regret to polylogarithmic dependence on both time horizon and system dimension.
It introduces a robust theoretical framework that extends Chebyshev polynomial approximation to complex eigenvalue domains, enabling effective prediction in asymmetric dynamical systems.
Empirical evaluations confirm that Chebyshev-preconditioned VAW outperforms first-order methods, especially in high-dimensional and long-memory linear dynamical contexts.


Motivation and Problem Setting

The paper addresses the challenge of sequence prediction for marginally stable linear dynamical systems (LDS) with long-term dependencies and high hidden state dimensionality. Classical direct prediction and recovery-based approaches for LDS rely on learning autoregressive predictors whose regret scales polynomially with both the time horizon $T$ and the system dimension $d$. Universal Sequence Preconditioning (USP) compresses the signal into a form suitable for short-memory prediction, but the Chebyshev polynomial-based preconditioning induces an exponential blowup in parameter norms. This norm inflation is fundamentally at odds with first-order online learning methods, which yield regret bounds that scale linearly in these norms. The paper investigates whether second-order online learning algorithms, specifically the Vovk-Azoury-Warmuth (VAW) forecaster, can unlock optimal regret bounds by being robust to such exponential parameter growth.

Theoretical Framework: Preconditioning and Second-Order Methods

The USP approach leverages the approximation theory of Chebyshev polynomials to compress any LDS signal with hidden dimension $d$ into a predictor that requires a history length and parameter dimension of only $\mathcal{O}(\log T)$. Mathematically, preconditioning the observed sequence by convolution with Chebyshev coefficients yields a sparse autoregressive representation whose approximation error decays exponentially in the degree. However, the coefficients of high-degree Chebyshev polynomials grow exponentially, so regret bounds for first-order algorithms such as OGD remain polynomial in $T$ and $d$.

The authors conjecture, with partial justification, that such coefficient growth is unavoidable for any monic polynomial achieving exponential flatness over $[-1,1]$: they give a formal lower bound on the largest coefficient, which must itself grow exponentially with the degree. This "memory-norm" trade-off therefore appears to be fundamental to the design of preconditioners.
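The trade-off between flatness and coefficient size can be checked numerically. The following is a minimal sketch, not the paper's code: it assumes the preconditioner is the monic Chebyshev polynomial $T_n(x)/2^{n-1}$, and the helper name `monic_chebyshev` is hypothetical. It prints the sup-norm on $[-1,1]$, which shrinks as $2^{1-n}$, next to the largest coefficient magnitude, which grows with $n$.

```python
import numpy as np
from numpy.polynomial import chebyshev as C
from numpy.polynomial import polynomial as P


def monic_chebyshev(n):
    """Power-basis coefficients (lowest degree first) of T_n(x) / 2^(n-1), n >= 1."""
    series = np.zeros(n + 1)
    series[n] = 1.0  # Chebyshev-basis representation of T_n
    # cheb2poly converts to the ordinary power basis; the leading
    # coefficient of T_n is 2^(n-1), so dividing makes the polynomial monic.
    return C.cheb2poly(series) / 2.0 ** (n - 1)


grid = np.linspace(-1.0, 1.0, 2001)
for n in (5, 10, 20):
    coeffs = monic_chebyshev(n)
    sup_norm = np.max(np.abs(P.polyval(grid, coeffs)))  # flatness on [-1, 1]
    largest = np.max(np.abs(coeffs))                    # coefficient blowup
    print(f"n={n:2d}  sup|p_n|={sup_norm:.3e}  max|coeff|={largest:.3f}")
```

The two columns move in opposite directions as the degree increases, which is the tension a norm-sensitive first-order learner has to pay for.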

The central analytical innovation of the paper is to apply the VAW forecaster, whose regret scales only logarithmically with the norm of the comparator and linearly in the effective dimension. This property is vital: pairing VAW with USP reduces prediction regret to $\mathcal{O}(\log^3 T)$, a polylogarithmic dependence on $T$ and $d$ that holds for arbitrary (including asymmetric and complex-eigenvalued) LDSs. The theoretical results are not contingent on eigenvalues being real or having vanishing complex angle, as the authors extend the Chebyshev analytic bounds to handle constant complex arguments.
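For readers unfamiliar with VAW, the sketch below shows the standard forecaster for online least squares: it predicts with $w_t = (\lambda I + \sum_{s \le t} x_s x_s^\top)^{-1} \sum_{s < t} y_s x_s$, so the current feature vector enters the second-moment matrix before the prediction is made. The class name, the `reg` parameter, and the synthetic noiseless data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


class VAWForecaster:
    """Vovk-Azoury-Warmuth forecaster for online least squares (a sketch)."""

    def __init__(self, dim, reg=1.0):
        self.A_inv = np.eye(dim) / reg  # inverse of reg*I + sum_s x_s x_s^T
        self.b = np.zeros(dim)          # sum_s y_s x_s over past rounds

    def step(self, x, y):
        """Predict for features x, then learn from the revealed target y."""
        # VAW adds the current x to the second-moment matrix *before* predicting.
        Ax = self.A_inv @ x
        self.A_inv -= np.outer(Ax, Ax) / (1.0 + x @ Ax)  # Sherman-Morrison update
        prediction = float(x @ self.A_inv @ self.b)
        self.b += y * x  # the target is only used after the prediction
        return prediction


# Synthetic noiseless linear data (an assumption made purely for illustration).
rng = np.random.default_rng(0)
w_star = rng.normal(size=5)
learner = VAWForecaster(dim=5)
sq_errors = []
for _ in range(500):
    x = rng.normal(size=5)
    y = float(w_star @ x)
    sq_errors.append((learner.step(x, y) - y) ** 2)

print("mean squared error, first 50 rounds:", np.mean(sq_errors[:50]))
print("mean squared error, last 50 rounds: ", np.mean(sq_errors[-50:]))
```

The rank-one Sherman-Morrison update keeps the per-round cost quadratic in the feature dimension instead of re-inverting the matrix at every step.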

Main Algorithm and Regret Guarantees

The main prediction algorithm applies VAW to features constructed from the last $n$ observations and inputs, where $n = \mathcal{O}(\log T)$ is the degree of the Chebyshev preconditioner, chosen so that the approximation error is exponentially small. The target parameter vector concatenates the negative Chebyshev coefficients and the "preconditioned" autoregressive parameters; this predictor is shown to approximate the LDS output up to exponentially small residual error. All analysis is done relative to the noiseless case, and experiments further indicate robustness to small observation noise.
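The role of the negative Chebyshev coefficients as a comparator can be illustrated on a simplified case. The sketch below assumes an input-free, noiseless output $y_t = \sum_j a_j \lambda_j^t$ with real eigenvalues in $[-1,1]$ (a much narrower setting than the paper's), uses `n` as an illustrative name for the preconditioner degree, and checks that the fixed predictor built from the last `n` observations leaves a residual of at most $2^{1-n} \sum_j |a_j|$. The helper names are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def monic_chebyshev_high_first(n):
    """Coefficients [1, c_1, ..., c_n] of the monic polynomial T_n(x) / 2^(n-1)."""
    series = np.zeros(n + 1)
    series[n] = 1.0
    return C.cheb2poly(series)[::-1] / 2.0 ** (n - 1)


def lagged_features(y, t, n):
    """Feature vector [y_{t-1}, ..., y_{t-n}] built from past observations only."""
    return y[t - n:t][::-1]


# Assumed toy system: d real modes in [-1, 1], no inputs, no noise.
rng = np.random.default_rng(0)
d, T, n = 50, 300, 12
eigs = rng.uniform(-1.0, 1.0, size=d)
weights = rng.normal(size=d) / d
y = (eigs[None, :] ** np.arange(T)[:, None]) @ weights  # y_t = sum_j a_j * lambda_j^t

c = monic_chebyshev_high_first(n)
theta = -c[1:]  # comparator: negative Chebyshev coefficients

# Residual of the fixed predictor: y_t - <theta, [y_{t-1}, ..., y_{t-n}]>,
# which equals sum_j a_j * lambda_j^(t-n) * p_n(lambda_j).
residuals = np.array([y[t] - theta @ lagged_features(y, t, n) for t in range(n, T)])
bound = np.abs(weights).sum() * 2.0 ** (1 - n)

print("largest residual:", np.max(np.abs(residuals)))
print("theoretical cap: ", bound)
print("comparator norm: ", np.linalg.norm(theta))
```

The comparator uses only `n` lags regardless of the hidden dimension `d`, but its norm grows with `n`; this is the quantity a second-order learner such as VAW pays for only logarithmically.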

The principal theoretical result (the paper's main theorem) shows that the cumulative squared prediction error for sequences generated by any marginally stable LDS with possibly asymmetric transition matrices, so long as their eigenvalues have a bounded complex argument, is $\mathcal{O}(\log^3 T)$, with the remaining factors in the bound depending on a parameter that controls signal magnitude and on a parameter that captures matrix conditioning.

This is an exponential improvement over the prior state-of-the-art, which achieved only regret polynomial in $T$ and required much more restrictive eigenvalue assumptions.

Complex-Analytic Extensions

An essential technical novelty is the extension of Chebyshev approximation bounds to support LDSs whose transition matrices have eigenvalues with a constant complex argument. Prior analyses for USP required the imaginary part of eigenvalues to vanish as $T$ grows, severely limiting applicability to truly non-normal or asymmetric dynamical systems. The paper establishes new bounds for monic Chebyshev polynomials evaluated on such complex domains, ensuring that the approximation error remains exponentially decaying in the polynomial degree. This extension drastically broadens the class of systems for which polylogarithmic-regret online prediction is possible.

Empirical Evaluation

The experimental suite validates the theoretical hypotheses on synthetic LDS data. Several competitive baselines are considered: OGD, Adam, USP with learned polynomial coefficients, and Chebyshev-preconditioned VAW.

A key finding is that Chebyshev-preconditioned VAW achieves the lowest steady-state prediction error as the preconditioner degree increases, and this gap widens as the hidden dimension $d$ grows (Figure 1).

Figure 1: Normalized prediction error versus preconditioner degree for a fixed hidden dimension $d$, illustrating the superiority of Chebyshev-preconditioned VAW relative to first-order methods.

While OGD and Adam cannot exploit preconditioning beyond moderate degrees (performance degrades due to exploding coefficient norms), the VAW forecaster continues to benefit up to the theoretically indicated optimal degree, in alignment with the main regret bound and parameter-norm arguments. The empirical analysis further explores the effect of input collinearity and demonstrates that preconditioning remains effective under adverse excitation scenarios.

Practical and Theoretical Implications

The theoretical contributions resolve a long-standing open question: can online sequence predictors for marginally stable LDSs with complex, possibly asymmetric dynamics achieve polylogarithmic regret rates agnostic to hidden dimension? By showing that second-order methods absorb the exponential blowup in parameter norm induced by Chebyshev-based universal preconditioning, the paper achieves a qualitative leap in predictive performance for this regime.

On the practical side, the results suggest that high-dimensional or weakly-excited dynamical systems—previously intractable for model-free online prediction—can be efficiently handled by combining signal-agnostic preconditioning with appropriate second-order optimization infrastructure.

The main limitation is the memory demand of VAW, which grows (albeit slowly) with the time horizon due to the matrix inversion. There is also a gap between the required (online) learning of autoregressive coefficients and the ideal of a purely offline, universal sequence transformation. Empirical findings indicate that the norm of the preconditioned signal is often much lower than that of the raw signal, suggesting room for further tightening of the theory.

Future Directions

The extension to nonlinear dynamical systems—possibly exploiting universal preconditioning approaches for nonlinear sequences—is explicitly mentioned as a promising avenue, with references to ongoing breakthroughs. Another challenge is to extend the memory-efficiency of the procedure, perhaps via low-rank or streaming variants of VAW, to support real-time use in hardware-constrained environments.

Conclusion

This paper establishes that second-order online learning methods, specifically the VAW forecaster, paired with Chebyshev-based universal sequence preconditioning, achieve optimal polylogarithmic regret bounds for sequence prediction in marginally stable LDSs with arbitrary, possibly asymmetric dynamics. The main technical contributions (balancing effective memory with parameter-norm robustness, and extending Chebyshev bounds to constant complex arguments) eliminate the otherwise inevitable polynomial dependence on $T$ and $d$. Extensive empirical validation supports the theory and highlights several open questions around practical implementation and norm propagation in preconditioned signals. The work sets a new performance baseline for robust, dimension-agnostic online prediction in the presence of long memory.