Universal Sequence Preconditioning (USP)

Updated 3 July 2026

Universal Sequence Preconditioning is a framework that transforms sequential predictions and iterative linear solvers using polynomial-based methods to reduce memory requirements and improve convergence.
It employs Chebyshev and Legendre polynomial techniques in LDS settings to convert long-memory autoregressions into O(log T) models with sublinear cumulative error bounds.
For sparse linear systems, USP updates preconditioners via efficient sparse mappings, maintaining near-optimal conditioning and fast convergence in applications like topology optimization.

Universal Sequence Preconditioning (USP) is a framework for transforming sequential prediction and linear solve problems into a memory- and convergence-efficient regime by leveraging polynomial-based preconditioners. Distinct conceptualizations of USP exist in the settings of learning in linear dynamical systems (LDS) and sequences of parameterized linear systems; both depart from classical preconditioning by enabling provably efficient learning and solution procedures even for high-dimensional, ill-conditioned, or nonnormal problems (Grim-McNally et al., 2016; Marsden et al., 8 May 2026).

1. Formal Definition and Construction

USP in the context of LDS prediction consists of convolving the observed sequence with the coefficients of a monic Chebyshev or Legendre polynomial. For a single-input, single-output LDS,

$h_t = A h_{t-1} + B u_t,\quad y_t = C h_t,$

and a history parameter $n = \lceil 3 \log (\|C\| \|B\| \kappa T) \rceil$, where $\kappa = \text{cond}(P)$ and $A = P \Lambda P^{-1}$, USP constructs preconditioned features:

$a_t = [y_{t-1}, \dots, y_{t-n},\, u_t, \dots, u_{t-n+1}]^\top \in \mathbb{R}^{2n}$

using Chebyshev polynomial coefficients $c_1, \dots, c_n$ defined by $p^{\text{cheby}}_n(x) = x^n + \sum_{i=0}^{n-1}c_{n-i}x^i$. The annihilation identity,

$y_t = -\sum_{i=1}^n c_i y_{t-i} + \sum_{s=0}^{n-1}\theta_s^{\text{USP}} u_{t-s} + \epsilon_t,$

with

$\theta_s^{\text{USP}} = \sum_{i=0}^{s} c_i C A^{s-i}B,$

recasts the original long-memory prediction into an $O(\log T)$-memory problem, up to an exponentially small residual $|\epsilon_t| \leq T \|C\|\|B\| \kappa 2^{-n}$ (Marsden et al., 8 May 2026).
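
The origin of the residual can be sketched in a few lines, under two assumptions not stated above: the state starts at zero (so that $u_s = 0$ before the start of the sequence), and $c_0 = 1$, the leading coefficient of the monic polynomial. Unrolling the recursion gives

$y_t = \sum_{j \ge 0} C A^{j} B\, u_{t-j},$

so that

$y_t + \sum_{i=1}^{n} c_i y_{t-i} = \sum_{s \ge 0} \Big( \sum_{i=0}^{\min(s,n)} c_i\, C A^{s-i} B \Big) u_{t-s}.$

For $s < n$ the inner sum is exactly $\theta_s^{\text{USP}}$. For $s \ge n$ it factors through the polynomial, since $\sum_{i=0}^{n} c_i A^{s-i} = A^{s-n}\, p^{\text{cheby}}_n(A)$, which leaves

$\epsilon_t = \sum_{s \ge n} C A^{s-n}\, p^{\text{cheby}}_n(A)\, B\, u_{t-s}.$

Every term of the residual therefore carries a factor $p^{\text{cheby}}_n(\lambda)$ for the eigenvalues $\lambda$ of $A$. The monic Chebyshev polynomial satisfies $|p^{\text{cheby}}_n(x)| \le 2^{1-n}$ on $[-1, 1]$, which is the source of the exponential decay in $n$.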

In the context of sequences of large, sparse linear systems, USP (also termed the Sparse Approximate Map update in earlier literature) updates an initial preconditioner $P_0$ for a reference matrix $A_0$ via a sparse map $N_k$, approximately solving $\min_{N_k} \|A_k N_k - A_0\|_F$ over a prescribed sparsity pattern. The updated preconditioner $P_k = N_k P_0$ yields good preconditioning properties for $A_k$ while amortizing the cost of the expensive $P_0$ construction (Grim-McNally et al., 2016).

2. Theoretical Properties and Regret Analysis

USP fundamentally reduces the effective memory length or complexity of both prediction and linear system solution. In LDS settings, this allows a long-memory autoregression to be transformed into an $O(\log T)$-lag representation. The tradeoff is a comparator norm (diameter) that grows exponentially in the history parameter $n$; the norms of the preconditioned features enter the analysis as well.

Standard first-order online optimization regret bounds, which scale linearly with the comparator diameter, are insufficient in the USP regime because the diameter grows exponentially in $n$. However, sequence preconditioning admits application of the Vovk-Azoury-Warmuth (VAW) forecaster, whose regret depends only logarithmically on the diameter. This yields the first cumulative squared prediction error scaling polylogarithmically in $T$ even with asymmetric, marginally-stable $A$ (Marsden et al., 8 May 2026). The sublinear regret bounds are also dimension-independent up to logarithmic factors.
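
The VAW forecaster itself is a standard online least-squares method, and a minimal sketch makes its structure concrete. The class name, the regularization parameter `lam`, and the `predict`/`update` split below are illustrative choices, not taken from the cited paper; in the USP setting the feature vector `x` would play the role of $a_t$.

```python
import numpy as np


class VAWForecaster:
    """Vovk-Azoury-Warmuth forecaster for online least squares (sketch)."""

    def __init__(self, dim, lam=1.0):
        # lam is a hypothetical regularization parameter for this sketch.
        self.S = lam * np.eye(dim)   # lam * I + sum of x x^T over rounds seen
        self.b = np.zeros(dim)       # sum of y_s * x_s over completed rounds

    def predict(self, x):
        # VAW adds the current feature to the covariance *before* predicting,
        # which is what distinguishes it from plain online ridge regression.
        self.S += np.outer(x, x)
        theta = np.linalg.solve(self.S, self.b)
        return float(theta @ x)

    def update(self, x, y):
        # Called once the true label for the current round is revealed.
        self.b += y * x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = np.array([0.5, -0.3, 0.2])
    model = VAWForecaster(dim=3)
    total = 0.0
    for _ in range(200):
        x = rng.uniform(-1.0, 1.0, size=3)
        y = float(w_true @ x)
        total += (model.predict(x) - y) ** 2
        model.update(x, y)
    print("cumulative squared error:", total)
```

Because each prediction solves a regularized least-squares problem rather than taking a gradient step, the scale of the comparator enters the regret only through a logarithm, which is the property the text relies on.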

In the context of iterative linear solvers, the guarantee is stated in terms of the map residual $R_k = A_k N_k - A_0$, where $N_k$ is the sparse map, $A_0$ the reference matrix, and $P_0$ its preconditioner. If $\|R_k\|$ is sufficiently small, then the condition number and convergence of the preconditioned iterative method remain near-optimal, since the preconditioned operator differs from the reference one only by a term proportional to the residual:

$A_k N_k P_0 = A_0 P_0 + R_k P_0$

3. Algorithmic Procedures and Cost

LDS Prediction

USP first fixes the degree $n$ based on a problem-dependent parameter, calculates the Chebyshev polynomial coefficients, and forms the preconditioned features $a_t$ at each time step. Prediction algorithms (e.g., VAW) operate on this reduced representation. The polynomial filter (for Chebyshev or Legendre polynomials) is agnostic to the specific signal, so its coefficients can be computed offline.
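
The procedure can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' implementation: the state starts at zero, $c_0 = 1$ is used as the leading coefficient, the system matrices are known (so that $\theta_s^{\text{USP}}$ can be computed directly rather than learned), and all function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def monic_chebyshev_coeffs(n):
    """Return c_1..c_n with p_n(x) = x^n + sum_{i=1}^n c_i x^(n-i)."""
    # cheb2poly converts a Chebyshev-basis series to power-basis
    # coefficients, ordered from the constant term up to x^n.
    power = cheb.cheb2poly([0.0] * n + [1.0])
    monic = power / power[-1]        # rescale so the leading coefficient is 1
    return monic[::-1][1:]           # drop the leading 1, keep c_1..c_n


def simulate_lds(A, B, C, u):
    """Run h_t = A h_{t-1} + B u_t, y_t = C h_t from a zero initial state."""
    h = np.zeros(A.shape[0])
    y = np.empty(len(u))
    for t, u_t in enumerate(u):
        h = A @ h + B * u_t
        y[t] = C @ h
    return y


def usp_residual(A, B, C, u, y, n):
    """epsilon_t = y_t + sum_i c_i y_{t-i} - sum_s theta_s u_{t-s}, t >= n."""
    c = monic_chebyshev_coeffs(n)
    c_full = np.concatenate(([1.0], c))          # assumed convention c_0 = 1
    markov = [C @ np.linalg.matrix_power(A, j) @ B for j in range(n)]
    theta = np.array([sum(c_full[i] * markov[s - i] for i in range(s + 1))
                      for s in range(n)])
    eps = []
    for t in range(n, len(y)):
        target = y[t] + c @ y[t - n:t][::-1]     # y_t + sum_i c_i y_{t-i}
        pred = theta @ u[t - n + 1:t + 1][::-1]  # sum_s theta_s u_{t-s}
        eps.append(target - pred)
    return np.array(eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.diag([0.99, -0.5, 0.3])   # symmetric, eigenvalues in [-1, 1]
    B, C = np.ones(3), np.ones(3) / 3
    u = rng.uniform(-1.0, 1.0, size=200)
    y = simulate_lds(A, B, C, u)
    print("max |epsilon_t|:", np.abs(usp_residual(A, B, C, u, y, n=20)).max())
```

Note that the example uses an eigenvalue of 0.99, for which a plain truncation to 20 lags would leave a large tail; the preconditioned residual stays small because the Chebyshev filter suppresses every mode in $[-1, 1]$ at once.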

Linear System Sequence Preconditioning

For sequence preconditioning in linear system solves:

Precompute a strong preconditioner $P_0$ for a reference system $A_0$.
Choose a sparsity pattern $S$ and, for each new matrix $A_k$, solve one small independent least-squares problem per column of the map, restricted to the entries allowed by $S$, to find $N_k$.
Update the preconditioner as $P_k = N_k P_0$; applying it inside a preconditioned Krylov solver requires only one $P_0$ application and one sparse multiplication by $N_k$.

The cost of constructing $P_0$ typically dominates, while each $N_k$ update is orders of magnitude cheaper, and applying $P_k$ is efficient.
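
The map construction step can be sketched as a set of independent column-wise least-squares problems. The code below is a small dense illustration, not the implementation from the cited work: the function names, the tridiagonal pattern, and the use of dense NumPy arrays (a practical version would use sparse storage) are all assumptions made for readability.

```python
import numpy as np


def sparse_approximate_map(A_k, A_0, pattern):
    """Column-wise least squares for a map N with A_k @ N close to A_0.

    pattern[j] lists the row indices allowed to be nonzero in column j of N.
    """
    n = A_k.shape[0]
    N = np.zeros((n, n))
    for j in range(n):
        rows = pattern[j]
        # Only the columns of A_k selected by the pattern enter this problem,
        # so each solve is small and independent of the others.
        coef, *_ = np.linalg.lstsq(A_k[:, rows], A_0[:, j], rcond=None)
        N[rows, j] = coef
    return N


def tridiagonal_pattern(n):
    """Hypothetical sparsity pattern: diagonal plus nearest neighbours."""
    return [[i for i in (j - 1, j, j + 1) if 0 <= i < n] for j in range(n)]


if __name__ == "__main__":
    n = 8
    A_0 = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    A_k = A_0 + np.diag(np.linspace(0.1, 0.5, n))   # perturbed system
    N_k = sparse_approximate_map(A_k, A_0, tridiagonal_pattern(n))
    print("||A_k - A_0||_F     :", np.linalg.norm(A_k - A_0))
    print("||A_k N_k - A_0||_F :", np.linalg.norm(A_k @ N_k - A_0))
```

Because the identity matrix lies inside any pattern that contains the diagonal, the residual after mapping can never exceed the distance between the original matrices, and the columns can be processed in parallel.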

4. Complex Spectra and Generalization

A significant technical barrier with Chebyshev-based USP is that the annihilation property is localized to eigenvalues within the real interval $[-1, 1]$, or with small argument in the complex plane. Recent advances provide a complex-analytic bound on the decay of monic Chebyshev polynomials, proving that for eigenvalues $\lambda$ in the unit disk whose complex argument is sufficiently small, $|p^{\text{cheby}}_n(\lambda)|$ still decays for large $n$. This extension enables the application of USP to LDS with spectra in the unit disk whose complex argument is absolutely bounded, thereby accommodating asymmetric and nonnormal systems (Marsden et al., 8 May 2026).
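
The decay off the real axis can be explored numerically. The snippet below only evaluates the monic Chebyshev polynomial at chosen points; it does not reproduce the analytic bound from the cited work, and the function name and sample point are illustrative.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def monic_chebyshev_abs(n, z):
    """|p_n(z)| for the monic Chebyshev polynomial of degree n >= 1."""
    # Power-basis coefficients of T_n, ordered from the constant term up.
    power = cheb.cheb2poly([0.0] * n + [1.0])
    monic = power / power[-1]
    # np.polyval expects the highest degree first and accepts complex input.
    return np.abs(np.polyval(monic[::-1], z))


if __name__ == "__main__":
    z = 0.9 * np.exp(0.1j)   # inside the unit disk, small complex argument
    for n in (5, 10, 20):
        print(n, monic_chebyshev_abs(n, z))
```

Evaluating in the power basis is adequate for the moderate degrees used here; for much larger $n$ a recurrence-based evaluation would be numerically safer.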

Further, in both synthetic and real-world datasets, USP generalizes beyond strict LDS models to broader sequential data and learning architectures, including recurrent neural networks.

5. Empirical Performance and Application Domains

Empirical results highlight key efficiency gains:

Topology Optimization: USP (with periodic preconditioner updates) halves total GMRES time compared to naive preconditioner reuse for the reported mesh sizes.
Interpolatory Model Reduction: USP reduces the number of GMRES iterations relative to always recomputing preconditioners or reusing suboptimal ones; cost savings are substantial when shifts or parameters change through the sequence.
Indefinite Helmholtz Problems: Preconditioner degradation is arrested by USP updates, keeping iteration counts stable even as spectral properties worsen (Grim-McNally et al., 2016).

USP's transformation enables both efficient online learning with provably optimal regret and accelerated iterative linear algebra when large sequences of similar systems must be solved. The method maintains low operator complexity and is especially advantageous when the alternative—fresh preconditioner construction—is costly.

6. Limitations and Practical Considerations

Although USP provides powerful guarantees and empirical advantages, the exponential blowup in coefficient norms (the comparator diameter) presents challenges for traditional online learning procedures. The VAW forecaster circumvents this through the logarithmic dependence in its regret bound, but implementations must carefully manage feature scaling. The choice of history length $n$, polynomial family, and regularization is critical for realizing the theoretical advantages in practice.

USP requires accurate computation of reference preconditioners and mappings, and relies on stable, high-quality least-squares solvers for map construction. For system sequences with abrupt or highly nonlocal changes, USP's amortized efficiency may degrade, and hybrid strategies—recombining full preconditioner construction with USP updates—may be required.

References:

(Grim-McNally et al., 2016) "Preconditioning Parametrized Linear Systems"
(Marsden et al., 8 May 2026) "The Power of Second Order Methods for Sequence Preconditioning"
