
Universal Sequence Preconditioning (USP)

Updated 3 July 2026
  • Universal Sequence Preconditioning is a framework that transforms sequential predictions and iterative linear solvers using polynomial-based methods to reduce memory requirements and improve convergence.
  • It employs Chebyshev and Legendre polynomial techniques in LDS settings to convert long-memory autoregressions into O(log T) models with sublinear cumulative error bounds.
  • For sparse linear systems, USP updates preconditioners via efficient sparse mappings, maintaining near-optimal conditioning and fast convergence in applications like topology optimization.

Universal Sequence Preconditioning (USP) is a framework for transforming sequential prediction and linear solve problems into a memory- and convergence-efficient regime by leveraging polynomial-based preconditioners. Distinct conceptualizations of USP exist in the settings of learning in linear dynamical systems (LDS) and sequences of parameterized linear systems; both depart from classical preconditioning by enabling provably efficient learning and solution procedures even for high-dimensional, ill-conditioned, or nonnormal problems (Grim-McNally et al., 2016, Marsden et al., 8 May 2026).

1. Formal Definition and Construction

USP in the context of LDS prediction consists of convolving the observed sequence with the coefficients of a monic Chebyshev or Legendre polynomial. For a single-input, single-output LDS,

$$h_t = A h_{t-1} + B u_t,\quad y_t = C h_t,$$

and a history parameter $n = \lceil 3 \log (\|C\| \|B\| \kappa T) \rceil$, where $\kappa = \text{cond}(P)$ and $A = P \Lambda P^{-1}$, USP constructs preconditioned features:

$$a_t = [y_{t-1}, \ldots, y_{t-n},\, u_t, \ldots, u_{t-n+1}]^\top \in \mathbb{R}^{2n}$$

using Chebyshev polynomial coefficients $c_1, \ldots, c_n$ defined by $p^{\text{cheby}}_n(x) = x^n + \sum_{i=0}^{n-1} c_{n-i} x^i$. The annihilation identity,

$$y_t = -\sum_{i=1}^n c_i y_{t-i} + \sum_{s=0}^{n-1} \theta_s^{\text{USP}} u_{t-s} + \epsilon_t,$$

with

$$\theta_s^{\text{USP}} = \sum_{i=0}^{s} c_i C A^{s-i} B \quad (c_0 = 1 \text{, the leading coefficient of the monic polynomial}),$$

recasts the original long-memory prediction into an $O(\log T)$-memory problem, up to an exponentially small residual $|\epsilon_t| \leq T \|C\|\|B\| \kappa 2^{-n}$ (Marsden et al., 8 May 2026).
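The identity can be checked numerically. The following is a minimal sketch, not the authors' implementation: the system matrices, the eigenvalues, the horizon `T = 200`, and the degree `n = 20` are all illustrative assumptions, and the helper names are hypothetical. It assumes real eigenvalues in $[-1, 1]$ and inputs bounded by 1, and compares the residual against the looser bound $T \|C\|\|B\| \kappa\, 2^{1-n}$, which follows from the fact that a monic Chebyshev polynomial of degree $n$ has maximum magnitude $2^{1-n}$ on $[-1, 1]$.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def monic_chebyshev_coeffs(n):
    """Return c_0..c_n with c_0 = 1, where c_i multiplies x^(n-i)."""
    power = cheb.cheb2poly([0.0] * n + [1.0])  # T_n in the monomial basis, low to high
    return (power / power[-1])[::-1]           # make monic, reorder high to low


def simulate_lds(A, B, C, u):
    """Run h_t = A h_{t-1} + B u_t, y_t = C h_t from a zero initial state."""
    h = np.zeros(A.shape[0])
    y = np.empty(len(u))
    for t, u_t in enumerate(u):
        h = A @ h + B * u_t
        y[t] = C @ h
    return y


def usp_residuals(A, B, C, u, y, c):
    """eps_t = y_t + sum_i c_i y_{t-i} - sum_s theta_s u_{t-s}, for t >= n."""
    n = len(c) - 1
    markov = np.array([C @ np.linalg.matrix_power(A, s) @ B for s in range(n)])
    # theta_s = sum_{i=0}^{s} c_i * C A^{s-i} B
    theta = np.array([np.dot(c[: s + 1], markov[s::-1]) for s in range(n)])
    eps = np.empty(len(u) - n)
    for t in range(n, len(u)):
        past_y = y[t - n:t][::-1]          # y_{t-1}, ..., y_{t-n}
        past_u = u[t - n + 1:t + 1][::-1]  # u_t, ..., u_{t-n+1}
        eps[t - n] = y[t] + np.dot(c[1:], past_y) - np.dot(theta, past_u)
    return eps


# Illustrative system (assumed values): A = P diag(eigs) P^{-1}, real spectrum in [-1, 1].
rng = np.random.default_rng(0)
T, n = 200, 20
eigs = np.array([0.99, 0.5, -0.3, -0.9])
P = rng.normal(size=(4, 4)) + 3.0 * np.eye(4)
A = P @ np.diag(eigs) @ np.linalg.inv(P)
B, C = rng.normal(size=4), rng.normal(size=4)
u = rng.uniform(-1.0, 1.0, size=T)

c = monic_chebyshev_coeffs(n)
y = simulate_lds(A, B, C, u)
eps = usp_residuals(A, B, C, u, y, c)
bound = T * np.linalg.norm(C) * np.linalg.norm(B) * np.linalg.cond(P) * 2.0 ** (1 - n)
print("max |eps_t|:", np.max(np.abs(eps)), "bound:", bound)
```

Only `n` past outputs and `n` past inputs enter the prediction, even though the eigenvalue 0.99 gives the raw system a memory far longer than 20 steps.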

In the context of sequences of large, sparse linear systems, USP (also termed the Sparse Approximate Map update in earlier literature) updates an initial preconditioner, built for a reference matrix, via a sparse map. The map is obtained by approximately solving a least-squares problem that maps each new system matrix back to the reference matrix. The updated preconditioner, formed by combining the map with the initial preconditioner, yields good preconditioning properties for the new matrix while amortizing the cost of the expensive initial construction (Grim-McNally et al., 2016).

2. Theoretical Properties and Regret Analysis

USP reduces the effective memory length or complexity of both prediction and linear system solution. In LDS settings, it transforms a long-memory autoregression into a representation with $O(\log T)$ lags. The tradeoff is a comparator norm (diameter) that grows exponentially, together with a corresponding bound on the preconditioned feature norms.

Standard first-order online optimization regret bounds, which scale linearly with the comparator diameter, are insufficient in the USP regime because of the diameter's exponential scaling. However, sequence preconditioning admits application of the Vovk-Azoury-Warmuth (VAW) forecaster, whose regret bound depends only logarithmically on the diameter. The resulting guarantee marks the first cumulative squared prediction error bound with polylogarithmic scaling that holds even for asymmetric, marginally stable transition matrices $A$ (Marsden et al., 8 May 2026). The sublinear regret bounds are also dimension-independent up to logarithmic factors.
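For readers unfamiliar with the VAW forecaster, here is a minimal sketch of the standard algorithm on generic features. It is not tied to the paper's exact setup: the synthetic data, the regularization `lam = 1.0`, and the function name `vaw_forecast` are assumptions for illustration. In the USP setting the rows of `X` would be the preconditioned features $a_t$. The distinguishing step is that the current feature vector is added to the regularized Gram matrix before the prediction is made.

```python
import numpy as np


def vaw_forecast(X, y, lam=1.0):
    """Vovk-Azoury-Warmuth predictions for a stream of (x_t, y_t) pairs."""
    T, d = X.shape
    gram = lam * np.eye(d)  # lam * I + sum of x_s x_s^T, including the current x_t
    b = np.zeros(d)         # sum of y_s x_s over strictly past rounds
    preds = np.empty(T)
    for t in range(T):
        gram += np.outer(X[t], X[t])                # VAW: add x_t before predicting
        preds[t] = X[t] @ np.linalg.solve(gram, b)
        b += y[t] * X[t]                            # y_t is revealed after predicting
    return preds


# Synthetic realizable data (assumed for illustration only).
rng = np.random.default_rng(0)
T, d = 500, 5
X = rng.normal(size=(T, d))
w_star = rng.normal(size=d)
y = X @ w_star

preds = vaw_forecast(X, y)
sq_err = (preds - y) ** 2
print("cumulative squared error:", sq_err.sum())
```

Solving a `d`-by-`d` system each round keeps the sketch short; a rank-one inverse update would be the usual optimization for longer streams.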

In the context of iterative linear solvers, USP guarantees that if the residual of the sparse map problem is sufficiently small in norm, then the condition number and convergence of the preconditioned iterative method remain near-optimal.

3. Algorithmic Procedures and Cost

LDS Prediction

USP first fixes the degree $n$ based on a problem-dependent parameter, calculates the Chebyshev polynomial coefficients, and forms the preconditioned features $a_t$ at each time step. Prediction algorithms (e.g., VAW) operate on this reduced representation. The polynomial coefficients (for Chebyshev or Legendre polynomials) do not depend on the specific signal and can be computed offline.
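A minimal sketch of the feature construction follows, assuming zero-padding before the start of the sequence and an illustrative degree `n = 8`; the helper names are hypothetical. The coefficients are computed once, independently of the data, and applying them to the output sequence is a plain discrete convolution.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def monic_chebyshev_coeffs(n):
    """Return c_0..c_n with c_0 = 1, where c_i multiplies x^(n-i)."""
    power = cheb.cheb2poly([0.0] * n + [1.0])
    return (power / power[-1])[::-1]


def precondition_targets(y, c):
    """z_t = sum_{i=0}^{n} c_i y_{t-i}, treating y_t as 0 before the start."""
    return np.convolve(y, c)[: len(y)]


def build_features(y, u, n):
    """Stack a_t = [y_{t-1},...,y_{t-n}, u_t,...,u_{t-n+1}] for t = n,...,T-1."""
    rows = [
        np.concatenate((y[t - n:t][::-1], u[t - n + 1:t + 1][::-1]))
        for t in range(n, len(y))
    ]
    return np.array(rows)


# Placeholder signals (assumed); any observed (u, y) sequence could be used.
rng = np.random.default_rng(1)
n, T = 8, 50
y, u = rng.normal(size=T), rng.normal(size=T)

c = monic_chebyshev_coeffs(n)      # signal-independent, computed once
z = precondition_targets(y, c)     # preconditioned outputs
features = build_features(y, u, n)
print(features.shape)
```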

Linear System Sequence Preconditioning

For sequence preconditioning in linear system solves:

  • Precompute a strong preconditioner for a reference system.
  • Choose a sparsity pattern and, for each new matrix, solve small independent least-squares problems over the columns allowed by that pattern to find the sparse map.
  • Update the preconditioner by combining the sparse map with the reference preconditioner. Each preconditioner application inside a preconditioned Krylov solver then requires only one application of the reference preconditioner and one sparse multiplication by the map.

The cost of constructing the reference preconditioner typically dominates, while each map update is orders of magnitude cheaper, and applying the updated preconditioner remains efficient.
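The steps above can be sketched on a small dense example. This is an illustration under stated assumptions rather than the published algorithm: the names `A_0`, `A_k`, `N_k`, `P_0`, and `P_k` are hypothetical, the map is assumed to satisfy `A_k @ N_k ≈ A_0` with the updated preconditioner `P_k = N_k @ P_0`, the sparsity pattern is taken to be that of `A_0`, and an exact inverse stands in for the "strong" reference preconditioner. A practical implementation would use sparse storage.

```python
import numpy as np


def sparse_approximate_map(A_k, A_0, pattern):
    """Column by column, minimize ||A_k n_j - (A_0)_j||_2 over the allowed entries."""
    n = A_0.shape[0]
    N = np.zeros((n, n))
    for j in range(n):
        rows = np.flatnonzero(pattern[:, j])  # entries of column j allowed to be nonzero
        sol, *_ = np.linalg.lstsq(A_k[:, rows], A_0[:, j], rcond=None)
        N[rows, j] = sol
    return N


# Illustrative matrices (assumed): a tridiagonal reference and a diagonal perturbation.
rng = np.random.default_rng(0)
n = 30
A_0 = 2.5 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A_k = A_0 + np.diag(rng.uniform(0.0, 0.5, size=n))
pattern = A_0 != 0          # assumed pattern: the nonzeros of A_0

P_0 = np.linalg.inv(A_0)    # stand-in for an expensive reference preconditioner
N_k = sparse_approximate_map(A_k, A_0, pattern)
P_k = N_k @ P_0             # updated (right) preconditioner for A_k

err_reuse = np.linalg.norm(A_k - A_0)       # Frobenius error when reusing P_0 unchanged
err_map = np.linalg.norm(A_k @ N_k - A_0)   # Frobenius error after the map update
print(err_reuse, err_map)
```

Because the pattern contains the diagonal, the identity is a feasible map, so the mapped error can never exceed the error of plain reuse. Each column problem is independent of the others, which is what makes the update cheap and easy to parallelize.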

4. Complex Spectra and Generalization

A significant technical barrier with Chebyshev-based USP is that the annihilation property is localized to eigenvalues within a real interval, or with small argument in the complex plane. Recent advances provide a complex-analytic bound on the decay of monic Chebyshev polynomials, proving that the polynomial still decays at eigenvalues in the unit disk with bounded complex argument once the degree is large. This extension enables the application of USP to LDS with spectra in the unit disk whose complex argument is absolutely bounded, thereby accommodating asymmetric and nonnormal systems (Marsden et al., 8 May 2026).
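A quick numerical illustration of this decay, as a sketch only: the sector half-angle `max_arg = 0.1` radians and the grid resolution are arbitrary choices for the example, not constants from the paper, and the helper name is hypothetical. The code evaluates the monic Chebyshev polynomial on a grid of complex points in the unit disk with bounded argument and records the largest magnitude for several degrees.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def monic_chebyshev_abs(z, n):
    """|T_n(z)| / 2^(n-1): magnitude of the degree-n monic Chebyshev polynomial at z."""
    return np.abs(cheb.chebval(z, [0.0] * n + [1.0])) / 2.0 ** (n - 1)


# Grid over the sector {r * exp(i*a) : 0 <= r <= 1, |a| <= max_arg} (assumed half-angle).
max_arg = 0.1
radii = np.linspace(0.0, 1.0, 51)
angles = np.linspace(-max_arg, max_arg, 21)
grid = (radii[:, None] * np.exp(1j * angles[None, :])).ravel()

sector_max = {n: float(np.max(monic_chebyshev_abs(grid, n))) for n in (10, 20, 30)}
for n, value in sector_max.items():
    print(n, value)
```

For this half-angle the maximum over the grid shrinks as the degree grows; widening the sector slows the decay, which is why the bound on the argument matters.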

Further, in both synthetic and real-world datasets, USP generalizes beyond strict LDS models to broader sequential data and learning architectures, including recurrent neural networks.

5. Empirical Performance and Application Domains

Empirical results highlight key efficiency gains:

  • Topology Optimization: USP (with periodic preconditioner updates) halves total GMRES time compared to naive preconditioner reuse for the reported mesh sizes.
  • Interpolatory Model Reduction: USP reduces the number of GMRES iterations relative to always recomputing preconditioners or reusing suboptimal ones; cost savings are substantial when shifts or parameters change through the sequence.
  • Indefinite Helmholtz Problems: Preconditioner degradation is arrested by USP updates, keeping iteration counts stable even as spectral properties worsen (Grim-McNally et al., 2016).

USP's transformation enables both efficient online learning with provably optimal regret and accelerated iterative linear algebra when large sequences of similar systems must be solved. The method maintains low operator complexity and is especially advantageous when the alternative—fresh preconditioner construction—is costly.

6. Limitations and Practical Considerations

Although USP provides powerful guarantees and empirical advantages, the exponential blowup in coefficient norms (the comparator diameter) presents challenges for traditional online learning procedures. The VAW forecaster circumvents this through the logarithmic dependence in its regret bound, but implementations must carefully manage feature scaling. The choice of history length $n$, polynomial family, and regularization is critical for realizing the theoretical advantages in practice.

USP requires accurate computation of reference preconditioners and mappings, and relies on stable, high-quality least-squares solvers for map construction. For system sequences with abrupt or highly nonlocal changes, USP's amortized efficiency may degrade, and hybrid strategies—recombining full preconditioner construction with USP updates—may be required.

