HiPPO-LegS Matrix: Online Projection Operator
- HiPPO-LegS matrix is an online L²-projection operator that compresses continuous signal history onto a rescaled Legendre polynomial basis for robust sequential modeling.
- Its formulation leverages a closed-form, lower-triangular structure and Legendre recurrence relations to update memory states without requiring explicit timescale priors.
- The operator demonstrates strong stability, controlled approximation error, and improved performance over related methods like LMU in tasks with long-range dependencies.
A HiPPO-LegS matrix arises within the HiPPO (High-order Polynomial Projection Operators) framework as a closed-form, lower-triangular state-space operator for maintaining the online $L^2$-projection of past continuous input signals onto a degree-$N$ Legendre polynomial basis supported on $[0, t]$ under a uniform measure. The HiPPO-LegS memory update mechanism provides a theoretically robust, efficient, and timescale-agnostic solution for compressing cumulative signal history in sequential or trajectory data. Its formulation underpins key advances in recurrent state space models (SSMs), especially for modeling long-range temporal dependencies in deep learning.
1. The HiPPO Framework and the LegS Specialization
The HiPPO framework seeks the optimal online projection of the truncated input history $f_{\le t}$ onto an $N$-dimensional polynomial space with respect to a time-varying measure $\mu^{(t)}$. At each instant $t$, given an orthonormal polynomial basis $\{g_n^{(t)}\}_{0 \le n < N}$ on $[0, t]$, the projection is represented by coefficients
$$c_n(t) = \left\langle f_{\le t},\, g_n^{(t)} \right\rangle_{\mu^{(t)}}, \qquad n = 0, \dots, N-1.$$
For HiPPO-LegS, the measure is uniform on $[0, t]$: $\omega(t, x) = \frac{1}{t}\,\mathbb{1}_{[0,t]}(x)$, yielding rescaled Legendre polynomials. The projection coefficients evolve according to a linear, time-varying ODE, which underpins the HiPPO-LegS matrix structure (Gu et al., 2020).
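As a concrete numerical sketch (not from the cited papers; the helper name `legs_coefficients` and the quadrature order are illustrative choices), the coefficients $c_n(t)$ at a fixed time $t$ can be computed directly by Gauss-Legendre quadrature after mapping $[0, t]$ to $[-1, 1]$:

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legval

def legs_coefficients(f, t, N, quad_order=64):
    """Projection coefficients c_n(t) of the history f|_[0,t] onto the
    rescaled basis g_n(x) = sqrt(2n+1) * P_n(2x/t - 1), under the
    uniform measure (1/t) dx on [0, t]."""
    u, w = leggauss(quad_order)      # Gauss-Legendre nodes/weights on [-1, 1]
    fx = f(t * (u + 1) / 2)          # sample f at nodes mapped into [0, t]
    c = np.empty(N)
    for n in range(N):
        e = np.zeros(n + 1)
        e[n] = 1.0
        Pn = legval(u, e)            # Legendre polynomial P_n at the nodes
        # change of variables: c_n = sqrt(2n+1)/2 * ∫_{-1}^{1} f(t(u+1)/2) P_n(u) du
        c[n] = np.sqrt(2 * n + 1) / 2 * np.sum(w * fx * Pn)
    return c
```

For $f \equiv 1$ this yields $c = (1, 0, \dots, 0)$ at any $t$, since a constant history is exactly the first basis function; orthogonality zeroes the rest.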
2. Formulation of the HiPPO-LegS Matrix
The continuous-time HiPPO-LegS state equation is
$$\frac{d}{dt} c(t) = -\frac{1}{t} A\, c(t) + \frac{1}{t} B\, f(t),$$
where $f(t)$ is the scalar input, $c(t) \in \mathbb{R}^N$, and $A \in \mathbb{R}^{N \times N}$, $B \in \mathbb{R}^{N}$ are time-invariant matrices given by:
$$A_{nk} = \begin{cases} \sqrt{(2n+1)(2k+1)} & n > k \\ n + 1 & n = k \\ 0 & n < k \end{cases}, \qquad B_n = \sqrt{2n+1}.$$
This closed-form, lower-triangular structure emerges from differentiating the orthogonal projection and exploiting Legendre polynomial recurrence relations (Gu et al., 2020, Gu et al., 2022, Park et al., 2024). The update dynamics optimally compress the entire signal history onto the polynomial basis at all timescales, avoiding explicit timescale priors.
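A minimal NumPy construction of these matrices (the helper name `hippo_legs` is illustrative, not an API from the cited works):

```python
import numpy as np

def hippo_legs(N):
    """HiPPO-LegS matrices (Gu et al., 2020):
    A[n, k] = sqrt((2n+1)(2k+1)) if n > k,  n + 1 if n == k,  0 if n < k;
    B[n]    = sqrt(2n+1).
    """
    n = np.arange(N)
    r = np.sqrt(2 * n + 1)
    A = np.tril(np.outer(r, r), k=-1) + np.diag(n + 1.0)  # lower triangular
    B = r.copy()
    return A, B
```

The strictly lower part of $A$ is a rank-one outer product $\sqrt{2n+1}\sqrt{2k+1}$, which is what makes $O(N)$ matrix-vector products possible via prefix sums.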
3. Dynamics, Singularities, and Well-Posedness
Letting $P_n$ denote the standard Legendre polynomials on $[-1, 1]$, the normalized basis on $[0, t]$ is
$$g_n^{(t)}(x) = \sqrt{2n+1}\, P_n\!\left(\frac{2x}{t} - 1\right).$$
The projected state $c(t)$ satisfies the linear ODE above, where $A$ and $B$ are derived from Legendre recurrence integrals. At $t = 0$, the $1/t$ factor develops a singularity, precluding arbitrary initial conditions. The only permissible initial state is $c_n(0) = 0$ for all $n$, corresponding to zero prior history. Under this requirement, the ODE is well-posed on $t > 0$ for Riemann-integrable inputs (Park et al., 2024).
4. Discretization and Numerical Analysis
Time discretization converts the continuous ODE to a discrete recurrence suitable for algorithmic implementation. For unit steps ($\Delta t = 1$, so $t_k = k$), forward Euler yields:
$$c_k = \left(I - \frac{A}{k}\right) c_{k-1} + \frac{1}{k} B\, f_k.$$
Alternative schemes such as backward Euler and the midpoint rule (second-order Runge-Kutta) have also been shown to converge, with respective global accuracy $O(\Delta t)$ and $O(\Delta t^2)$ for step size $\Delta t$ and suitable resolution near $t = 0$. Stability is enhanced for implicit schemes due to the stiffness introduced by the $1/t$ scaling (Park et al., 2024). Each matrix-vector update can be performed in $O(N)$ time by leveraging the sparsity and structure of $A$ (Gu et al., 2020).
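The explicit and implicit first-order schemes can be sketched as below (helper names are illustrative; the matrix constructor repeats the closed form from Section 2):

```python
import numpy as np

def legs_matrices(N):
    # HiPPO-LegS: sqrt((2n+1)(2k+1)) below the diagonal, n+1 on it;
    # B[n] = sqrt(2n+1).
    n = np.arange(N)
    r = np.sqrt(2 * n + 1)
    return np.tril(np.outer(r, r), -1) + np.diag(n + 1.0), r

def forward_euler_step(c, f_k, k, A, B):
    """Explicit step: c_k = (I - A/k) c_{k-1} + (1/k) B f_k."""
    return c - (A @ c) / k + (B / k) * f_k

def backward_euler_step(c, f_k, k, A, B):
    """Implicit step: (I + A/k) c_k = c_{k-1} + (1/k) B f_k.
    Better behaved with respect to the 1/t stiffness near t = 0."""
    return np.linalg.solve(np.eye(len(c)) + A / k, c + (B / k) * f_k)
```

A useful sanity check: for a constant input the exact coefficients are $(1, 0, \dots, 0)$, since $A e_0 = B$; the forward-Euler recurrence reaches this fixed point exactly (up to roundoff) within the first $N$ steps, while backward Euler approaches it with $O(1/k)$ error.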
5. Theoretical Properties and Memory Capacity
HiPPO-LegS exhibits several key properties critical for sequential modeling:
- Timescale robustness: The evolution is equivariant to time dilations: if $h(t) = f(\alpha t)$ for $\alpha > 0$, then $c_h(t) = c_f(\alpha t)$, ensuring invariance to sampling rate changes and obviating timescale hyperparameter tuning (Gu et al., 2020).
- Stability: The gradient decay in the mapping from inputs to memory coefficients is $\Theta(1/t)$, avoiding the vanishing or exploding gradients typical of recurrent networks.
- Approximation error: For Lipschitz or smoother $f$, the error in the projection decays as $O(N^{-1/2})$, or $O(N^{-k+1/2})$ for $k$-smooth $f$, reflecting classical Legendre approximation bounds (Gu et al., 2020).
- Spectral structure: The underlying $A$ is nearly skew-Hermitian under the Legendre norm, leading to eigenvalues on the imaginary axis and precluding exponential growth of the homogeneous flow. The condition number of the eigenvector matrix scales polynomially with $N$, ensuring practical stability for moderate $N$ (Park et al., 2024).
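The timescale-robustness property can be observed numerically: the discrete LegS recurrence contains no explicit step size, so running it on the same signal sampled at two different rates should yield approximately the same coefficients. A sketch under the unit-step forward-Euler discretization (helper name `run_legs` and the tolerances are illustrative):

```python
import numpy as np

def run_legs(signal, N):
    """Run the HiPPO-LegS forward-Euler recurrence over a 1-D signal.
    Note that no step size appears anywhere in the update."""
    n = np.arange(N)
    r = np.sqrt(2 * n + 1)
    A = np.tril(np.outer(r, r), -1) + np.diag(n + 1.0)
    c = np.zeros(N)                        # mandatory zero initial state
    for k, f_k in enumerate(signal, start=1):
        c = c - (A @ c) / k + (r / k) * f_k
    return c

f = lambda x: np.sin(2 * np.pi * x)
c_coarse = run_legs(f(np.linspace(0, 1, 100)), N=6)
c_fine   = run_legs(f(np.linspace(0, 1, 300)), N=6)
# Both runs estimate the projection of the same history on [0, 1],
# so the coefficient vectors should roughly agree despite the 3x rate change.
```

The residual discrepancy is first-order discretization error, not a timescale mismatch; it shrinks as both sampling rates increase.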
6. Comparison with Related Operators
The Legendre Memory Unit (LMU, also referenced as LegT) projects onto a fixed-length window $[t - \theta, t]$ using a translated Legendre polynomial basis, yielding an ODE
$$\frac{d}{dt} c(t) = -\frac{1}{\theta} A\, c(t) + \frac{1}{\theta} B\, f(t)$$
with a different, alternating-sign state matrix.
This approach introduces a timescale hyperparameter $\theta$, requires nontrivial boundary treatment, and is less robust under varying sampling rates. HiPPO-LegS eliminates the need for such priors by adapting the window to the full history $[0, t]$ at all timescales, and it empirically outperforms LMU/LegT in benchmarks targeting long-range dependencies (Gu et al., 2020). HiPPO-LegS matrices also play a crucial foundational role in state space models (notably S4), where they serve as initialization for the state matrix, enabling effective modeling of long sequences (Gu et al., 2022).
7. Implementation and Empirical Performance
HiPPO-LegS matrices can be precomputed for fixed $N$. The memory update at each timestep $k$ is performed as:
```python
# one forward-Euler memory update at step k (unit time steps)
m = (I - A_legS / k) @ m + (B_legS / k) * x_next
```
or with optimized prefix-sum kernels for the lower-triangular matvec. These updates yield RNN memory cells that optimally track the Legendre coefficients of the ongoing input history (Gu et al., 2020).
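The claim that the state tracks the Legendre coefficients of the history can be checked end to end by reconstructing the input from the final state. A sketch using the unit-step forward-Euler recurrence (the choices of $K$, $N$, test signal, and variable names are illustrative):

```python
import numpy as np
from numpy.polynomial.legendre import legval

# Run the recurrence over a smooth signal on (0, 1], then reconstruct
# the history at t = 1 from the final coefficient vector.
K, N = 2000, 8
n = np.arange(N)
r = np.sqrt(2 * n + 1)
A = np.tril(np.outer(r, r), -1) + np.diag(n + 1.0)

x = np.arange(1, K + 1) / K              # sample times in (0, 1]
f = np.sin(2 * np.pi * x)
c = np.zeros(N)                          # zero prior history
for k in range(1, K + 1):
    c = c - (A @ c) / k + (r / k) * f[k - 1]

# Reconstruction: f(x) ≈ sum_n c_n * sqrt(2n+1) * P_n(2x - 1) on [0, 1]
fhat = legval(2 * x - 1, c * r)
```

Reconstruction accuracy improves spectrally in $N$ for smooth inputs and at first order in the number of steps $K$, consistent with the error analyses cited above.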
Empirically, HiPPO-LegS achieves state-of-the-art results on tasks such as permuted MNIST (98.3% accuracy) and trajectory classification, with substantial margins over RNN and neural ODE baselines, especially in regimes requiring timescale robustness or with missing data (Gu et al., 2020).
Table 1: HiPPO-LegS and LMU Matrix Structures
| Operator | Measure/Window | Pattern (closed form) |
|---|---|---|
| HiPPO-LegS | Uniform on $[0, t]$ | Lower triangular: $A_{nk} = \sqrt{(2n+1)(2k+1)}$ for $n > k$, $A_{nn} = n + 1$ |
| LMU (LegT) | Uniform on $[t - \theta, t]$ | Alternating sign, explicit $\theta$ dependence |
For further technical details and proofs, see (Gu et al., 2020), (Gu et al., 2022), and (Park et al., 2024).