
Linear Complexity Models Overview

Updated 3 February 2026
  • Linear complexity models are defined by operations that scale linearly with parameters such as input size, data points, or dimensionality.
  • They enable efficient computation in optimization, learning, and cryptography by providing predictable, scalable performance.
  • Empirical studies demonstrate that these models achieve competitive results in tasks like self-attention, speech processing, and network coding.

A linear complexity model is any computational or statistical construct whose fundamental operations, inference steps, or capacity to represent functions scale linearly with a relevant problem parameter—typically input size, dimensionality, number of data points, or output channels. Linear complexity models are both a central concept in numerical optimization, learning theory, high-dimensional statistics, cryptography, and large-scale machine learning, and the target of extensive research, since they sit at the intersection of scalability and quantitative theoretical guarantees.

1. Formal Definition and Scope

A model, algorithm, or coding scheme is termed “linear complexity” if its time, space, or sample complexity is $O(N)$ in the key parameter $N$. This includes:

  • Optimization models: Linear interpolation or approximation schemes where the work per iteration or per function evaluation is proportional to dimension or sample number (Schwertner et al., 2022).
  • Neural sequence models: Architectures with $O(T)$ complexity in sequence length $T$, circumventing the quadratic bottleneck of conventional attention (Wang et al., 2020, Liao et al., 2024, Zhang et al., 2024, Shen et al., 2024).
  • Statistical models: Function classes or estimators (such as linear predictors for vector-valued regression or ERM) with sample complexity scaling as $O(k/\epsilon^2)$ in the output dimension $k$ (Schliserman et al., 2024).
  • Data structures and codes: Network coding or error-correcting codes whose encoding/decoding cost is $O(n)$ in the data block length $n$ (Mahdaviani et al., 2013).
  • Signal models: Cryptographic or time-series models whose parameterization or unpredictability is measured by the minimal order of linear recurrences (the “linear complexity profile”) (Zhou, 2011, Mérai et al., 2016, Gómez-Pérez et al., 2018).

This definition includes not only literal algorithms, but also combinatorial or algebraic constructs where “linear complexity” measures intrinsic representational or computational requirements.

2. Linear Complexity in Optimization and Learning

2.1. Derivative-Free Optimization

Linear interpolation models in trust-region methods are a principal example. Given a black-box objective $f : \mathbb{R}^n \to \mathbb{R}$, the linear surrogate $m(x)$, built from $n+1$ function values, provides a “fully linear” approximation within a ball of radius $\delta$ if

$$\Vert \nabla f(x) - \nabla m(x) \Vert \leq C_g \delta, \qquad |f(x) - m(x)| \leq C_f \delta^2$$

where $C_g = L + (\tfrac12 L + 2\kappa)\Lambda n$ and $C_f = \tfrac12 L + \kappa + (\tfrac12 L + 2\kappa)\Lambda n$, with $L$ a Lipschitz constant, $\Lambda$ the geometry (“poisedness”) constant of the interpolation set, and $\kappa$ the model inexactness. The overall iteration and evaluation complexity is $O(n\epsilon^{-2})$ to reach $\Vert \nabla f \Vert_\infty < \epsilon$, demonstrating explicit linear scaling in the problem dimension (Schwertner et al., 2022).
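As a concrete sketch, a fully linear model can be built from $n+1$ function values via forward differences on a coordinate simplex; the test function, base point, and radius below are illustrative, not from the cited paper.

```python
import numpy as np

def linear_surrogate(f, x0, delta):
    """Build a linear interpolation model of f around x0 from n+1 samples.

    Samples f at x0 and at x0 + delta * e_i for each coordinate direction,
    then returns m(x) = c + g.(x - x0), which interpolates those values.
    """
    n = x0.size
    c = f(x0)
    g = np.array([(f(x0 + delta * np.eye(n)[i]) - c) / delta
                  for i in range(n)])
    return (lambda x: c + g @ (x - x0)), g

# Illustrative objective: f(x) = x0^2 + 3 x1, modelled near (1, 2)
f = lambda x: x[0] ** 2 + 3.0 * x[1]
m, g = linear_surrogate(f, np.array([1.0, 2.0]), 0.1)
```

The work is one function evaluation per coordinate, i.e., $n+1$ evaluations total, which is where the linear dependence on dimension in the $O(n\epsilon^{-2})$ bound originates.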

2.2. Empirical Risk Minimization and Vector-Valued Prediction

Learning a linear predictor $W \in \mathbb{R}^{k \times m}$ on data $(x, y)$ with $x \in \mathbb{R}^m$, $y \in \mathbb{R}^k$ and convex, Lipschitz loss exhibits sample complexity

$$n = \Theta\!\left(\frac{k}{\epsilon^2}\right)$$

matching upper/lower bounds for ERM with Frobenius-norm-constrained weights. For $k = 1$, this reduces to classical rates; for $k \approx d$ it interpolates to the cost of general $d$-dimensional stochastic convex optimization (Schliserman et al., 2024).
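A minimal ERM sketch for this vector-valued setting, using projected gradient descent onto a Frobenius-norm ball; the squared loss, step size, and constraint radius are illustrative choices, not the paper's setup (which assumes a Lipschitz loss).

```python
import numpy as np

rng = np.random.default_rng(0)
m_dim, k, n = 5, 3, 400                   # input dim, output dim, samples
W_true = rng.normal(size=(k, m_dim)) / np.sqrt(m_dim)
X = rng.normal(size=(n, m_dim))
Y = X @ W_true.T                          # noiseless targets for the sketch

# ERM via projected gradient descent on the empirical squared loss,
# with weights constrained to a Frobenius-norm ball of radius B.
W = np.zeros((k, m_dim))
B = 2.0 * np.linalg.norm(W_true)          # radius assumed known here
for _ in range(500):
    grad = (2.0 / n) * (X @ W.T - Y).T @ X
    W -= 0.1 * grad
    nrm = np.linalg.norm(W)
    if nrm > B:                           # project back onto the ball
        W *= B / nrm
```

The constrained class has capacity growing with the output dimension $k$, which is the source of the linear $k$ factor in the sample complexity above.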

2.3. Contextual Markov Decision Processes

In linear CMDP models, value-approximation and policy learning admit sample complexities $N = \widetilde{O}(H^4 d^3 / \epsilon^2)$ or $N = \widetilde{O}(H^4 d^3 K / \epsilon^2)$, scaling polynomially in the horizon $H$ and feature dimension $d$, and linearly in the cardinality of controllable factors such as the action space size $K$ (Deng et al., 2024). The distinction between model classes here informs sample efficiency benchmarks in RL theory.

3. Linear Complexity in Neural and Sequence Models

3.1. Self-Attention and Linear Alternatives

Standard self-attention mechanisms are quadratic, with $O(T^2 d)$ complexity for $T$ tokens and $d$ features. Linear-complexity models replace this with:

  • Low-rank projections (Linformer): Projecting keys/values to rank $k = O(d)$, resulting in $O(T)$ time/memory (Wang et al., 2020).
  • Stateful recurrent mechanisms (ViG Gated Linear Attention, HGRN2): Causal or bidirectional state scans with data-dependent gating; $O(Td^2)$ time, $O(Td)$ memory (Liao et al., 2024, Shen et al., 2024).
  • MLP-based local-global fusion (SummaryMixing): Sequence processed via local MLPs and a single global summary vector, $O(T)$ complexity, used in speech SSL (Zhang et al., 2024).
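The low-rank projection idea can be sketched as follows; the random matrix E stands in for Linformer's learned projection, and the shapes are illustrative. The softmax is taken over a $T \times k$ score matrix rather than $T \times T$, so cost is linear in $T$.

```python
import numpy as np

def lowrank_attention(Q, K, V, k):
    """Linformer-style low-rank attention sketch.

    A projection E compresses the T keys/values down to k rows before
    the softmax, replacing the T x T score matrix with a T x k one.
    Here E is random for illustration; Linformer learns it.
    """
    T, d = K.shape
    E = np.random.default_rng(0).normal(size=(k, T)) / np.sqrt(T)
    K_p, V_p = E @ K, E @ V                    # (k, d): costs O(T k d)
    scores = Q @ K_p.T / np.sqrt(d)            # (T, k), not (T, T)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)          # row-wise softmax
    return w @ V_p                             # (T, d)

T, d = 128, 16
rng = np.random.default_rng(1)
out = lowrank_attention(rng.normal(size=(T, d)), rng.normal(size=(T, d)),
                        rng.normal(size=(T, d)), k=32)
```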

Empirical studies show these models not only achieve theoretical linear scaling but also competitive or superior performance and resource utilization on large-scale language and vision tasks. For instance, scaling laws for linear-complexity LLMs (TNL, HGRN2, cosFormer2) closely track or outperform LLaMA in loss-vs-compute curves, with optimal data/model allocations demonstrating similar exponents (Shen et al., 2024).

3.2. Application in Speech and Vision

  • Speech SSL: Replacing MHSA with SummaryMixing in Conformer encoders results in 18% faster pretraining, 23% lower VRAM, and equivalent or improved downstream accuracy (Zhang et al., 2024).
  • High-resolution vision: Gated linear attention with 2D injection achieves a 4.8× GPU speedup and 90% lower memory vs. DeiT-T at 1024×1024 resolution while matching or surpassing accuracy (Liao et al., 2024).

4. Linear Complexity in Coding Theory, Cryptography, and Sequence Theory

4.1. Pseudorandomness and Linear-Complexity Profiles

The linear complexity $L(S)$ of a binary or finite-field sequence of period $N$ is the minimal order of a linear recurrence (LFSR length) that generates it. The $k$-error linear complexity $L_k(S)$ is the minimal $L$ obtainable after at most $k$ symbol changes per period. Stable $k$-error linear complexity—where all error patterns of weight $\leq k$ preserve $L(S)$—is guaranteed via cube theory and is critical for cryptographic keystream resistance (Zhou, 2011, Mérai et al., 2016).

For $2^n$-periodic binary sequences, the maximal $k$-error linear complexity is $2^n - (2^l - 1)$ for $2^{l-1} \leq k < 2^l$; explicit “cube” constructions achieve this bound. Multidimensional extensions—where sequences are indexed over $\mathbb{Z}^d$—define linear complexity via the dimension of the quotient $\mathbb{F}_q[X_1,\ldots,X_d]/I(s)$, generalizing the 1D theory and supporting cryptographic array constructions (Gómez-Pérez et al., 2018).
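In practice, the linear complexity of a binary sequence is computed with the standard Berlekamp–Massey algorithm; a minimal GF(2) version (textbook algorithm, not code from the cited papers):

```python
def linear_complexity(s):
    """Berlekamp-Massey over GF(2): returns the length L(S) of the
    shortest LFSR generating the 0/1 sequence s."""
    n = len(s)
    c, b = [1] + [0] * n, [1] + [0] * n   # current / previous connection polys
    L, m = 0, -1
    for i in range(n):
        # discrepancy: does the current LFSR predict s[i]?
        d = s[i]
        for j in range(1, L + 1):
            d ^= c[j] & s[i - j]
        if d:
            t = c[:]
            shift = i - m
            for j in range(n - shift + 1):
                c[j + shift] ^= b[j]       # c(x) += x^shift * b(x)
            if 2 * L <= i:
                L, b, m = i + 1 - L, t, i
    return L

# The 2^2-periodic sequence 0,0,0,1 repeated attains the maximum L = 4,
# while 1,0,1,0 repeated satisfies s[i] = s[i-2] and has L = 2.
```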

4.2. Expansion Complexity

Expansion complexity measures the minimal total degree of a nontrivial algebraic relation $h(x, G(x)) \equiv 0 \bmod x^N$ for the sequence generating series $G(x)$. For strongly random sequences, expansion complexity grows as $\sqrt{N}$ almost surely, while for periodic sequences it aligns with linear complexity for long enough segments. This parameter serves as a stricter unpredictability test in cryptographic applications (Mérai et al., 2016).

5. Linear Complexity in Random Coding and Variational Inference

5.1. Network Codes

Sparse random linear network coding (SRLNC) and “Gamma” network codes achieve encoding/decoding cost $O(N)$ in blocklength, with reception overheads minimized to 2–7% via density-evolution–optimized outer codes. The design is characterized by degree distributions, generation size, and the solution of fixed-point DE equations. These codes surpass previous schemes in both linearity and practical overhead (Mahdaviani et al., 2013).
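A toy round trip of random linear coding over GF(2) (dense rather than sparse, so it illustrates only the encode/decode mechanics and the notion of reception overhead, not the optimized Gamma-code design):

```python
import numpy as np

def rlnc_roundtrip(data_blocks, n_coded, rng):
    """Random linear network coding sketch over GF(2).

    Encodes k source blocks into n_coded random XOR combinations, then
    decodes by Gaussian elimination; decoding succeeds once the received
    combinations have rank k (hence a small reception overhead beyond k).
    """
    k, _ = data_blocks.shape
    coeffs = rng.integers(0, 2, size=(n_coded, k))
    coded = coeffs @ data_blocks % 2                 # encode: XOR combinations
    aug = np.concatenate([coeffs, coded], axis=1)    # [coeffs | payload]
    row = 0
    for col in range(k):                             # GF(2) elimination
        piv = next((r for r in range(row, n_coded) if aug[r, col]), None)
        if piv is None:
            return None                              # rank deficient: need more
        aug[[row, piv]] = aug[[piv, row]]
        for r in range(n_coded):
            if r != row and aug[r, col]:
                aug[r] ^= aug[row]
        row += 1
    return aug[:k, k:]                               # recovered source blocks

rng = np.random.default_rng(0)
blocks = rng.integers(0, 2, size=(8, 32))
decoded = rlnc_roundtrip(blocks, n_coded=12, rng=rng)
```

Dense elimination here is cubic; the cited codes reach $O(N)$ cost precisely by keeping the combinations sparse and structuring the outer code.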

5.2. Gaussian Processes

Decoupling mean and covariance representations in the RKHS enables variational inference for Gaussian processes with time and space cost only linear in the number of mean basis points (typically $M_\alpha \sim 10^4$) and cubic in the significantly smaller covariance basis size ($M_\beta \ll M_\alpha$). This enables expressive, large-scale GP models otherwise precluded by $O(M^3)$ scaling (Cheng et al., 2017).

6. Interpretability, Representational Linear Models, and Theoretical Implications

6.1. Simplicity in Linear Complexity Visual Models

Empirical work reveals that visual complexity judgments by humans are close to a linear function of two quantities extracted from deep segmentation models: the number of visual segments (“blobs”) and the number of semantic object classes. A linear regression on the square roots of these counts captures 60–80% of the variance in mean complexity ratings across diverse datasets, outperforming complex handcrafted or deep learning baselines (Shen et al., 2024).
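The two-feature linear model can be illustrated on synthetic data; the counts, coefficients, and noise level below are invented stand-ins for the human-rating datasets, used only to show the regression on square-rooted counts.

```python
import numpy as np

# Synthetic stand-in: complexity ratings follow a planted linear law in
# sqrt(#segments) and sqrt(#semantic classes), plus noise.
rng = np.random.default_rng(0)
segments = rng.integers(1, 60, size=200)      # "blob" counts (invented)
classes = rng.integers(1, 15, size=200)       # object class counts (invented)
ratings = (0.7 * np.sqrt(segments) + 0.4 * np.sqrt(classes)
           + rng.normal(0.0, 0.1, size=200))

# Least-squares fit of the two-feature linear model with intercept
X = np.column_stack([np.sqrt(segments), np.sqrt(classes), np.ones(200)])
coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)
resid = ratings - X @ coef
r2 = 1.0 - resid.var() / ratings.var()        # explained-variance fraction
```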

6.2. Pseudo-Boolean Linearization Complexity

For combinatorial or integer programming, the linearization complexity of a pseudo-Boolean function $f$ is the minimal number $k$ such that $f$ can be expressed as a linear combination of $k$ auxiliary Boolean functions. For random polynomials, this is almost always maximal ($2^n - n - 1$ with $n$ variables). Practical linear IP formulations, such as those for the low-autocorrelation sequence problem, exploit this structure to reduce the auxiliary variable count from $O(N^3)$ (monomial-based) to $O(N^2)$ (value-indicator-based), with substantial computational benefits (Walter, 2023).
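The monomial-based linearization mentioned above replaces each product term with an auxiliary 0/1 variable constrained linearly. A brute-force check that the standard constraints force the auxiliary variable to equal the product on every Boolean point:

```python
from itertools import product

def monomial_linearization_ok(n_vars):
    """Verify the textbook linearization of z = x_1 * ... * x_n:
    z <= x_i for all i, and z >= sum(x) - (n - 1), with z in {0, 1}.
    On every 0/1 assignment the feasible z must equal the product."""
    for xs in product((0, 1), repeat=n_vars):
        prod_val = int(all(xs))
        feasible = [z for z in (0, 1)
                    if all(z <= x for x in xs)
                    and z >= sum(xs) - (n_vars - 1)]
        if feasible != [prod_val]:
            return False
    return True
```

Each monomial thus costs one auxiliary variable and $n+1$ linear constraints; counting monomials is what yields the $O(N^3)$ figure for the autocorrelation objective, which the value-indicator formulation improves to $O(N^2)$.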

7. Theoretical Nuances, Trade-offs, and Outlook

Linear complexity models provide scalable and interpretable tools across statistical learning, coding, sequence modeling, and optimization. However, the precise meaning of “linear” (with respect to which parameter), the effect of hidden constants (e.g., geometry, model-poisedness, dimension), and the role of model inexactness are all critical in practical deployments (Schwertner et al., 2022). Trade-offs frequently arise in approximation error versus computation, regularization, or expressivity—as with the choice of projection rank in Linformer, decay types in linear transformers, or block size in network codes. Theoretical guarantees are matched by recent empirical evidence that linear-complexity approaches can achieve parity or even superiority with superlinear baselines on practical, large-scale tasks (Shen et al., 2024, Liao et al., 2024, Zhang et al., 2024).
