
Linear Complexity Models Overview

Updated 3 February 2026
  • Linear complexity models are defined by operations that scale linearly with parameters such as input size, data points, or dimensionality.
  • They enable efficient computation in optimization, learning, and cryptography by providing predictable, scalable performance.
  • Empirical studies demonstrate that these models achieve competitive results in tasks like self-attention, speech processing, and network coding.

A linear complexity model is any computational or statistical construct whose fundamental operations, inference steps, or capacity to represent functions scale linearly with a relevant problem parameter—typically input size, dimensionality, number of data points, or output channels. Linear complexity models are both a central concept in numerical optimization, learning theory, high-dimensional statistics, cryptography, and large-scale machine learning, and the target of extensive research, since they sit at the intersection of scalability and quantitative theoretical guarantees.

1. Formal Definition and Scope

A model, algorithm, or coding scheme is termed “linear complexity” if its time, space, or sample complexity is $O(N)$ in the key parameter $N$. This includes:

  • Optimization models: Linear interpolation or approximation schemes where the work per iteration or per function evaluation is proportional to dimension or sample number (Schwertner et al., 2022).
  • Neural sequence models: Architectures with $O(T)$ complexity in sequence length $T$, circumventing the quadratic bottleneck of conventional attention (Wang et al., 2020, Liao et al., 2024, Zhang et al., 2024, Shen et al., 2024).
  • Statistical models: Function classes or estimators (such as linear predictors for vector-valued regression or ERM) with sample complexity scaling as $O(k/\epsilon^2)$ in the output dimension $k$ (Schliserman et al., 2024).
  • Data structures and codes: Network coding or error-correcting codes whose encoding/decoding cost is $O(n)$ in the data block length $n$ (Mahdaviani et al., 2013).
  • Signal models: Cryptographic or time-series models whose parameterization or unpredictability is measured by the minimal order of linear recurrences (the “linear complexity profile”) (Zhou, 2011, Mérai et al., 2016, Gómez-Pérez et al., 2018).

This definition includes not only literal algorithms, but also combinatorial or algebraic constructs where “linear complexity” measures intrinsic representational or computational requirements.

2. Linear Complexity in Optimization and Learning

2.1. Derivative-Free Optimization

Linear interpolation models in trust-region methods are a principal example. Given a black-box objective $f : \mathbb{R}^n \to \mathbb{R}$, the linear surrogate $m(x)$, built from $n+1$ function values, provides a “fully linear” approximation within a ball of radius $\delta$ if

$$\Vert \nabla f(x) - \nabla m(x) \Vert \leq C_g \delta, \qquad |f(x) - m(x)| \leq C_f \delta^2$$

where $C_g = L + (\tfrac12 L + 2\kappa)\Lambda n$ and $C_f = \tfrac12 L + \kappa + (\tfrac12 L + 2\kappa)\Lambda n$, with $L$ a Lipschitz constant, $\Lambda$ the geometry (“poisedness”) constant of the interpolation set, and $\kappa$ the model inexactness. The overall iteration and evaluation complexity is $O(n\epsilon^{-2})$ to reach $\Vert \nabla f \Vert_\infty < \epsilon$, demonstrating explicit linear scaling in the problem dimension (Schwertner et al., 2022).
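As a concrete sketch, a fully linear model can be built from $n+1$ function values via forward differences on a coordinate simplex; the test function, base point, and radius below are illustrative, not from the cited paper.

```python
import numpy as np

def linear_surrogate(f, x0, delta):
    """Build a linear interpolation model of f around x0 from n+1 samples.

    Samples f at x0 and at x0 + delta * e_i for each coordinate direction,
    then returns m(x) = c + g.(x - x0), which interpolates those values.
    """
    n = x0.size
    c = f(x0)
    g = np.array([(f(x0 + delta * np.eye(n)[i]) - c) / delta
                  for i in range(n)])
    return (lambda x: c + g @ (x - x0)), g

# Illustrative objective: f(x) = x0^2 + 3 x1, modelled near (1, 2)
f = lambda x: x[0] ** 2 + 3.0 * x[1]
m, g = linear_surrogate(f, np.array([1.0, 2.0]), 0.1)
```

The work is one function evaluation per coordinate, i.e., $n+1$ evaluations total, which is where the linear dependence on dimension in the $O(n\epsilon^{-2})$ bound originates.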

2.2. Empirical Risk Minimization and Vector-Valued Prediction

Learning a linear predictor $W \in \mathbb{R}^{k \times m}$ on data $(x, y)$ with $x \in \mathbb{R}^m$, $y \in \mathbb{R}^k$ and convex, Lipschitz loss exhibits sample complexity

$$n = \Theta\!\left(\frac{k}{\epsilon^2}\right)$$

matching upper/lower bounds for ERM with Frobenius-norm-constrained weights. For $k = 1$, this reduces to classical rates; for $k \approx d$ it interpolates to the cost of general $d$-dimensional stochastic convex optimization (Schliserman et al., 2024).
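A minimal ERM sketch for this vector-valued setting, using projected gradient descent onto a Frobenius-norm ball; the squared loss, step size, and constraint radius are illustrative choices, not the paper's setup (which assumes a Lipschitz loss).

```python
import numpy as np

rng = np.random.default_rng(0)
m_dim, k, n = 5, 3, 400                   # input dim, output dim, samples
W_true = rng.normal(size=(k, m_dim)) / np.sqrt(m_dim)
X = rng.normal(size=(n, m_dim))
Y = X @ W_true.T                          # noiseless targets for the sketch

# ERM via projected gradient descent on the empirical squared loss,
# with weights constrained to a Frobenius-norm ball of radius B.
W = np.zeros((k, m_dim))
B = 2.0 * np.linalg.norm(W_true)          # radius assumed known here
for _ in range(500):
    grad = (2.0 / n) * (X @ W.T - Y).T @ X
    W -= 0.1 * grad
    nrm = np.linalg.norm(W)
    if nrm > B:                           # project back onto the ball
        W *= B / nrm
```

The constrained class has capacity growing with the output dimension $k$, which is the source of the linear $k$ factor in the sample complexity above.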

2.3. Contextual Markov Decision Processes

In linear CMDP models, value-approximation and policy learning admit sample complexities $N = \widetilde{O}(H^4 d^3 / \epsilon^2)$ or $N = \widetilde{O}(H^4 d^3 K / \epsilon^2)$, scaling polynomially in the horizon $H$ and feature dimension $d$, and linearly in the cardinality of controllable factors such as the action space size $K$ (Deng et al., 2024). The distinction between model classes here informs sample efficiency benchmarks in RL theory.

3. Linear Complexity in Neural and Sequence Models

3.1. Self-Attention and Linear Alternatives

Standard self-attention mechanisms are quadratic, with $O(T^2 d)$ complexity for $T$ tokens and $d$ features. Linear-complexity models replace this with:

  • Low-rank projections (Linformer): Projecting keys/values to rank $k = O(d)$, resulting in $O(T)$ time/memory (Wang et al., 2020).
  • Stateful recurrent mechanisms (ViG Gated Linear Attention, HGRN2): Causal or bidirectional state scans with data-dependent gating; $O(Td^2)$ time, $O(Td)$ memory (Liao et al., 2024, Shen et al., 2024).
  • MLP-based local-global fusion (SummaryMixing): Sequence processed via local MLPs and a single global summary vector, $O(T)$ complexity, used in speech SSL (Zhang et al., 2024).
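The low-rank projection idea can be sketched as follows; the random matrix E stands in for Linformer's learned projection, and the shapes are illustrative. The softmax is taken over a $T \times k$ score matrix rather than $T \times T$, so cost is linear in $T$.

```python
import numpy as np

def lowrank_attention(Q, K, V, k):
    """Linformer-style low-rank attention sketch.

    A projection E compresses the T keys/values down to k rows before
    the softmax, replacing the T x T score matrix with a T x k one.
    Here E is random for illustration; Linformer learns it.
    """
    T, d = K.shape
    E = np.random.default_rng(0).normal(size=(k, T)) / np.sqrt(T)
    K_p, V_p = E @ K, E @ V                    # (k, d): costs O(T k d)
    scores = Q @ K_p.T / np.sqrt(d)            # (T, k), not (T, T)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)          # row-wise softmax
    return w @ V_p                             # (T, d)

T, d = 128, 16
rng = np.random.default_rng(1)
out = lowrank_attention(rng.normal(size=(T, d)), rng.normal(size=(T, d)),
                        rng.normal(size=(T, d)), k=32)
```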

Empirical studies show these models not only achieve theoretical linear scaling but also competitive or superior performance and resource utilization on large-scale language and vision tasks. For instance, scaling laws for linear-complexity LLMs (TNL, HGRN2, cosFormer2) closely track or outperform LLaMA in loss-vs-compute curves, with optimal data/model allocations demonstrating similar exponents (Shen et al., 2024).

3.2. Application in Speech and Vision

  • Speech SSL: Replacing MHSA with SummaryMixing in Conformer encoders results in 18% faster pretraining, 23% lower VRAM, and equivalent or improved downstream accuracy (Zhang et al., 2024).
  • High-resolution vision: Gated linear attention with 2D injection achieves a 4.8× GPU speedup and 90% lower memory vs. DeiT-T at 1024×1024 resolution while matching or surpassing accuracy (Liao et al., 2024).

4. Linear Complexity in Coding Theory, Cryptography, and Sequence Theory

4.1. Pseudorandomness and Linear-Complexity Profiles

The linear complexity $L(S)$ of a binary or finite-field sequence of period $N$ is the minimal order of a linear recurrence (LFSR length) that generates it. The $k$-error linear complexity $L_k(S)$ is the minimal $L$ obtainable after at most $k$ symbol changes per period. Stable $k$-error linear complexity—where all error patterns of weight $\leq k$ preserve $L(S)$—is guaranteed via cube theory and is critical for cryptographic keystream resistance (Zhou, 2011, Mérai et al., 2016).

For $2^n$-periodic binary sequences, the maximal $k$-error linear complexity is $2^n - (2^l - 1)$ for $2^{l-1} \leq k < 2^l$; explicit “cube” constructions achieve this bound. Multidimensional extensions—where sequences are indexed over $\mathbb{Z}^d$—define linear complexity via the dimension of the quotient $\mathbb{F}_q[X_1,\ldots,X_d]/I(s)$, generalizing the 1D theory and supporting cryptographic array constructions (Gómez-Pérez et al., 2018).
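In practice, the linear complexity of a binary sequence is computed with the standard Berlekamp–Massey algorithm; a minimal GF(2) version (textbook algorithm, not code from the cited papers):

```python
def linear_complexity(s):
    """Berlekamp-Massey over GF(2): returns the length L(S) of the
    shortest LFSR generating the 0/1 sequence s."""
    n = len(s)
    c, b = [1] + [0] * n, [1] + [0] * n   # current / previous connection polys
    L, m = 0, -1
    for i in range(n):
        # discrepancy: does the current LFSR predict s[i]?
        d = s[i]
        for j in range(1, L + 1):
            d ^= c[j] & s[i - j]
        if d:
            t = c[:]
            shift = i - m
            for j in range(n - shift + 1):
                c[j + shift] ^= b[j]       # c(x) += x^shift * b(x)
            if 2 * L <= i:
                L, b, m = i + 1 - L, t, i
    return L

# The 2^2-periodic sequence 0,0,0,1 repeated attains the maximum L = 4,
# while 1,0,1,0 repeated satisfies s[i] = s[i-2] and has L = 2.
```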

4.2. Expansion Complexity

Expansion complexity measures the minimal total degree of a nontrivial algebraic relation $h(x, G(x)) \equiv 0 \bmod x^N$ for the sequence generating series $G(x)$. For strongly random sequences, expansion complexity grows as $\sqrt{N}$ almost surely, while for periodic sequences it aligns with linear complexity for long enough segments. This parameter serves as a stricter unpredictability test in cryptographic applications (Mérai et al., 2016).

5. Linear Complexity in Random Coding and Variational Inference

5.1. Network Codes

Sparse random linear network coding (SRLNC) and “Gamma” network codes achieve encoding/decoding cost $O(N)$ in blocklength, with reception overheads minimized to 2–7% via density-evolution–optimized outer codes. The design is characterized by degree distributions, generation size, and the solution of fixed-point DE equations. These codes surpass previous schemes in both linearity and practical overhead (Mahdaviani et al., 2013).
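A toy round trip of random linear coding over GF(2) (dense rather than sparse, so it illustrates only the encode/decode mechanics and the notion of reception overhead, not the optimized Gamma-code design):

```python
import numpy as np

def rlnc_roundtrip(data_blocks, n_coded, rng):
    """Random linear network coding sketch over GF(2).

    Encodes k source blocks into n_coded random XOR combinations, then
    decodes by Gaussian elimination; decoding succeeds once the received
    combinations have rank k (hence a small reception overhead beyond k).
    """
    k, _ = data_blocks.shape
    coeffs = rng.integers(0, 2, size=(n_coded, k))
    coded = coeffs @ data_blocks % 2                 # encode: XOR combinations
    aug = np.concatenate([coeffs, coded], axis=1)    # [coeffs | payload]
    row = 0
    for col in range(k):                             # GF(2) elimination
        piv = next((r for r in range(row, n_coded) if aug[r, col]), None)
        if piv is None:
            return None                              # rank deficient: need more
        aug[[row, piv]] = aug[[piv, row]]
        for r in range(n_coded):
            if r != row and aug[r, col]:
                aug[r] ^= aug[row]
        row += 1
    return aug[:k, k:]                               # recovered source blocks

rng = np.random.default_rng(0)
blocks = rng.integers(0, 2, size=(8, 32))
decoded = rlnc_roundtrip(blocks, n_coded=12, rng=rng)
```

Dense elimination here is cubic; the cited codes reach $O(N)$ cost precisely by keeping the combinations sparse and structuring the outer code.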

5.2. Gaussian Processes

Decoupling mean and covariance representations in the RKHS enables variational inference for Gaussian processes with time and space cost only linear in the number of mean basis points (typically $M_\alpha \sim 10^4$) and cubic in the significantly smaller covariance basis size ($M_\beta \ll M_\alpha$). This enables expressive, large-scale GP models otherwise precluded by $O(M^3)$ scaling (Cheng et al., 2017).

6. Interpretability, Representational Linear Models, and Theoretical Implications

6.1. Simplicity in Linear Complexity Visual Models

Empirical work reveals that visual complexity judgments by humans are close to a linear function of two quantities extracted from deep segmentation models: the number of visual segments (“blobs”) and the number of semantic object classes. A linear regression on the square roots of these counts captures 60–80% of the variance in mean complexity ratings across diverse datasets, outperforming complex handcrafted or deep learning baselines (Shen et al., 2024).
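The two-feature linear model can be illustrated on synthetic data; the counts, coefficients, and noise level below are invented stand-ins for the human-rating datasets, used only to show the regression on square-rooted counts.

```python
import numpy as np

# Synthetic stand-in: complexity ratings follow a planted linear law in
# sqrt(#segments) and sqrt(#semantic classes), plus noise.
rng = np.random.default_rng(0)
segments = rng.integers(1, 60, size=200)      # "blob" counts (invented)
classes = rng.integers(1, 15, size=200)       # object class counts (invented)
ratings = (0.7 * np.sqrt(segments) + 0.4 * np.sqrt(classes)
           + rng.normal(0.0, 0.1, size=200))

# Least-squares fit of the two-feature linear model with intercept
X = np.column_stack([np.sqrt(segments), np.sqrt(classes), np.ones(200)])
coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)
resid = ratings - X @ coef
r2 = 1.0 - resid.var() / ratings.var()        # explained-variance fraction
```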

6.2. Pseudo-Boolean Linearization Complexity

For combinatorial or integer programming, the linearization complexity of a pseudo-Boolean function $f$ is the minimal number $k$ such that $f$ can be expressed as a linear combination of $k$ auxiliary Boolean functions. For random polynomials, this is almost always maximal ($2^n - n - 1$ with $n$ variables). Practical linear IP formulations, such as those for the low-autocorrelation sequence problem, exploit this structure to reduce the auxiliary variable count from $O(N^3)$ (monomial-based) to $O(N^2)$ (value-indicator-based), with substantial computational benefits (Walter, 2023).
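The monomial-based linearization mentioned above replaces each product term with an auxiliary 0/1 variable constrained linearly. A brute-force check that the standard constraints force the auxiliary variable to equal the product on every Boolean point:

```python
from itertools import product

def monomial_linearization_ok(n_vars):
    """Verify the textbook linearization of z = x_1 * ... * x_n:
    z <= x_i for all i, and z >= sum(x) - (n - 1), with z in {0, 1}.
    On every 0/1 assignment the feasible z must equal the product."""
    for xs in product((0, 1), repeat=n_vars):
        prod_val = int(all(xs))
        feasible = [z for z in (0, 1)
                    if all(z <= x for x in xs)
                    and z >= sum(xs) - (n_vars - 1)]
        if feasible != [prod_val]:
            return False
    return True
```

Each monomial thus costs one auxiliary variable and $n+1$ linear constraints; counting monomials is what yields the $O(N^3)$ figure for the autocorrelation objective, which the value-indicator formulation improves to $O(N^2)$.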

7. Theoretical Nuances, Trade-offs, and Outlook

Linear complexity models provide scalable and interpretable tools across statistical learning, coding, sequence modeling, and optimization. However, the precise meaning of “linear” (with respect to which parameter), the effect of hidden constants (e.g., geometry, model-poisedness, dimension), and the role of model inexactness are all critical in practical deployments (Schwertner et al., 2022). Trade-offs frequently arise in approximation error versus computation, regularization, or expressivity—as with the choice of projection rank in Linformer, decay types in linear transformers, or block size in network codes. Theoretical guarantees are matched by recent empirical evidence that linear-complexity approaches can achieve parity or even superiority with superlinear baselines on practical, large-scale tasks (Shen et al., 2024, Liao et al., 2024, Zhang et al., 2024).
