
L2 Adaptive Computation (LAC)

Updated 6 February 2026
  • L2 Adaptive Computation (LAC) is a parameter-free method that identifies and skips redundant processing in transformer models using token-wise L2-norm analysis.
  • In quantum computation, LAC adapts measurement settings via linear feedback to efficiently implement symmetric Boolean functions with reduced space-time resources.
  • Empirical evaluations in language models show that LAC dynamically optimizes layer activations without retraining, enhancing both efficiency and interpretability.

L2 Adaptive Computation (LAC) encompasses a family of parameter-free adaptive strategies for identifying relevant computation on-the-fly in both classical neural and quantum computational models. Principally, LAC refers to (1) a norm-based criterion for layer activity in pre-trained neural networks, and (2) adaptive measurement protocols for quantum computation that utilize linear feedback over $\mathbb{F}_2$. Both settings exploit dynamic, data-dependent progress detection to reduce computational resources, enhance interpretability, or reveal intrinsic sparsity patterns without retraining or parameter adjustments, as detailed in the literature on transformer LLMs and measurement-based quantum algorithms (Shemiranifar, 20 May 2025, Daniel et al., 2022).

1. Formal Definition in Classical and Quantum Settings

In transformer-based LLMs, L2 Adaptive Computation monitors token-wise L2-norm progress across network layers. Given a stack of $T$ layers with hidden states $h_l \in \mathbb{R}^{B \times L \times D}$, the per-token L2-norm at each layer is

$$\|h_l\|_2^{(i,j)} = \sqrt{\sum_{k=1}^{D} \left(h_l^{(i,j,k)}\right)^2},$$

and the per-layer progress is

$$\delta_l^{(i,j)} = \|h_l\|_2^{(i,j)} - \|h_{l-1}\|_2^{(i,j)}.$$

A dynamic threshold $\lambda_l^{(i,j)} = \alpha \cdot [\max(\Delta_l^{(i,j)}) - \min(\Delta_l^{(i,j)})]$ with $\alpha \in (0,1]$, where $\Delta_l^{(i,j)} = \{\delta_1^{(i,j)}, \dots, \delta_l^{(i,j)}\}$ denotes the progress values accumulated up to layer $l$, determines whether the layer is void (inactive) for a token: if $\delta_l^{(i,j)} < \lambda_l^{(i,j)}$, the layer is considered a void for that token (Shemiranifar, 20 May 2025).
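The criterion above can be sketched in a few lines of NumPy; the toy hidden states and the value of $\alpha$ below are illustrative stand-ins, not values from the paper:

```python
import numpy as np

def l2_progress_and_voids(hidden_states, alpha=0.8):
    """hidden_states: [num_layers + 1, seq_len, dim] for one sequence
    (index 0 is the embedding output). Returns per-layer progress deltas
    and a boolean mask of layers flagged as void per token."""
    norms = np.linalg.norm(hidden_states, axis=-1)   # [num_layers + 1, seq_len]
    deltas = norms[1:] - norms[:-1]                  # delta_l for each token
    voids = np.zeros_like(deltas, dtype=bool)
    for l in range(deltas.shape[0]):
        seen = deltas[: l + 1]                       # progress accumulated up to layer l
        lam = alpha * (seen.max(axis=0) - seen.min(axis=0))  # dynamic threshold
        voids[l] = deltas[l] < lam
    return deltas, voids
```

Note that a layer leaving the state unchanged (zero progress) is flagged as void once earlier layers have established a nonzero progress range.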

In measurement-based quantum computation, L2 Adaptive Computation is formalized as adaptive measurement-based quantum computation (L2-MBQC). An L2-MBQC instance is specified by $(|C\rangle, M, f)$, where $|C\rangle$ is a cluster state of $L_Q$ qubits, $M$ is an adaptive measurement protocol with settings $s_i = (P \cdot x)_i \oplus (A \cdot m)_i$, and $f$ is the function extracted as the parity of a designated subset of measurement outcomes. Adaptivity permits dynamic adjustment of measurement bases depending on earlier outcomes, yielding exponentially improved space-time resource efficiency for certain Boolean functions (Daniel et al., 2022).

2. Adaptive Computation Procedures

In transformers, the LAC algorithm operates in inference mode without retraining:

  • For each token and each layer, compute the L2-norm progress $\delta_l$.
  • Accumulate all progress values up to layer $l$.
  • Calculate the dynamic threshold $\lambda_l$ from the range of accumulated progress values and $\alpha$.
  • If $\delta_l < \lambda_l$, mark the layer as void for the token and mask future computation for that token.
  • This is implemented in two phases: Prompt Processing (PP) for input tokens, and Response Generation (RG) for tokens generated autoregressively.
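The steps above can be sketched as a single forward loop; `layer_fns` is a hypothetical list of per-layer callables standing in for transformer blocks, whereas a real implementation would hook into the model's forward pass:

```python
import numpy as np

def lac_forward(h, layer_fns, alpha=0.8):
    """h: [seq_len, dim] token states. Runs the layer stack; once a token's
    L2-norm progress falls below the dynamic threshold, that token is
    masked out of all future layer computation."""
    history = []                                  # per-layer progress values
    active = np.ones(h.shape[0], dtype=bool)      # tokens still being processed
    for fn in layer_fns:
        out = h.copy()
        if active.any():
            out[active] = fn(h[active])           # compute only active tokens
        delta = np.linalg.norm(out, axis=-1) - np.linalg.norm(h, axis=-1)
        history.append(delta)
        hist = np.stack(history)
        lam = alpha * (hist.max(axis=0) - hist.min(axis=0))
        active &= ~(delta < lam)                  # void: mask future computation
        h = out
    return h, active
```

The same loop serves both phases: it is run once over the prompt tokens (PP) and then per generated token (RG).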

L2-MBQC proceeds as follows:

  • Prepare a cluster state $|C\rangle$ using a depth-3 circuit of CZ and H gates.
  • For each qubit $i$, adaptively set the measurement basis via $s_i$ as a function of the input $x$ and previous outcomes $m_{<i}$, using a strictly lower-triangular $A$.
  • Measure each qubit in a Pauli-XY basis determined by $s_i$ and a preassigned angle map $\theta_i$.
  • Output is the parity of designated measured outcomes.
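The classical control side of this protocol, i.e. the linear feedback over $\mathbb{F}_2$ rather than the quantum dynamics, can be illustrated directly; the matrices $P$, $A$ and the stubbed outcomes below are hypothetical examples:

```python
import numpy as np

def adaptive_settings(P, A, x, outcomes):
    """Compute settings s_i = (P·x)_i XOR (A·m)_i over F_2, one qubit at a
    time. A strictly lower-triangular A guarantees s_i depends only on
    earlier outcomes m_{<i}. `outcomes` stubs the measurement results."""
    n = P.shape[0]
    assert np.all(np.triu(A) == 0), "A must be strictly lower-triangular"
    m = np.zeros(n, dtype=int)
    s = np.zeros(n, dtype=int)
    for i in range(n):
        s[i] = (P[i] @ x + A[i] @ m) % 2   # basis choice for qubit i
        m[i] = outcomes[i]                 # would come from measuring qubit i
    return s, m

def output_parity(m, out_qubits):
    """f = parity of the designated measurement outcomes."""
    return int(m[list(out_qubits)].sum() % 2)
```

The strict lower-triangularity check enforces the causal ordering that makes the protocol implementable qubit by qubit.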

This approach enables resource-adaptive computation both for classical model pruning and for quantum algorithmic space-time reduction (Shemiranifar, 20 May 2025, Daniel et al., 2022).

3. Phase-Specific and Structural Effects

In transformer inference with LAC, two operational phases are distinguished:

  • Prompt Processing (PP): Layer activity is traced per input token, revealing which layers encode context.
  • Response Generation (RG): Activity is monitored for each autoregressively generated token; typically, different layers are active compared to PP.

Experimental evidence shows that layers marked as void during PP may be active during RG and vice versa, with mean L2-norms and delta-norms differing systematically between phases. For instance, middle layers in Qwen2.5-7B-Instruct are void more than 80% of the time during both phases, while early and final layers remain active (Shemiranifar, 20 May 2025).

In quantum computation, adaptivity enables the simulation of symmetric Boolean functions such as $\mathrm{Mod}_{p,0}(x)$ in constant depth with linear qubit resources, in contrast to the exponential requirements of non-adaptive settings (Daniel et al., 2022).
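Under the common convention that $\mathrm{Mod}_{p,0}(x) = 1$ exactly when the Hamming weight of $x$ is divisible by $p$ (an assumption stated here for concreteness), the target function itself is simple:

```python
def mod_p0(x, p):
    """Mod_{p,0}: 1 iff the Hamming weight of bitstring x is ≡ 0 (mod p)."""
    return int(sum(x) % p == 0)
```

The difficulty lies not in evaluating this classically but in realizing it within a constant-depth measurement protocol.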

4. Theoretical Rationale and Resource Efficiency

The defining L2-norm criterion in classical LAC is motivated by the observation that negligible norm changes across a layer for a given token indicate lack of substantial transformation; thus, such layers can be masked without affecting task-relevant processing. The adaptive threshold ensures this criterion is token- and context-sensitive.

In quantum L2-MBQC, the critical insight is that linear feedback of measurement outcomes allows the implementation of complex Boolean functions in constant depth and linear space by adaptively updating measurement bases. This results in exponential reductions in space-time complexity compared to non-adaptive MBQC (e.g., for $\mathrm{Mod}_{p,0}$ functions, moving from $2^n - 1$ to $O(np)$ qubits) (Daniel et al., 2022).
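The scaling gap is easy to make concrete; the counts below drop constant factors and illustrate the stated bounds rather than reproducing exact resource formulas from the paper:

```python
def qubit_counts(n, p):
    """Qubit requirements for Mod_{p,0} on n-bit inputs (constants omitted)."""
    nonadaptive = 2 ** n - 1   # exponential without adaptivity
    adaptive = n * p           # O(np) with linear-feedback adaptivity
    return nonadaptive, adaptive
```

For $n = 10$, $p = 3$ this gives 1023 versus 30 qubits, and the ratio grows exponentially in $n$.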

The oracular separation theorem further demonstrates that adaptive L2 quantum computation can solve problems outside the constant-depth classical circuit class with unbounded fan-in gates, establishing a strict separation between $QNC^0[2]$ and $AC^0[q]$ for $q \ne 2$.

5. Empirical Results in LLMs

LAC has been empirically evaluated on instruction-tuned transformer models, including Llama3-8B-Instruct, Mistral-7B-Instruct-v0.3, and Qwen2.5-7B-Instruct, across benchmarks such as MMLU, GPQA Diamond, and BoolQ. Using $\alpha = 0.8$:

| Model | MMLU (NS→SK) | GPQA (NS→SK) | BoolQ (NS→SK) | % Layers active (PP/RG) |
| --- | --- | --- | --- | --- |
| Llama3-8B-Instruct | 61.18→60.42 | 29.11→30.53 | 76.38→75.92 | 53/43, 62/65, 64/72 |
| Qwen2.5-7B-Instruct | 69.24→71.29 | 34.78→33.33 | 86.40→83.81 | 29/30, 32/31, 36/36 |
| Mistral-7B-Instruct-v0.3 | 59.70→59.29 | 13.88→18.36 | 84.98→83.18 | 71/72, 74/74, 71/66 |

NS = Not Skipped, SK = Skipped; % Layers = average fraction of layers active in the (PP/RG) phases, listed per benchmark (MMLU, GPQA, BoolQ).

Significant outcomes include:

  • Qwen2.5-7B-Instruct achieves a +2.05 pp MMLU improvement while using $\sim$30% of layers.
  • Mistral-7B-Instruct-v0.3 gains +4.48 pp on GPQA at $\sim$74% layer usage.
  • In some configurations, especially for Qwen2.5-7B, aggressive skipping ($\alpha = 1.0$, $\sim$20% usage) increases accuracy.
  • Middle layers disproportionately contribute voids; mean L2-norm changes at these layers are minimal.

These results suggest that not all layers are critical for all tokens or tasks, and that using LAC to dynamically skip void layers can improve efficiency and, in certain scenarios, even accuracy (Shemiranifar, 20 May 2025).

6. Implementation Considerations and Limitations

While LAC identifies and masks void layers at inference, current implementations do not yield direct runtime speedup, as all layers are executed to compute L2 progress; only the logical usage is reduced. Hardware-aware implementations would be required to realize performance gains by skipping physical computations in voided layers.

The selection of α\alpha is crucial: overly aggressive layer skipping (α1\alpha \to 1) may degrade task performance for some models and benchmarks. LAC's effectiveness assumes that L2-norm change is a valid proxy for computation relevance, a condition empirically verified for certain tasks but not universally established (e.g., tasks involving dense predictions may deviate).

In the quantum context, L2-MBQC is limited to protocols where classical feedback can be implemented as linear maps over F2\mathbb{F}_2, and its resource savings are primarily realized for symmetric or periodic Boolean functions. Non-adaptive protocols remain exponentially more expensive for these tasks (Daniel et al., 2022).

7. Broader Significance and Relations

L2 Adaptive Computation provides a parameter-free mechanism for probing and optimizing computation both in neural and quantum models. In transformers, it facilitates fine-grained analysis of task- and phase-specific computational pathways, offering insights into redundancy and specialization in deep networks. In quantum computing, adaptivity bridges the gap between abstract circuit models and physically feasible, scalable computation for nontrivial Boolean functions. Theoretical results establish a complexity separation for adaptivity-enabled protocols versus classical constant-depth circuits.

A plausible implication is that LAC methodologies can inform both structured model compression and interpretability in deep learning, and foundational studies of quantum-classical computational boundaries. The approach refrains from introducing new parameters or requiring retraining, instead relying on intrinsic activation statistics and measurement outcomes to guide adaptive execution (Shemiranifar, 20 May 2025, Daniel et al., 2022).
