L-SPADE Linear Proxy Framework
- L-SPADE Linear Proxy is a method that uses sparse linear inequalities to approximate high-dimensional positive semidefinite (PSD) cones in convex optimization.
- It employs a separation SDP to generate E-PSD cuts, so that the LP relaxation matches the SDP bound when the problem data fit the sparsity pattern, while reducing computational cost by up to 100×.
- In neural decoding, L-SPADE enables early-exit strategies in transformers via learned linear maps, accelerating inference without compromising accuracy.
The term L-SPADE Linear Proxy refers to a methodology for efficiently projecting, approximating, or substituting high-complexity or high-dimensional mathematical objects by linear or polyhedral proxies built from sparse or low-cost transformations. Contemporary instantiations of the "L-SPADE linear proxy" span several fields, including convex optimization (particularly semidefinite programming relaxations of nonconvex quadratic problems) and machine learning (notably, transformer-based large language models, LLMs). Across these domains, the L-SPADE framework employs sparsity, linearity, and projection to achieve substantial reductions in computational cost without sacrificing representational fidelity or bound tightness. The concept is most prominently developed in recent works by Günlük et al. on positive semidefinite (PSD) cones (Günlük et al., 10 Mar 2026), as well as in efficient decoding and early-exit policies for large neural networks (Zheng et al., 23 Jul 2025).
1. Linear Proxy for the Positive Semidefinite Cone
The L-SPADE linear proxy in convex optimization, especially for semidefinite programming (SDP) relaxations of quadratically constrained quadratic programs (QCQPs), is based on sparse linear inequalities, termed "E-PSD cuts", that outer-approximate the PSD constraint. For an $n$-variable QCQP, the standard Shor relaxation lifts the problem to a matrix variable $X$ subject to a PSD constraint and additional affine constraints. Solving the resulting large SDPs is burdensome, especially inside branch-and-bound frameworks.
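The key insight is that if the problem data are sparse, only a subset $E$ of the index pairs in $[n] \times [n]$ needs to be enforced. Let $E \subseteq [n] \times [n]$ and define $\mathcal{S}^n_E$ as the subspace of symmetric matrices with support in $E$. The L-SPADE proxy constructs an LP relaxation over $\mathcal{S}^n_E$ augmented by the family of E-PSD cuts: linear inequalities $\langle C, X \rangle \ge 0$ in the matrix variable $X$ for all $C \in \mathcal{S}^n_+ \cap \mathcal{S}^n_E$, where $\mathcal{S}^n_+$ is the PSD cone. The main theorem establishes that the projection of the full PSD cone onto $\mathcal{S}^n_E$, enforced via E-PSD cuts, makes the LP relaxation as tight as the original SDP:
$$z_{\mathrm{LP}}(E) = z_{\mathrm{SDP}},$$
where $z_{\mathrm{LP}}(E)$ and $z_{\mathrm{SDP}}$ denote the optimal values of the cut-augmented LP and of the SDP relaxation, provided all constraints and objective coefficients are supported on $E$ (Günlük et al., 10 Mar 2026).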
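The validity of such cuts follows from a one-line argument, sketched below under assumed notation: $C$ is any PSD matrix whose nonzero entries lie in the sparsity pattern $E$, $X$ is any PSD matrix, and $\langle \cdot, \cdot \rangle$ is the trace inner product. The first relation shows that the inequality holds for every PSD $X$; the second shows that it only involves the entries of $X$ indexed by $E$, which is why it can be imposed on the sparse LP variables.

```latex
% Assumed notation: C is PSD with support in E; X is any PSD matrix.
\langle C, X \rangle
  = \operatorname{tr}(C X)
  = \operatorname{tr}\!\left( C^{1/2} X \, C^{1/2} \right) \;\ge\; 0,
\qquad
\langle C, X \rangle = \sum_{(i,j) \in E} C_{ij} X_{ij}.
```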
2. Sparse-Cut Generation and Separation SDP
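To operationalize the L-SPADE proxy, a separation oracle is used to identify violated E-PSD cuts. At a candidate $\hat X \in \mathcal{S}^n_E$ (a symmetric matrix supported on the sparsity pattern $E$), one solves the projection SDP:
$$\min_{C} \; \langle C, \hat X \rangle \quad \text{s.t.} \quad C \succeq 0, \;\; C \in \mathcal{S}^n_E, \;\; \operatorname{tr}(C) = 1,$$
where the trace normalization (or an equivalent bounding constraint) keeps the problem bounded. If the optimal value is negative, the minimizer $C^*$ yields a valid sparse cut $\langle C^*, X \rangle \ge 0$ that $\hat X$ violates. In doubly nonnegative relaxations (when $X \ge 0$ entrywise), the cone is further tightened by $X_{ij} \ge 0$ for $(i,j) \in E$. This separation subproblem is itself a small SDP, but it is tractable because $C$ is restricted to the limited support $E$.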
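As a rough illustration of separation, the sketch below replaces the separation SDP with a much cheaper heuristic: it only searches over rank-one cut matrices $C = vv^\top$ with $v$ supported on two indices, so each check reduces to the smallest eigenvalue of a 2×2 block. This is a simplification and not the separation SDP itself; the names `two_by_two_min_eig` and `separate_2x2`, and the dictionary layout of the candidate, are hypothetical choices made for this example.

```python
import math

def two_by_two_min_eig(a, b, c):
    """Smallest eigenvalue and a unit eigenvector of [[a, b], [b, c]]."""
    lam = 0.5 * (a + c) - math.hypot(0.5 * (a - c), b)
    if b == 0.0:
        # Diagonal block: the eigenvector is a coordinate vector.
        return lam, ((1.0, 0.0) if a <= c else (0.0, 1.0))
    v1, v2 = b, lam - a  # solves (a - lam) * v1 + b * v2 = 0
    norm = math.hypot(v1, v2)
    return lam, (v1 / norm, v2 / norm)

def separate_2x2(x_hat, tol=1e-9):
    """Find the most violated 2x2 cut at the candidate x_hat.

    x_hat maps index pairs (i, j) with i <= j (the entries in E) to values.
    Returns (violation, coeffs), where the cut reads
    sum(coeffs[k] * X[k] for k in coeffs) >= 0, or None if no 2x2 block
    is violated. Heuristic only: a full separation SDP may find more cuts.
    """
    best = None
    for (i, j), b in x_hat.items():
        if i == j or (i, i) not in x_hat or (j, j) not in x_hat:
            continue
        lam, (vi, vj) = two_by_two_min_eig(x_hat[(i, i)], b, x_hat[(j, j)])
        if lam < -tol and (best is None or lam < best[0]):
            # <v v^T, X> = vi^2 X_ii + 2 vi vj X_ij + vj^2 X_jj
            coeffs = {(i, i): vi * vi, (i, j): 2.0 * vi * vj, (j, j): vj * vj}
            best = (lam, coeffs)
    return best

# Illustrative candidate: the block on indices {0, 1} is not PSD.
candidate = {(0, 0): 1.0, (1, 1): 1.0, (0, 1): -2.0, (2, 2): 1.0}
result = separate_2x2(candidate)
```

The left-hand side of the returned cut, evaluated at the candidate, equals the negative eigenvalue that triggered it, so the cut is violated by exactly that amount.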
Practically, the algorithm proceeds iteratively: solve the root-node LP with the accumulated sparse cuts, check for violated cuts at the LP solution, and repeat until no violation is found or a maximum number of iterations is reached. In branch-and-bound (B&B), the LP proxy is reused at every node, and cut separation can optionally be performed at selected nodes (Günlük et al., 10 Mar 2026).
| Step | Input/Output | Operation |
|---|---|---|
| 1 | QCQP data | Compute the sparsity pattern $E$; initialize cut set $\mathcal{C} = \emptyset$ |
| 2 | LP + E-PSD cuts; candidate $\hat X$ | Separation via projection SDP; add new cut to $\mathcal{C}$ if violated |
| 3 | B&B node | Solve LP over $\mathcal{S}^n_E$ with all sparse cuts in $\mathcal{C}$ |
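The solve-separate-repeat loop can be sketched on a deliberately tiny instance. The example below is a toy under stated assumptions, not the authors' implementation: the matrix variable is $X = \begin{pmatrix} 1 & x \\ x & 1 \end{pmatrix}$ with a single free entry $x \in [-2, 2]$, the objective is to minimize $x$, the "LP solver" is a closed-form one-variable routine, and separation uses the known eigenvectors $(1, \pm 1)/\sqrt{2}$ of this matrix. All function names are hypothetical. The SDP bound for this instance is $x = -1$.

```python
def solve_toy_lp(cuts, lo, hi):
    """Minimize x over lo <= x <= hi subject to alpha * x + beta >= 0 per cut."""
    lower, upper = lo, hi
    for alpha, beta in cuts:
        if alpha > 0:
            lower = max(lower, -beta / alpha)
        elif alpha < 0:
            upper = min(upper, -beta / alpha)
        elif beta < 0:
            raise ValueError("cut is infeasible")
    if lower > upper:
        raise ValueError("LP is infeasible")
    return lower

def separate_unit_diagonal(x, tol=1e-9):
    """Exact separation for X = [[1, x], [x, 1]], whose eigenvalues are 1 +/- x.

    Returns a cut (alpha, beta) meaning alpha * x + beta >= 0, or None.
    """
    if 1.0 - abs(x) >= -tol:
        return None  # X is PSD up to tolerance: no violated cut
    # v = (1, 1)/sqrt(2) gives v^T X v = 1 + x; v = (1, -1)/sqrt(2) gives 1 - x.
    return (1.0, 1.0) if x < 0 else (-1.0, 1.0)

def cutting_plane(lo=-2.0, hi=2.0, max_iters=10):
    """Alternate between the LP and separation until no cut is violated."""
    cuts = []
    x = solve_toy_lp(cuts, lo, hi)
    for _ in range(max_iters):
        cut = separate_unit_diagonal(x)
        if cut is None:
            break
        cuts.append(cut)
        x = solve_toy_lp(cuts, lo, hi)
    return x, cuts

x_opt, cuts_found = cutting_plane()
```

On this instance the initial LP returns $x = -2$, one cut ($1 + x \ge 0$) is added, and the next LP solution $x = -1$ already matches the SDP bound. A realistic implementation would substitute an LP solver and the separation SDP for the two closed-form helpers.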
3. Theoretical and Empirical Performance
Theoretically, the L-SPADE linear proxy delivers the exact relaxation bound obtainable from the full SDP when all problem structure is captured in the sparsity pattern $E$. This provides a substantial reduction in computational effort: the LP relaxation over the variables indexed by $E$, plus a few hundred sparse cuts, can be solved considerably faster than dense-cut approaches, with reductions in computational cost of up to 100× reported.
Empirical studies on BoxQCQP and QPLIB benchmarks confirm the method's efficiency: on BoxQCQP instances, sparse cuts at the LP root node closed a large share of the SDP gap, and on QPLIB instances they often matched the SDP bound exactly. Overall B&B runs were markedly faster on difficult, high-dimensional quadratic instances, with dramatic reductions in node counts (Günlük et al., 10 Mar 2026).
4. Linear Proxy in Early-Exit Neural Decoding
The L-SPADE linear proxy has also been adapted as a linear approximation to the SPADE (Space Alignment Decoding) mechanism in LLMs. In this context, the goal is to enable early-exit at intermediate transformer layers by estimating the output-layer token distribution from intermediate representations via a learned linear map.
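Let $h_t^{(\ell)}$ be the hidden state of token $t$ at layer $\ell$ and $W_U$ the output embedding matrix. The L-SPADE proxy is a matrix $A_\ell$ such that
$$\hat p_t^{(\ell)} = \operatorname{softmax}\!\left( W_U \, \hat h_t \right), \qquad \hat h_t = A_\ell \, h_t^{(\ell)},$$
where $\hat h_t$ is a proxy for the final-layer hidden state, yielding output logits and token probabilities without full sequential decoding (Zheng et al., 23 Jul 2025).
$A_\ell$ is trained via cross-entropy distillation to match the SPADE output logits, minimizing
$$\mathcal{L}(A_\ell) = -\sum_{t} \sum_{v} p_t^{\mathrm{SPADE}}(v) \, \log \hat p_t^{(\ell)}(v)$$
across all training tokens, where $p_t^{\mathrm{SPADE}}$ denotes the token distribution given by the SPADE output logits and $v$ ranges over the vocabulary.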
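A minimal sketch of the proxy forward pass and the distillation objective follows, in plain Python on tiny hand-written matrices. The names `A` (proxy matrix), `W_U` (output embedding), and `teacher_probs` (softmax of the SPADE logits) are assumptions made for this illustration, as are all numeric values; the code only evaluates the loss and does not perform the training itself.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def proxy_probs(A, W_U, h):
    """Token distribution from an intermediate hidden state: softmax(W_U A h)."""
    return softmax(matvec(W_U, matvec(A, h)))

def distillation_loss(A, W_U, hidden_states, teacher_probs):
    """Mean cross-entropy between teacher distributions and proxy outputs."""
    total = 0.0
    for h, p in zip(hidden_states, teacher_probs):
        q = proxy_probs(A, W_U, h)
        total -= sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0.0)
    return total / len(hidden_states)

# Illustrative shapes: hidden size 2, vocabulary size 3.
A = [[1.0, 0.0], [0.0, 1.0]]
W_U = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
hidden_states = [[0.0, 0.0]]
teacher_probs = [[1.0, 0.0, 0.0]]
loss = distillation_loss(A, W_U, hidden_states, teacher_probs)
```

With a zero hidden state the proxy logits are all zero, so the proxy distribution is uniform over the three tokens and the loss against a one-hot teacher equals $\log 3$. In practice the proxy matrix would be fitted by gradient descent on this loss in an autodiff framework.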
5. Entropy-Based Early-Exit Policies
L-SPADE enables entropy-based early exit in transformer inference. At each evaluation interval, the proxy distribution $\hat p$ is computed from the current hidden state, and its Shannon entropy
$$H(\hat p) = -\sum_{v} \hat p(v) \log \hat p(v)$$
is evaluated. If $H(\hat p)$ falls below a threshold $\tau$, indicating a confident prediction, inference is truncated and the final answer is produced via a lightweight SPADE forward pass on the minimal two-token sequence. This mechanism reduces inference complexity from that of full-sequence, all-layer decoding to a single matrix multiply and softmax per check, plus the truncated SPADE pass.
| Layer $\ell$ | Proxy logits | Entropy $H(\hat p)$ | Decision |
|---|---|---|---|
| Example | Logits from the proxy at the checked layer | Below the threshold $\tau$ (in nats) | Stop and decode via SPADE |
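The exit rule itself is short enough to sketch directly. In the example below, the layer indices, the proxy distributions, and the threshold `tau` are made-up illustrative values, and `first_exit_layer` is a hypothetical helper name; entropy is measured in nats, matching the table above.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def first_exit_layer(proxy_dists, tau):
    """Return the first checked layer whose proxy entropy is below tau.

    proxy_dists is a list of (layer_index, distribution) pairs in the order
    the checks are performed. Returns None if no check is confident enough,
    in which case decoding would continue through all layers.
    """
    for layer, probs in proxy_dists:
        if shannon_entropy(probs) < tau:
            return layer
    return None

# Illustrative checks at two layers over a four-token vocabulary.
checks = [
    (4, [0.25, 0.25, 0.25, 0.25]),  # uniform: entropy log(4), not confident
    (8, [0.97, 0.01, 0.01, 0.01]),  # peaked: low entropy, confident
]
exit_layer = first_exit_layer(checks, tau=0.5)
```

Here the uniform distribution at layer 4 has entropy $\log 4 \approx 1.39$ nats and fails the check, while the peaked distribution at layer 8 falls below the threshold, so the rule exits at layer 8. Lowering `tau` makes the policy more conservative.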
6. Cross-Domain Significance and Comparative Summary
Across both optimization and machine learning, the L-SPADE linear proxy formalizes the substitution of high-cost, high-dimensional constraints (PSD-cone, deep feature transformation) with computationally efficient, sparse, or low-rank linear surrogates. In optimization, this leads to scalable LP relaxations matching SDP bounds, highly advantageous for large-scale branch-and-bound. In machine learning, it enables efficient layer-wise proxy decoding while maintaining predictive fidelity, supporting hybrid early-exit strategies with tight accuracy guarantees (Günlük et al., 10 Mar 2026, Zheng et al., 23 Jul 2025).
Both lines of work demonstrate that careful structural projection—onto a sparsity pattern for SDPs, or via learned linear maps for neural representations—can preserve essential problem structure without incurring prohibitive resource costs. This suggests further potential of L-SPADE proxies in high-dimensional scientific computing where constraint enforcement or model-depth bottlenecks dominate.