Frozen Projection Justification

Updated 13 May 2026

Frozen projection is a strategy where fixed precomputed transforms are combined with trainable components to preserve stability and enforce structure in models.
It underpins methodologies in continual learning, model composition, and convex optimization by providing both empirical benefits and theoretical guarantees.
Applications include enhancing representation learning through hypersphere projections, scalable statistical sketching, and improved convergence in deep neural networks.

A frozen projection is a methodological principle and analytical tool in modern machine learning and applied mathematics by which a transformation—typically linear or low-capacity nonlinear—operates in conjunction with a fixed (non-trainable or non-adaptive) subcomponent whose parameters are not updated during a downstream optimization or fine-tuning stage. This concept appears in a range of domains including visual/language representation learning, continual learning, large model composition, random-sketching for scalable statistics, point process design, and facial reduction for convex optimization. Justifying and understanding frozen projections—when, why, and how partial parameter freezing or off-the-shelf mapping enforces structure, regularization, or theoretical guarantees—is a central research concern across these settings.

1. Definition and General Principles

Frozen projection, in its most general form, designates any strategy where an initial transform, subspace, or module is precomputed or pretrained and subsequently held fixed as a constraint or embedding for subsequent adaptation. This can take the form of:

A pretrained encoder whose output is mapped via a (possibly trainable) projection, with the encoder weights "frozen"—preventing catastrophic drift or overfitting;
A random (or deterministic, structured) linear projection chosen once and fixed for all downstream computations—enabling condensed computation and statistical guarantees;
A projected gradient or update which is always orthogonal (or otherwise restricted) to a fixed, previously acquired parameter subspace—ensuring no forgetting in lifelong or continual learning;
A geometric operation (e.g., hypersphere projection, facial reduction) where the projection step is performed onto a fixed manifold or face, anchoring further computation to a known feasible set.

The frozen component—in whatever mathematical or architectural form—acts as a regularizer, anchor, or geometric guardrail that limits and structures subsequent learning or inference. The justification for this approach may be statistical (preservation of properties under projection), optimization-theoretic (faster convergence, invertibility), or geometric (maintenance of a metric structure).

2. Frozen Projections in Representation and Adapter Learning

A canonical application is in adapting frozen encoders, such as CLIP, for robust vector search under distribution shift. The Euclidean Geodesic Alignment (EGA) approach introduces a high-capacity residual MLP adapter, initialized at zero and coupled with explicit L2 hypersphere projection. This construction guarantees that, throughout training, all updates remain anchored to the original frozen feature geometry. The residual’s output is added to the frozen feature and renormalized to the unit sphere:

$f_\theta(z) = \frac{z + g_\theta(z)}{\|z + g_\theta(z)\|_2}$

where $z$ is the frozen encoder output and $g_\theta$ is the learned residual (Zhao, 7 May 2026).
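The renormalized residual map above can be sketched in a few lines of NumPy. This is a minimal illustration, not the EGA implementation: the class name `ResidualAdapter`, the layer sizes, the ReLU nonlinearity, and the random stand-in for frozen encoder outputs are all assumptions made for the example. It shows that a zero-initialized output layer makes $f_\theta$ the identity on unit-norm inputs at the start of training, and that outputs stay on the unit sphere once the residual is nonzero.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Project each row of x onto the unit hypersphere."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

class ResidualAdapter:
    """Residual MLP g_theta with a zero-initialized output layer (hypothetical sizes)."""

    def __init__(self, dim, hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=dim ** -0.5, size=(dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = np.zeros((hidden, dim))  # zero init: g_theta(z) = 0 at the start
        self.b2 = np.zeros(dim)

    def residual(self, z):
        h = np.maximum(z @ self.W1 + self.b1, 0.0)  # ReLU hidden layer (assumed)
        return h @ self.W2 + self.b2

    def __call__(self, z):
        # f_theta(z) = (z + g_theta(z)) / ||z + g_theta(z)||_2
        return l2_normalize(z + self.residual(z))

rng = np.random.default_rng(1)
z = l2_normalize(rng.normal(size=(4, 16)))  # stand-in for frozen encoder outputs
adapter = ResidualAdapter(dim=16, hidden=32)
out = adapter(z)  # equals z at initialization, since the residual is zero
```

Only the adapter parameters would be trained in such a setup; the encoder that produced `z` is never touched.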

The local triplet hinge loss is employed such that only margin-violating neighborhoods receive gradient, provably shrinking the region of adaptation over time. The vast majority of triplets become “inactive,” ensuring that most of the feature space (notably, the unseen-class regions) is left untouched—a form of implicit “parameter freezing” over the embedding manifold. The combination of zero initialization, hypersphere projection, and triplet sparsity yields a path-integral–style bound on the L2 drift of any unseen-class representation, ensuring that out-of-distribution features cannot be perturbed beyond a provably small constant (proportional to the active-triplet mass integrated over training time). Removing the hypersphere projection (ablation) erodes this guarantee, resulting in global distortion and degraded in/out-distribution label precision (Zhao, 7 May 2026).
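The sparsity mechanism described above can be illustrated with a generic triplet hinge loss. The sketch below assumes squared Euclidean distances and a margin of 0.2; neither choice is taken from the source, and the function name `triplet_hinge` is hypothetical. It returns the per-triplet loss together with a mask of the margin-violating ("active") triplets, which are the only ones that would contribute gradient.

```python
import numpy as np

def triplet_hinge(anchor, positive, negative, margin=0.2):
    """Per-triplet hinge loss on squared distances, plus the active mask."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    violation = d_pos - d_neg + margin
    active = violation > 0  # only these triplets receive gradient
    return np.maximum(violation, 0.0), active

# Three toy triplets on the unit circle; only the second violates the margin.
anchor   = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
positive = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
negative = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, -1.0]])

losses, active = triplet_hinge(anchor, positive, negative)
active_fraction = active.mean()  # share of triplets that still drive adaptation
```

Tracking `active_fraction` over training is one simple way to observe the shrinking region of adaptation that the drift bound depends on.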

3. Frozen Projections in Model Composition and Graph Architectures

Frozen projection forms the backbone of modular architectures in which large pretrained models (LLMs, vision encoders) serve as fixed nodes that communicate through learned, low-parameter projections. In feedforward model graphs, each node encodes its input into its own latent space, and learned projections perform coordinate alignment and aggregation; the frozen weights themselves receive no updates (Armstrong et al., 9 Apr 2026).

The theoretical foundation rests on the empirical and mathematical finding that independently trained LLMs inhabit geometrically compatible latent spaces: their embeddings differ primarily by an orthogonal transformation and a scaling. This permits learned linear projections alone to translate between these spaces, with no need for weight adaptation within the LLMs themselves. During end-to-end training, only the projections and small attention/output modules are updated, a division described as “dead weights” (the frozen models) versus “live signals” (the projections and the communication they carry) (Armstrong et al., 9 Apr 2026).
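The claim that a linear projection suffices when two latent spaces differ by an orthogonal transformation and a scaling can be checked on synthetic data. The sketch below is an idealized assumption, not the training procedure of the cited work: it builds the second latent space as an exact rotated and rescaled copy of the first, and fits the projection by closed-form least squares rather than end-to-end gradient training. All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 8

# Latents of "model A" (stand-in for a frozen model's embeddings).
Z_a = rng.normal(size=(n, d))

# Assumed relation: model B's latents are a rotated and rescaled copy of A's.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal matrix
scale = 1.7
Z_b = scale * Z_a @ Q

# Fit a linear projection W with Z_a @ W ~ Z_b; neither "model" is modified.
W, *_ = np.linalg.lstsq(Z_a, Z_b, rcond=None)

rel_err = np.linalg.norm(Z_a @ W - Z_b) / np.linalg.norm(Z_b)
```

In this idealized setting the fitted `W` recovers `scale * Q` up to numerical error. Real latent spaces would match only approximately, which is where the small trainable attention and output modules come in.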

The empirical justification is that the bulk of the performance improvement over single models is captured with only 0.15% additional parameters. Backpropagated gradient continues to flow across the frozen boundaries (measured empirically at ∼13% of the head gradient's norm), and emergent behavior is observed in the output routing weights, all without modifying the constituent frozen models (Armstrong et al., 9 Apr 2026).

4. Analytical Guarantees in Statistical and Optimization Settings

In statistical inference—especially scalable influence scoring in deep models—frozen random projections are used to reduce dimension, with the sketch matrix $P$ chosen once (“frozen”) and applied to all future queries. Recent theory provides precise conditions under which such fixed projections preserve key functional forms under inversion, which is not covered by classical Johnson–Lindenstrauss guarantees (Hu et al., 11 Feb 2026).

For influence functions $\tau_\lambda(g, g') = g^\top (F + \lambda I)^{-1} g'$, where $F$ is the curvature matrix, exact preservation under frozen projection requires $P$ to be injective on $\mathrm{range}(F)$ in the unregularized case, and the number of rows $m$ to exceed the effective dimension $d_\lambda(F)$ in the regularized case:

$m \gtrsim d_\lambda(F)$

where $d_\lambda(F) = \mathrm{tr}\big(F (F + \lambda I)^{-1}\big)$ (Hu et al., 11 Feb 2026). Exact preservation, up to a leakage correction for out-of-range queries, is justified via operator norm concentration and perturbation bounds. This provides a rigorous foundation for using a single, preselected projection in large-scale, distributed systems without repeated resampling.
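The unregularized case can be verified numerically. The sketch below makes several assumptions that are not taken from the source: a Gaussian sketch matrix, a synthetic low-rank curvature, queries lying in $\mathrm{range}(F)$, and a sketched influence of the form $(Pg)^\top (P F P^\top)^{+} (Pg')$ with a pseudo-inverse. It also assumes that $d_\lambda(F)$ denotes the standard ridge effective dimension $\mathrm{tr}(F (F + \lambda I)^{-1})$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, m = 50, 5, 12  # ambient dimension, rank of F, sketch rows (m >= r)

# Low-rank PSD curvature F = U diag(lam) U^T.
U, _ = np.linalg.qr(rng.normal(size=(d, r)))
lam = rng.uniform(1.0, 2.0, size=r)
F = (U * lam) @ U.T

# Queries that lie in range(F), so no leakage correction is needed.
g = U @ rng.normal(size=r)
g2 = U @ rng.normal(size=r)

# Frozen Gaussian sketch: drawn once, then reused for every query.
P = rng.normal(size=(m, d)) / np.sqrt(m)

def influence(F, g, g2, rcond=1e-10):
    """Unregularized influence g^T F^+ g2 via a pseudo-inverse."""
    return g @ np.linalg.pinv(F, rcond=rcond) @ g2

tau_full = influence(F, g, g2)
tau_sketch = influence(P @ F @ P.T, P @ g, P @ g2)

def effective_dimension(F, lam_reg):
    """Assumed definition: d_lambda(F) = tr(F (F + lambda I)^{-1})."""
    return np.trace(F @ np.linalg.inv(F + lam_reg * np.eye(F.shape[0])))

d_eff = effective_dimension(F, 1e-6)  # approaches rank(F) as lambda -> 0
```

Here `tau_sketch` matches `tau_full` up to floating-point error because a Gaussian `P` with `m >= r` is injective on the five-dimensional range of `F` with probability one. The regularized case is only approximate and is not covered by this example.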

In convex optimization, particularly facial reduction in degenerate semidefinite programming, the frozen projection is the operation of projecting iterates onto a fixed face of the cone, determined once an exposing vector of degeneracy is identified. This reformulates the problem so that strict feasibility and non-singularity are restored, and iterative optimization (e.g., via semi-smooth Newton methods) becomes both stable and quadratically convergent. Further degeneracies are handled by recursively freezing additional faces (Im et al., 2024).
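The projection onto a fixed face can be written down explicitly once an exposing vector is known. The sketch below is an illustration under stated assumptions, not the semi-smooth Newton method of the cited work: given a positive semidefinite exposing vector $Z$, it takes the face to be $\{V Y V^\top : Y \succeq 0\}$ with $V$ an orthonormal basis of the null space of $Z$, and computes the Frobenius-norm projection of a symmetric matrix onto it. The function names and the toy matrices are hypothetical.

```python
import numpy as np

def face_basis(Z, tol=1e-10):
    """Orthonormal basis V of null(Z) for a PSD exposing vector Z."""
    w, Q = np.linalg.eigh(Z)
    return Q[:, w < tol]

def project_onto_face(X, V):
    """Frobenius projection of symmetric X onto {V Y V^T : Y PSD}."""
    Y = V.T @ X @ V
    Y = (Y + Y.T) / 2                       # guard against round-off asymmetry
    w, Q = np.linalg.eigh(Y)
    Y_psd = (Q * np.maximum(w, 0.0)) @ Q.T  # clip negative eigenvalues
    return V @ Y_psd @ V.T

# Exposing vector Z = e_3 e_3^T: the face holds PSD matrices whose
# third row and column vanish.
Z = np.diag([0.0, 0.0, 1.0])
V = face_basis(Z)  # computed once, then held fixed ("frozen")

X = np.array([[2.0, 1.0, 0.5],
              [1.0, -1.0, 0.3],
              [0.5, 0.3, 4.0]])
X_face = project_onto_face(X, V)
```

Working with the reduced variable `Y` instead of `X` is what restores strict feasibility in the reformulated problem: the face has nonempty relative interior in the smaller space.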

5. Frozen Projections in Continual and Lifelong Learning

The “frozen subspace” idea appears in continual learning via projected gradient updates. The Restricted Orthogonal Gradient prOjection (ROGO) framework builds a critical subspace from prior task directions and restricts each parameter update relative to it. ROGO relaxes the hard-orthogonal constraint by allowing updates within a relaxable subspace of these directions, but still requires orthogonality to the remaining, non-relaxed directions. The projector onto the non-relaxed directions is fixed for each task transfer, and updates occur solely in its orthogonal complement, guaranteeing “no forgetting” with controlled forward transfer (Yang et al., 2023).
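The difference between a hard-orthogonal update and a relaxed one can be shown with plain linear algebra. The sketch below is a schematic under assumptions, not ROGO's algorithm: the stored directions are random, and the relaxable subspace is chosen arbitrarily as the first basis vector, whereas the actual method selects it by its own criterion. All names are hypothetical.

```python
import numpy as np

def orthonormal_basis(A):
    """Orthonormal basis for the column space of A (reduced QR)."""
    Q, _ = np.linalg.qr(A)
    return Q

def project_out(grad, basis):
    """Remove the component of grad that lies in span(basis)."""
    return grad - basis @ (basis.T @ grad)

rng = np.random.default_rng(0)
dim = 10

# Frozen subspace from earlier tasks (hypothetical: 4 stored directions).
frozen = orthonormal_basis(rng.normal(size=(dim, 4)))

# Hypothetical relaxation: permit movement along the first frozen direction.
relaxed = frozen[:, :1]
strict = frozen[:, 1:]  # directions that stay protected

grad = rng.normal(size=dim)
hard_update = project_out(grad, frozen)     # orthogonal to every stored direction
relaxed_update = project_out(grad, strict)  # orthogonal only to protected ones
```

The relaxed update keeps strictly more of the raw gradient than the hard one, which is the source of the additional forward transfer; the protected directions see no change in either case.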

Empirically, ROGO increases final accuracy by 1–3% over fully hard-orthogonal schemes on benchmarks like CIFAR-100 and Mini-ImageNet. The frozen-projection subspace restricts loss-increase on previous tasks while maximizing learnability (forward transfer) on the new task, validated both by geometric analysis and practical performance (Yang et al., 2023).

6. Theoretical and Empirical Benefits in Architectural Variants

Freezing part or all of a projector is also empirically beneficial in feature learning pipelines. For example, in SimCLR-based contrastive learning, replacing the first layer of a standard trainable 2-layer MLP projection head with a pretrained, frozen autoencoder embedding provides significant accuracy boosts, reduces the required width and output dimension, and stabilizes convergence (Schliebitz et al., 2024). The justification rests on the hypothesis that frozen, task-agnostic embeddings inject stable, useful structure, prevent the overfitting induced by co-adaptation to the contrastive loss, and reduce variance by narrowing the parameter search space. Empirical results show up to 2.9% peak accuracy improvement and consistently lower variance compared to non-frozen or fine-tuned alternatives (Schliebitz et al., 2024).
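The mechanics of a partially frozen projection head can be sketched as follows. This is a simplified illustration under loud assumptions: the frozen first layer is a random matrix standing in for a pretrained autoencoder embedding, the two "views" are noisy copies of the same input, and the loss is a plain positive-pair alignment term rather than the full SimCLR contrastive loss. Only the second layer receives a gradient step.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_hid, d_out = 32, 20, 16, 8

# Stand-in for a pretrained, frozen embedding layer (assumption: random weights).
W_frozen = rng.normal(scale=d_in ** -0.5, size=(d_in, d_hid))
# Trainable second layer of the projection head.
W_train = rng.normal(scale=d_hid ** -0.5, size=(d_hid, d_out))

def hidden(h):
    """Frozen first layer followed by ReLU."""
    return np.maximum(h @ W_frozen, 0.0)

def alignment_loss(h1, h2, W):
    """Mean squared distance between projected views (surrogate loss)."""
    diff = (hidden(h1) - hidden(h2)) @ W
    return np.mean(np.sum(diff ** 2, axis=1))

# Two augmented "views" of the same batch (toy augmentation: additive noise).
h1 = rng.normal(size=(n, d_in))
h2 = h1 + 0.1 * rng.normal(size=(n, d_in))

W_frozen_before = W_frozen.copy()
loss_before = alignment_loss(h1, h2, W_train)

# Gradient with respect to the trainable layer only: (2/n) A^T A W.
A = hidden(h1) - hidden(h2)
grad_W = (2.0 / n) * A.T @ (A @ W_train)
W_train = W_train - 0.1 * grad_W

loss_after = alignment_loss(h1, h2, W_train)
```

Because no gradient is ever formed for `W_frozen`, the optimizer searches over `d_hid * d_out` parameters instead of the full head, which is the narrowing of the search space the hypothesis above refers to.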

7. Frozen Projection in Point Processes and Space-Filling Designs

In the design of space-filling samples via determinantal point processes (DPPs), freezing coordinates (projecting onto coordinate subspaces) retains the repulsive structure of the original point set under precise separability conditions on the kernel. The necessary and sufficient condition is that the kernel factors across directions, $K(x, y) = \prod_{i} K_i(x_i, y_i)$; then, for any coordinate projection, the projected process remains repulsive. This underpins rigorous justification for “frozen projection” designs: one can sample once in high dimensions then drop (freeze) any set of coordinates to obtain valid, regular samples in lower dimensions for Monte Carlo integration or computer experiments (Mazoyer et al., 2019).

Summary Table: Justification Mechanisms Across Domains

Domain	Freezing Context	Justification Mechanism
Adapter/Rep. Learning (Zhao, 7 May 2026)	Encoder + residual proj., hypersphere	Path-integral bound, triplet sparsity, OOD drift bound
Model Composition (Armstrong et al., 9 Apr 2026)	Model weights, linear proj.	Latent geometric alignment, empirical performance, modularity
Statistical Sketching (Hu et al., 11 Feb 2026)	Random sketch matrix	Operator concentration, effective dimension theorems
Continual Learning (Yang et al., 2023)	Parameter subspace in updates	Orthogonal gradient proj., no-forgetting guarantee
SimCLR Projector (Schliebitz et al., 2024)	Pretrained embedding projector	Empirical accuracy, stability analysis, variance reduction
Point Processes (Mazoyer et al., 2019)	Kernel tensor-product structure	Pair-correlation preservation, Ripley’s function analysis
SDP/Facial Reduction (Im et al., 2024)	Min. face in semidefinite cone	Jacobian invertibility, strict feasibility via face reduction

Rigorous theoretical foundations, often combined with empirical validation, underpin frozen projection as a unifying concept governing stability, modularity, geometric containment, and scalable computation across learning systems and convex optimization.