Frozen Encoder with Linear Probe

Updated 3 April 2026

A frozen pretrained encoder plus linear probe is a representation learning paradigm that fixes a powerful feature extractor while training a simple linear layer for specific tasks.
The approach decouples feature reuse from adaptation dynamics, minimizes overfitting, and excels in low-shot or low-resource scenarios.
Empirical results across vision, language, and audio validate its robustness, interpretability, and efficient transfer of semantic information.

A frozen pretrained encoder plus linear probe is a paradigm in representation learning and transfer learning where the parameters of a powerful, domain-general feature extractor are held fixed, and a lightweight linear layer (the probe) is trained atop its outputs for downstream tasks. This approach isolates the representational content of the encoder, sharply separates feature reuse from adaptation dynamics, and underpins a wide spectrum of empirical and theoretical studies spanning vision, language, audio, and multimodal modeling.

1. Formal Setup and Core Methodology

Let $f_{\theta}: \mathbb{R}^p \rightarrow \mathbb{R}^d$ be a pretrained encoder with parameters $\theta$ obtained via large-scale supervised or self-supervised pretraining. In the frozen+probe regime, $\theta$ is held fixed and only the new parameters $W, b$ of a shallow head (often a single linear transformation) are optimized: $\hat{y} = W f_{\theta}(x) + b$, where $\hat{y}$ predicts a classification, regression, or structured prediction target $y$. Because the probe is so simple (usually one fully connected layer, optionally followed by softmax or sigmoid), performance can be attributed directly to the information encoded in $f_{\theta}$ rather than to adaptation effects.
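
As a minimal sketch of this setup, the following NumPy code fits only $W$ and $b$ by gradient descent on a softmax cross-entropy loss, treating the encoder outputs as a fixed array. The function names, learning rate, step count, and toy Gaussian features are illustrative assumptions, not taken from any cited work; in practice the features would come from a real frozen encoder.

```python
import numpy as np

def train_linear_probe(features, labels, num_classes, lr=0.1, steps=200):
    """Fit W, b of a softmax probe on precomputed frozen features.

    features: (n, d) array of encoder outputs f_theta(x); the encoder is never updated.
    labels:   (n,) integer class ids.
    """
    n, d = features.shape
    W = np.zeros((num_classes, d))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = features @ W.T + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                   # d(cross-entropy)/d(logits)
        W -= lr * grad.T @ features
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(features, W, b):
    """Return the arg-max class of the linear probe."""
    return np.argmax(features @ W.T + b, axis=1)

# Toy stand-in for frozen encoder outputs: two well-separated Gaussian clusters.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(-2.0, 1.0, (50, 8)), rng.normal(2.0, 1.0, (50, 8))])
labs = np.array([0] * 50 + [1] * 50)
W, b = train_linear_probe(feats, labs, num_classes=2)
train_acc = float((predict(feats, W, b) == labs).mean())
```

The trainable state here is 2 × 8 weights plus 2 biases, which illustrates why the probe is so much smaller than the encoder that produced the features.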

Crucially, the probe’s parameters are typically orders of magnitude fewer than the encoder’s. This restriction makes the paradigm especially attractive and robust in low-shot and low-resource scenarios, minimizes overfitting risk, and enables efficient label utilization (Lee et al., 2022, Luo et al., 2023, Li et al., 7 Aug 2025, Hummel et al., 13 Jan 2026).

2. Theoretical Justification and Geometric Foundations

The efficacy of the frozen encoder + linear probe approach can be traced to architectural constraints of contemporary deep models, especially transformers. Linear communication interfaces (unembedding matrices, attention OV circuits) enforce that any decodable feature must occupy a context-invariant linear subspace of the model's hidden state space. The Invariant Subspace Necessity theorem formalizes this: for any semantic feature $f$ linearly decodable by some $W$, all variance relevant to $f$ must lie in a fixed subspace, independent of context or input (Saurez et al., 10 Feb 2026).

This architectural property implies:

Linear probes can consistently recover semantically meaningful directions. For binary features, these correspond to a 1-dimensional space; for $K$-way categories, a $(K-1)$-dimensional subspace suffices.
Tokens serve as reference vectors: the “Self-Reference Property” guarantees that the embedding of a canonical class token lies in the corresponding class subspace, enabling zero-shot or unsupervised detection of class semantics by direct projection (Saurez et al., 10 Feb 2026).
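
The idea of detecting class semantics by direct projection can be illustrated with a small sketch. It assumes access to frozen hidden states and to one embedding per canonical class token, and it uses cosine similarity as the projection score. That scoring choice, the function name, and the toy vectors are assumptions made for illustration; the text above does not specify the exact procedure.

```python
import numpy as np

def zero_shot_by_projection(hidden, class_token_embs):
    """Assign each hidden state to the class whose token embedding it aligns with most.

    hidden:           (n, d) frozen hidden states.
    class_token_embs: (K, d) embeddings of one canonical token per class.
    Cosine similarity is an assumption; other projection scores are possible.
    """
    h = hidden / np.linalg.norm(hidden, axis=1, keepdims=True)
    e = class_token_embs / np.linalg.norm(class_token_embs, axis=1, keepdims=True)
    scores = h @ e.T                      # (n, K) cosine similarities
    return scores.argmax(axis=1), scores

# Toy data: three orthogonal "class token" directions in a 4-d space.
class_embs = np.array([[1.0, 0.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0, 0.0],
                       [0.0, 0.0, 1.0, 0.0]])
hidden = np.array([[0.9, 0.1, 0.0, 0.3],
                   [0.0, 0.2, 2.0, 0.1],
                   [0.1, 1.5, 0.2, 0.0]])
pred, scores = zero_shot_by_projection(hidden, class_embs)
```

No parameters are trained in this sketch: the class token embeddings themselves play the role of the probe rows.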

This necessity is corroborated across families (LLaMA, Mistral, GPT2), domains, and tasks: empirical results show that class-token alignment, zero-shot classification, and sparse autoencoder directions all recover the same geometry (Saurez et al., 10 Feb 2026).

3. Downstream Protocols and Practical Variants

The frozen encoder + linear probe paradigm has catalyzed diverse methodologies:

Structural Probing in Language: Probing syntactic structure in pretrained LMs involves fitting a linear transformation that minimizes the deviation between the squared $L_2$ distance of token pairs in the projected space and their gold tree distances (Pal et al., 2024). Freezing all BERT (or BERT_LARGE) layers and fitting only the probe (optimized with Adam, with the number of probe columns set as a hyperparameter) yields strong recovery of dependency trees (e.g., BERT linear probe UUAS = 66.65, RBF probe UUAS = 69.71 on UD-EWT English).
Active Learning: Training a probe (e.g., class-aware softmax linear classifier) on frozen representations can be tightly coupled with uncertainty-based label acquisition (smallest-margin or entropy) for maximal data efficiency in speech and vision. The ALOE system demonstrates near-oracle accuracy on speech tasks with only a few hundred labeled samples (Lee et al., 2022).
Few-Shot and Continual Learning: In extremely low-shot settings, most feature dimensions in frozen representations are empirically redundant and contribute mainly noise. Downstream accuracy is maximized by re-weighting or masking to retain only high-importance features, as quantified by inter-class separability and intra-class variance. Soft masking by estimated importance systematically improves $N$-shot accuracy over vanilla probes (Luo et al., 2023).
Multimodal Probes: Augmenting frozen LLMs with a self-supervised visual encoder and a single linear map allows direct image-to-text translation; empirically, semantic “translation” occurs inside transformer MLP layers, where individual neurons—identified via attribution and decoding—are causally responsible for injecting specific visual concepts into the residual stream (Schwettmann et al., 2023).
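
To make the structural probing objective above concrete, the sketch below computes a distance-matching loss for one sentence. It assumes the deviation is measured as a mean absolute difference, which is one common choice and is not confirmed by the text; the names `H`, `B`, and `tree_dist` and the toy three-token chain are hypothetical.

```python
import numpy as np

def structural_probe_loss(H, B, tree_dist):
    """Mean absolute gap between probe distances and gold tree distances.

    H:         (T, d) frozen token representations for one sentence.
    B:         (k, d) probe matrix (the only trainable parameters).
    tree_dist: (T, T) gold path lengths between tokens in the parse tree.
    """
    P = H @ B.T                                   # (T, k) projected tokens
    diff = P[:, None, :] - P[None, :, :]          # pairwise differences
    sq_dist = (diff ** 2).sum(axis=-1)            # squared L2 distance
    return np.abs(sq_dist - tree_dist).mean()

# Toy chain tree over 3 tokens: 0 - 1 - 2.
tree = np.array([[0.0, 1.0, 2.0],
                 [1.0, 0.0, 1.0],
                 [2.0, 1.0, 0.0]])
# Hypothetical representations whose squared distances equal the tree distances.
H = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
B = np.eye(2)
loss = structural_probe_loss(H, B, tree)
```

In an actual experiment, `B` would be updated by an optimizer such as Adam while `H` stays fixed, and the fitted distances would be scored with UUAS against gold trees.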

Table: Hyperparameter, Architecture, and Variant Overview

Domain	Encoder (frozen)	Probe Form	Regime/Acquisition
NLP structure	BERT/BERT_LARGE	Linear, rank-limited	Adam, up to 200 epochs, UUAS
Speech CL/AL	Audio Spectrogram Transformer	Linear softmax	Adam, AL loop, smallest margin
Vision few-shot	ResNet, ViT, CLIP	Linear, soft-masked	Logistic regression, N-shot
Multimodal	BEIT+GPT-J	Linear proj	CE LM loss, patch prefix
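
The smallest-margin acquisition rule listed for the speech setting can be sketched as follows. This is a generic margin-sampling step under assumed inputs (class probabilities from the current probe on an unlabeled pool), not a reproduction of the ALOE system; the function name and the toy probabilities are illustrative.

```python
import numpy as np

def smallest_margin_query(probs, batch_size):
    """Return indices of the unlabeled samples with the smallest top-2 margin.

    probs: (n, K) class probabilities from the current probe on unlabeled data.
    """
    top2 = np.sort(probs, axis=1)[:, -2:]         # two largest probabilities per row
    margin = top2[:, 1] - top2[:, 0]              # gap between best and runner-up
    return np.argsort(margin)[:batch_size]

probs = np.array([[0.90, 0.05, 0.05],   # margin 0.85, confident
                  [0.40, 0.35, 0.25],   # margin 0.05
                  [0.50, 0.49, 0.01],   # margin 0.01, most ambiguous
                  [0.70, 0.20, 0.10]])  # margin 0.50
picked = smallest_margin_query(probs, batch_size=2)
```

In an active learning loop, the picked samples would be labeled, added to the training set, and the probe refit on the frozen features before the next query.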

4. Empirical Results and Analytical Findings

Across research domains, freezing pretrained encoders and attaching linear probes consistently yields strong performance:

NLP Syntax and Semantics: Frozen BERT, with only a linear probe, achieves UUAS = 66.65 in dependency parsing (Pal et al., 2024), and bracketing F1 = 93.5% on PTB constituent parsing, exceeding all prior sequence tagging parsers (Vilares et al., 2020).
Minimal Label, Maximal Efficiency: On non-semantic speech tasks, ALOE matches full-data accuracy (e.g., 99.2% on Voxforge vs. 99.6% SOTA) using only ∼600 labels (Lee et al., 2022). In MRI, a frozen MAE encoder plus 6K-parameter linear probe achieves 99.2% sequence identification, outperforming all baselines (Li et al., 7 Aug 2025).
Unveiling Redundancy: Feature masking experiments show that, in 5-way 1-shot transfer, retaining just 1% of frozen dimensions fully recovers standard probe performance; empirical accuracy jumps sharply when low-importance, high-variance dimensions are masked (Luo et al., 2023).
Multimodal Transfer and Causality: A single linear alignment bridge suffices for vision-to-language transfer, with internally identifiable “multimodal neurons” operating as concept bottlenecks (Schwettmann et al., 2023).
Geometry and Regression: Extraction of continuous values (e.g., hand pose, gaze) is tractable; a 6K-parameter linear ridge probe achieves MAE = 6.1° on hand pose versus 20.0° for text-only output in foundation models (Shkolnikov, 6 Mar 2026).
Failure Modes: In sequential continual learning, linear probes on frozen 3D MRI encoders retain segmentation performance with near-zero forgetting (BWT ≈ −0.01) but fail to linearly separate features for regression under domain shift (Chen et al., 26 Feb 2026).
Pooling and Bottlenecks: In audio, global pooling of frozen representations collapses discriminative local information; alternative probes (e.g., binarized prototypical) that aggregate all patch tokens recover >14 percentage points in multi-label mAP over standard linear probes (Rauch et al., 29 Sep 2025).
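
Global pooling reduces a grid of patch or token features to a single vector before the probe sees it, so the pooling rule determines what information survives. The sketch below contrasts masked mean and max pooling with a naive mean that includes zero padding. It is an illustration with assumed function names and toy values, and it does not implement the prototypical probes cited above.

```python
import numpy as np

def masked_mean_pool(tokens, mask):
    """Average token features over valid positions only.

    tokens: (T, d) frozen patch or token features, padded to length T.
    mask:   (T,) booleans, True where the position holds real data.
    """
    m = mask.astype(tokens.dtype)[:, None]
    return (tokens * m).sum(axis=0) / m.sum()

def masked_max_pool(tokens, mask):
    """Per-dimension maximum over valid positions only."""
    return np.where(mask[:, None], tokens, -np.inf).max(axis=0)

tokens = np.array([[1.0, -2.0],
                   [3.0, -4.0],
                   [0.0,  0.0]])           # last row is zero padding
mask = np.array([True, True, False])
mean_vec = masked_mean_pool(tokens, mask)  # padding excluded from the average
max_vec = masked_max_pool(tokens, mask)    # padding cannot win on negative dims
naive_mean = tokens.mean(axis=0)           # padding dilutes the average
```

Either pooled vector can then be passed to a linear probe; whichever rule is used, all position-specific detail is lost once the tokens are collapsed.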

5. Limitations, Open Questions, and Extensions

Several limitations and strategies have emerged:

Non-linear Alternatives: Non-linear probes (e.g., radial basis function) can outperform linear ones in certain syntactic tasks, exploiting the richer geometry of transformer representations (RBF probe BERT UUAS = 69.71 vs linear 66.65) (Pal et al., 2024), though in many few-shot and classification regimes linearity suffices and is preferable for interpretability and parameter efficiency (Luo et al., 2023, Saurez et al., 10 Feb 2026).
Sample Complexity and Feature Redundancy: When labeled examples are scarce relative to the feature dimension, overfitting to confounding, high-variance, or low-separability dimensions is exacerbated; soft-masking or dimension selection is required for best transfer (Luo et al., 2023). Redundancy diminishes as more labels accrue.
Pooling Choices: Proper aggregation (e.g., max/mean pooling vs zero-padding) is critical for effective probe performance, notably in sequence or patch-based encodings (Mamtani et al., 28 Nov 2025, Rauch et al., 29 Sep 2025).
Domain Shift: Domain adaptation with frozen encoders is sensitive to input distribution gaps (e.g., medical imaging). Combinations such as MoVL (visual prompting at input + linear probe at output) provide a route to closing gaps and even exceeding full fine-tuning under OOD shift (Tian et al., 2024).
Task Boundaries: In sequential learning, capacity is limited by the ability of the frozen encoder to represent all tasks as linearly separable; otherwise adapter-based or low-rank fine-tuning is required (Chen et al., 26 Feb 2026).
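
The soft-masking strategy mentioned above can be sketched with a simple per-dimension importance score. The score used here is a Fisher-style ratio of between-class to within-class variance, chosen because the text describes importance in terms of inter-class separability and intra-class variance. The exact formula in the cited work may differ, and all names and toy values below are assumptions.

```python
import numpy as np

def fisher_importance(features, labels, eps=1e-8):
    """Per-dimension ratio of between-class to within-class variation."""
    overall_mean = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in np.unique(labels):
        fc = features[labels == c]
        class_mean = fc.mean(axis=0)
        between += len(fc) * (class_mean - overall_mean) ** 2
        within += ((fc - class_mean) ** 2).sum(axis=0)
    return between / (within + eps)

def soft_mask(features, importance):
    """Rescale each dimension by its importance, normalised so the top dimension keeps weight 1."""
    return features * (importance / importance.max())

# Toy features: dimension 0 separates the classes, dimension 1 is high-variance noise.
feats = np.array([[ 1.0,  5.0], [ 1.2, -4.0], [ 0.8,  3.0],
                  [-1.0, -5.0], [-1.1,  4.0], [-0.9, -3.0]])
labs = np.array([0, 0, 0, 1, 1, 1])
imp = fisher_importance(feats, labs)
masked = soft_mask(feats, imp)
```

A linear probe trained on `masked` instead of `feats` sees the noisy dimension strongly attenuated. With very few shots, the variance estimates themselves are noisy, so the importance scores should be treated as rough.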

6. Interpretability and Scientific Implications

The linear probe regime yields direct interpretability of what a pretrained model “knows” and exposes the representational geometry underpinning high performance:

Information Localization: Contextualized encoders provide compressed, disentangled features; successful linear probing reveals the extent and locus (by layer, head, or neuron) of task-specific information (Xu et al., 2020, Schwettmann et al., 2023, Shkolnikov, 6 Mar 2026).
Causal Attribution: Gradient-based attribution, ablation, and decoding can pinpoint which individual units in the backbone carry, inject, or block specific features—enabling scientific study of neural circuits underlying emergent abilities (Schwettmann et al., 2023).
Architectural Imperatives: The interplay of frozen weights, subspace alignment, and probe simplicity is not accidental, but a design-induced consequence of transformer architecture (Saurez et al., 10 Feb 2026).
Unified Methodological Basis: Probing, retrieval, and activation steering are reconcilable within a common framework of invariant subspace discovery and exploitation.

7. Best Practices and Guidelines

When deploying or analyzing the frozen encoder + linear probe methodology, the following practices are substantiated across domains:

Layer and Dimensionality Selection: Extract features from the highest-performing layer and optionally sweep the probe rank over {1, …, 256}; most tasks saturate at a rank within this range (Pal et al., 2024).
Data Efficiency: In low-shot regimes, reweight or mask input features by estimated importance; use data augmentation for more reliable variance estimation (Luo et al., 2023).
Optimization: Prefer Adam/AdamW with early stopping, no weight decay unless specified (Pal et al., 2024, Lee et al., 2022).
Prompting for Domain Gap: In vision, prepend learnable edge/border perturbations (visual prompting) before linear classification for maximal coverage under shift (Tian et al., 2024).
Interpretation: High probe accuracy indicates true invariant substructure in the pretrained features, not spurious probe expressivity (Saurez et al., 10 Feb 2026).
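
The layer-selection advice above can be sketched as a sweep that fits one probe per layer and keeps the layer with the lowest validation error. The example uses a closed-form ridge regression probe without an intercept, which assumes roughly centered features and targets. The function names, the regularization strength, and the synthetic per-layer features are illustrative assumptions.

```python
import numpy as np

def ridge_probe(X, Y, lam=1.0):
    """Closed-form ridge weights for features X (n, d) and targets Y (n, m)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def best_layer(layer_feats_train, y_train, layer_feats_val, y_val, lam=1.0):
    """Fit one probe per layer; return the best layer index and all validation MSEs."""
    errors = []
    for X_tr, X_va in zip(layer_feats_train, layer_feats_val):
        W = ridge_probe(X_tr, y_train, lam)
        errors.append(float(np.mean((X_va @ W - y_val) ** 2)))
    return int(np.argmin(errors)), errors

# Toy stand-in for per-layer frozen features: only "layer 1" carries the target.
rng = np.random.default_rng(0)
signal = rng.normal(size=(200, 4))
y = signal @ np.array([[1.0], [-2.0], [0.5], [3.0]])
layers = [rng.normal(size=(200, 4)),
          signal + 0.01 * rng.normal(size=(200, 4)),
          rng.normal(size=(200, 4))]
train = [F[:150] for F in layers]
val = [F[150:] for F in layers]
idx, errs = best_layer(train, y[:150], val, y[150:])
```

Selecting the layer on held-out data rather than on the training split keeps the comparison between layers from being inflated by probe overfitting.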

In sum, the frozen pretrained encoder plus linear probe framework provides a powerful, interpretable, and efficient route for harnessing and analyzing the information embedded in large-scale learned representations. Its success stems from architectural properties that enforce linear decodability of semantic features, with effectiveness empirically validated across tasks, modalities, and domains (Saurez et al., 10 Feb 2026, Pal et al., 2024, Lee et al., 2022, Luo et al., 2023, Shkolnikov, 6 Mar 2026).