Context Sample Enhancement (CSE)

Updated 3 July 2026

Context Sample Enhancement (CSE) is a family of approaches that leverages structured context to improve performance in LLMs, reinforcement learning, and speech enhancement.
It employs domain-specific methods, such as iterative refinement in MARINE and cross-attention in speech models, to yield robust gradient signals and enhance sample efficiency.
Empirical evaluations demonstrate that CSE reduces sample complexity and improves accuracy while preserving data privacy during training.

Context Sample Enhancement (CSE) describes a family of approaches that augment, leverage, or refine context samples to improve training or inference across modalities such as LLMs, reinforcement learning (RL), and speech enhancement. While the implementations are domain-specific, the unifying theme is the exploitation of additional, structured context or sample perturbations—often without direct gradient or loss on the augmented data itself—to improve sample efficiency, generalization, or test-time accuracy.

1. Formal Definitions and Distinctions

In the context of LLMs, CSE often refers to a regime in which training or inference is augmented by contextually relevant data present in the input but not directly used for autoregressive loss computation. Consider an LLM $f_\theta : X \to Y$ trained on a corpus $\mathcal{D} = \{(x_i, y_i)\}$, with standard supervised fine-tuning minimizing the cross-entropy $\ell_{\text{auto}}(f_\theta(x), y)$. In CSE, inputs are prepended with a curriculum or context $\operatorname{maskcurr}(x, t)$, such as phrasebooks or worked examples, yielding the input $[\operatorname{maskcurr}(x, t), x, y]$ but with the loss computed only on the $x \to y$ tokens. The context tokens are "frozen": their presence enriches the hidden state, but gradients are not taken with respect to them (Zhu et al., 3 Mar 2025).
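
A minimal sketch of the loss masking described above, in plain Python. The function names and the toy probabilities are illustrative assumptions, not taken from the cited work; in particular, the sketch assumes that every context position gets weight 0 and every $x$ and $y$ position gets weight 1.

```python
import math

def build_cse_example(context_tokens, x_tokens, y_tokens):
    """Concatenate [context, x, y] and mark which positions contribute to the loss."""
    tokens = list(context_tokens) + list(x_tokens) + list(y_tokens)
    # Assumption: context positions get weight 0; x and y positions get weight 1.
    loss_mask = [0] * len(context_tokens) + [1] * (len(x_tokens) + len(y_tokens))
    return tokens, loss_mask

def masked_cross_entropy(target_probs, loss_mask):
    """Mean negative log-likelihood over unmasked positions only.

    target_probs[i] is the model's probability for the correct token at position i.
    """
    total = 0.0
    count = 0
    for p, m in zip(target_probs, loss_mask):
        if m:
            total += -math.log(p)
            count += 1
    if count == 0:
        raise ValueError("loss mask selects no positions")
    return total / count

# Two context tokens, two x tokens, one y token.
tokens, mask = build_cse_example(["rule1", "rule2"], ["a", "b"], ["c"])
loss = masked_cross_entropy([0.1, 0.1, 0.5, 0.5, 0.25], mask)
```

Changing the probabilities at the two context positions leaves `loss` unchanged, which is the sense in which the context shapes the forward pass without being a training target.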

CSE lies between pure supervised fine-tuning (SFT, no context) and in-context learning (ICL, only context at inference). In RL, the analogous mechanism is the explicit generation of augmented samples using local context linearizations (Chapman et al., 10 Jul 2025), while in speech enhancement, context (e.g., prior noise-only segments) is embedded and merged via cross-attention (Narayanan et al., 2021).

2. Methodologies Across Domains

In LLMs

CSE may be realized via explicit context augmentation at training or test time, such as in MARINE, a multi-agent recursive in-context enhancement framework (Zhang et al., 5 Dec 2025). Test-time CSE is implemented by iterative refinement:

A reference reasoning trajectory $\tau^{(k)}$ and summary $C^{(k)}$ are maintained.
At round $k+1$, $M_{k+1}$ agents sample new trajectories conditioned on the current reference and summary, $(\tau^{(k)}, C^{(k)})$.
A deterministic refinement operator merges superior fragments from the candidate trajectories into a new reference, leveraging structured representations and patching (e.g., segment-wise corrections).
This process iteratively reduces the trajectory's distance to the unknown ideal trajectory, turning high pass@$N$ latent capacity into effective pass@1 performance at the point of inference.
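
The loop above can be sketched with a toy stand-in, under heavy assumptions: trajectories are lists of integers, `segment_error` is a hypothetical verifier that compares each segment with a hidden ideal, and agents are random perturbations rather than language models. The summary $C^{(k)}$ is omitted. None of these names come from the MARINE paper; the sketch only illustrates sampling candidates around a reference and patching in segment-wise improvements.

```python
import random

IDEAL = [3, 1, 4, 1, 5, 9, 2, 6]  # hypothetical hidden ideal; used only by the toy scorer

def segment_error(index, value):
    """Stand-in for a verifier: distance of one segment from the ideal."""
    return abs(value - IDEAL[index])

def sample_candidate(reference, rng):
    """One agent: copy the reference and rewrite a few segments at random."""
    candidate = list(reference)
    for i in rng.sample(range(len(candidate)), k=3):
        candidate[i] = rng.randint(0, 9)
    return candidate

def refine(reference, candidates):
    """Deterministic merge: per segment, keep the lowest-error value seen."""
    merged = list(reference)
    for i in range(len(merged)):
        for cand in candidates:
            if segment_error(i, cand[i]) < segment_error(i, merged[i]):
                merged[i] = cand[i]
    return merged

def total_error(trajectory):
    """Sum of per-segment errors for a whole trajectory."""
    return sum(segment_error(i, v) for i, v in enumerate(trajectory))

def run_refinement(rounds=10, batch_size=2, seed=0):
    """Iterate sample-then-refine and record the reference error after each round."""
    rng = random.Random(seed)
    reference = [0] * len(IDEAL)
    history = [total_error(reference)]
    for _ in range(rounds):
        candidates = [sample_candidate(reference, rng) for _ in range(batch_size)]
        reference = refine(reference, candidates)
        history.append(total_error(reference))
    return reference, history
```

Because `refine` replaces a segment only when a candidate is strictly better, the recorded error never increases from one round to the next in this toy setting.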

In Contextual Reinforcement Learning

In deterministic contextual Markov decision processes (CMDPs), CSE appears as targeted data augmentation: each transition $(s, a, r, s')$ observed under a context $c$ is mapped to plausible transitions under a nearby context $c + \Delta c$ using a first-order Taylor expansion informed by the context derivatives of the transition and reward functions. This process approximates the context-enhanced Bellman equation (CEBE), creating augmented replay batches that support generalization to test contexts that diverge from those seen in training (Chapman et al., 10 Jul 2025).
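
As a worked form of the first-order expansion, assume (the notation here is introduced for illustration and is not taken from the source) deterministic dynamics $s' = f(s, a, c)$ and reward $r = R(s, a, c)$, both twice continuously differentiable in the context $c$. For a small perturbation $\Delta c$, the augmented transition $(s, a, \tilde{r}, \tilde{s}')$ attributed to context $c + \Delta c$ would be:

```latex
\begin{aligned}
\tilde{s}' &= s' + \partial_c f(s, a, c)\,\Delta c, \\
\tilde{r}  &= r + \partial_c R(s, a, c)\,\Delta c, \\
f(s, a, c + \Delta c) - \tilde{s}' &= O(\|\Delta c\|^2), \\
R(s, a, c + \Delta c) - \tilde{r}  &= O(\|\Delta c\|^2).
\end{aligned}
```

The last two lines are Taylor's theorem applied to $f$ and $R$: the synthetic transition matches the true one up to second-order terms in the perturbation.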

In Speech Enhancement

CSE is instantiated in the Cross-Attention Conformer, where noisy speech and a preceding noise-only segment are encoded separately. Cross-attention modules fuse speech and noise contexts at multiple layers, allowing the enhancement frontend to produce robust spectral masks for ASR systems under variable noise conditions (Narayanan et al., 2021).
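
A minimal sketch of the fusion step, assuming single-head scaled dot-product attention in which speech frames act as queries and noise-context frames act as keys and values. Learned projections are omitted and the residual-style merge is an assumption made for illustration, so this is not the architecture from the cited paper, only the basic attention pattern it builds on.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cross_attention(speech_frames, noise_frames):
    """Each speech frame (query) attends over noise-context frames (keys and values).

    Learned projections are omitted, so keys and values are the raw noise frames.
    Speech and noise frames are assumed to share the same feature dimension.
    """
    scale = math.sqrt(len(speech_frames[0]))
    fused = []
    for q in speech_frames:
        weights = softmax([dot(q, k) / scale for k in noise_frames])
        context = [
            sum(w * v[d] for w, v in zip(weights, noise_frames))
            for d in range(len(noise_frames[0]))
        ]
        # Assumed residual-style merge: add the attended noise summary to the speech frame.
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused
```

With a single noise frame the attention weight is 1, so each output is simply the speech frame plus that noise frame; with several noise frames each speech frame receives its own weighted summary of the noise context.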

3. Theoretical Properties and Sample Efficiency

Theoretical analysis highlights the sample efficiency advantages of CSE. In multi-layer translation (MLT) synthetic tasks for LLMs, standard SFT exhibits sample complexity that is exponential in the task depth; by contrast, CSE, via context-provided phrasebooks and staged dropout, achieves polynomial sample complexity. This separation is formalized as:

SFT: requires a number of samples exponential in the task depth (SQ-dimension lower bound).
CSE: succeeds with a number of samples polynomial in the task depth (Zhu et al., 3 Mar 2025).

In RL, the CEBE is first-order accurate: the Q-function learned under CSE agrees with the true Q-function up to second-order terms in the context perturbation for contexts near the training context. This ensures that CSE-trained policies generalize nearly optimally within the neighborhood of trained contexts (Chapman et al., 10 Jul 2025).

For MARINE, batch size optimization is theoretically grounded:

Under fixed query budgets, minimal feasible batches maximize gain per invocation, because the gain per invocation is strictly decreasing as the batch grows beyond that minimum.
In unlimited-budget regimes, a logarithmic batch growth schedule ensures monotone improvement with high probability (Zhang et al., 5 Dec 2025).

4. Mechanistic Insights and Implementation

Mechanistically, CSE's efficacy is attributed to the production of cleaner gradient signals. In LLMs, when context provides near-complete coverage of task-relevant rules, the induced gradients point directly towards the parameters encoding the missing or dropped rules. The accuracy of these gradients, as measured by the probability that their maxima select the correct column, degrades with less context, emphasizing the importance of properly structured context curricula (Zhu et al., 3 Mar 2025).

In MARINE, trajectory refinement integrates locally optimal segments, and batch diversity is supported by prompt, temperature, and tool-call heterogeneity. The iterative process ensures that each candidate's deviation from the reference is inspected segment-wise, with globally best local repairs patched to the reference trajectory.

In RL, CSE is realized in practice by:

Evaluating the context gradients of the transition and reward functions at each sampled transition.
Generating synthetic perturbed samples to augment the replay buffer.
Training with standard off-policy pipelines (e.g., SAC, DQN), which remain unchanged aside from the inclusion of CSE-augmented data (Chapman et al., 10 Jul 2025).
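
The three steps above can be sketched on a toy problem. The dynamics `step`, the reward, and their analytic context derivatives `step_dc` and `reward_dc` are assumptions invented for this example; the cited work does not prescribe them. The sketch shows only how one replay batch might be extended with first-order perturbed copies before being passed unchanged to an off-policy learner.

```python
import random

def step(s, a, c):
    """Toy deterministic dynamics: the context c scales the effect of the action."""
    return s + c * a

def reward(s, a, c):
    """Toy reward: penalize the squared distance of the next state from the origin."""
    return -(step(s, a, c) ** 2)

def step_dc(s, a, c):
    """Analytic derivative of step with respect to the context."""
    return a

def reward_dc(s, a, c):
    """Analytic derivative of reward with respect to the context."""
    return -2.0 * step(s, a, c) * a

def augment_transition(transition, delta_c):
    """First-order estimate of the same (s, a) transition under context c + delta_c."""
    s, a, r, s_next, c = transition
    s_next_aug = s_next + step_dc(s, a, c) * delta_c
    r_aug = r + reward_dc(s, a, c) * delta_c
    return (s, a, r_aug, s_next_aug, c + delta_c)

def augment_batch(batch, radius, rng):
    """Return the original batch plus one perturbed copy of each transition."""
    augmented = list(batch)
    for transition in batch:
        delta_c = rng.uniform(-radius, radius)
        augmented.append(augment_transition(transition, delta_c))
    return augmented

# One observed transition (s, a, r, s_next, c), then a doubled replay batch.
observed = (1.0, 2.0, reward(1.0, 2.0, 0.5), step(1.0, 2.0, 0.5), 0.5)
replay_batch = augment_batch([observed], radius=0.05, rng=random.Random(0))
```

Because the toy dynamics are linear in the context, the augmented next state is exact here, while the augmented reward differs from the true reward by a term quadratic in `delta_c`.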

In speech enhancement, the cross-attention conformer explicitly aligns speech and noise feature sequences, utilizing contextual embeddings at multiple stack depths to compute ideal ratio masks, with FiLM merges improving information fusion (Narayanan et al., 2021).
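
For readers unfamiliar with FiLM (feature-wise linear modulation), a minimal sketch follows: a context vector produces a per-feature scale and shift that are applied to the features being modulated. The linear maps and their weights are illustrative assumptions, and how the cited model wires FiLM into its conformer layers is not specified here.

```python
def linear(weights, bias, vector):
    """Apply a small dense layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, bias)]

def film(features, context, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise linear modulation: gamma(context) * features + beta(context)."""
    gamma = linear(w_gamma, b_gamma, context)  # per-feature scale
    beta = linear(w_beta, b_beta, context)     # per-feature shift
    return [g * f + b for g, f, b in zip(gamma, features, beta)]

# Two speech features modulated by a one-dimensional noise-context embedding.
modulated = film(
    features=[1.0, 2.0],
    context=[1.0],
    w_gamma=[[2.0], [0.0]], b_gamma=[0.0, 1.0],
    w_beta=[[0.0], [1.0]], b_beta=[0.5, 0.0],
)
```

In this toy call the first feature is doubled and shifted by 0.5, while the second is left at unit scale and shifted by the context value.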

5. Empirical Results and Performance Benchmarks

Comprehensive empirical evaluation confirms the practical benefits of CSE:

In LLMs, annealed-dropout CSE attains 95–100% accuracy on MLT tasks with fewer samples than vanilla SFT requires.
Memorization testing reveals that models trained with CSE do not leak context-specific rules, even under strong probing (Zhu et al., 3 Mar 2025).
MARINE outperforms both Best-of-$N$ (BoN) and Self-Refine on BrowseComp-ZH: 685B-param MARINE achieves 46.0% pass@1 under a fixed budget, compared to 35.3% (BoN) and 40.5% (Self-Refine). Even at 80B params, MARINE matches or surpasses a standalone 1000B model, indicating strong parameter efficiency (Zhang et al., 5 Dec 2025).
In RL, CSE closes nearly the entire domain randomization gap, matches local domain randomization (LDR), and dramatically exceeds vanilla SAC/DQN across tabular, simulated, and continuous-control domains (Chapman et al., 10 Jul 2025).
In speech enhancement, two-stage cross-attention conformers (E3) provide consistent Word Error Rate (WER) reductions across simulated, vendor, and multi-talker test conditions versus non-contextual models and shallow context fusion baselines. The largest improvements appear for longer (≈6 s) noise contexts and deeper attention stacks (Narayanan et al., 2021).

6. Security, Privacy, and Practical Implications

A salient property of CSE in LLMs is that models can internalize information from context—such as private curricula or copyrighted documents—without retaining extractable copies. Empirical probing for memorization yields near-random recovery rates, suggesting that supplying privileged documents as context during training can enable rapid skill acquisition with minimal leakage risk. Conversely, this mechanism could obscure the provenance of sensitive information within trained models, raising copyright and data security questions (Zhu et al., 3 Mar 2025).

For practitioners, MARINE's CSE methodology recommends minimal feasible batch sizes and shallow recursion for budget-constrained deployment, and progressive batch growth where computational cost is dominated by context refinement or downstream sample utility (Zhang et al., 5 Dec 2025). In RL, CSE can be integrated into standard off-policy pipelines with minimal overhead, requiring only evaluation of context gradients and first-order augmentation per sample (Chapman et al., 10 Jul 2025).

These findings position Context Sample Enhancement as a theoretically grounded and empirically validated paradigm for amplifying the value of context in learning and inference across multiple modalities, offering tangible gains in sample efficiency, generalization, parameter efficiency, and privacy-preserving learning.