PRESTO: Adaptive Early Stopping & Attention Regularization

Updated 12 December 2025
  • The paper introduces PRESTO for linear predictors, an adaptive early stopping rule that reduces average feature usage from n to O(√(n log(1/√δ))) by halting computations when partial margins fall below a threshold, achieving speedups up to 16× with minimal accuracy loss.
  • The paper presents PRESTO for LLM decoding as an attention regularizer that penalizes model attention to adversarial prefill tokens, substantially reducing RAP attack scores by 3–4.8× while preserving core utility metrics.
  • The study provides theoretical guarantees using Brownian bridge theory and Wald’s identity for linear models, along with practical integration guidelines for both SVM/AdaBoost systems and large-scale LLMs.

PRefill attEntion STOpping (PRESTO) denotes two distinct attention-based mechanisms in machine learning: (1) an early stopping rule for linear predictors that reduces computational cost by halting feature evaluation on “easy” instances, introduced in the context of SVMs and AdaBoost (Pelossof et al., 2012); and (2) an attention regularizer for LLMs that penalizes model attention to adversarial prefilling tokens to thwart circumvention of safety alignment (Vega et al., 5 Dec 2025). Both mechanisms leverage the core insight that adaptively regulating computation or model focus, based on signals from the input trace, enables improved trade-offs in either efficiency or robustness.

1. PRESTO for Linear Predictors: Adaptive Early Stopping

The original PRESTO mechanism (Pelossof et al., 2012) provides an attention-inspired probabilistic stopping rule layered atop any linear predictor, such as SVM or AdaBoost, to efficiently evaluate only a necessary subset of features per instance. PRESTO operates via two phases:

  • Prefill Phase: Always compute the first $p$ feature–weight products $\{w_1 x_1, \ldots, w_p x_p\}$, where $p$ is selected to balance bias and variance.
  • Early-Stop Phase: After $i \geq p$ steps, with partial margin $S_i = \sum_{j=1}^{i} w_j x_j$, halt further evaluation and assign “negative” if $S_i < \tau$ (for some threshold $\tau < 0$ determined by a user-supplied stop-error $\delta$). Otherwise, proceed to full evaluation.

Integration with SVM or AdaBoost simply requires inserting the PRESTO criterion into the standard feature-computation loop, as in the sketch below. The choice of $\tau$ regulates the error budget attributable to stopping.
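
The rule can be wrapped around an existing predictor in a few lines. The following Python sketch is a minimal illustration, assuming a dense weight vector and a pre-computed negative threshold; the function name and structure are illustrative, not taken from the original implementation.

```python
import numpy as np

def presto_predict(w, x, p, tau):
    """Linear prediction with the PRESTO early-stopping rule (sketch).

    w, x : weight and feature vectors
    p    : prefill length (number of products always computed)
    tau  : negative stopping threshold derived from the stop-error budget
    Returns (label, number_of_features_evaluated).
    """
    s = 0.0
    for i, (wi, xi) in enumerate(zip(w, x), start=1):
        s += wi * xi                      # running partial margin S_i
        if i >= p and s < tau:            # early-stop test after the prefill
            return -1, i                  # halt: confidently "negative"
    return (1 if s >= 0 else -1), len(w)  # otherwise, full evaluation
```

Because only this test-time loop changes, the same wrapper applies unchanged to an SVM score or an AdaBoost weighted vote.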

2. PRESTO for Safe LLM Decoding: Attention Regularization during Alignment

In contrast, PRESTO in modern LLM alignment (Vega et al., 5 Dec 2025) is an attention-penalizing regularizer designed to prevent adversarial “prefilling” attacks, in which a harmful prompt is issued with a preceding misleading response prefix that primes the model to output undesired content. Traditional “deep” supervised fine-tuning (SFT) with data augmentation fails to eliminate low-probability harmful tokens from the top-$k$ vocabulary ranks, leaving models susceptible to Rank-Assisted Prefilling (RAP) attacks in which adversaries explicitly select harmful top-$k$ tokens.

PRESTO intervenes by penalizing total attention mass that decoder queries allocate to prefill-token indices, layer-wise and head-wise, during SFT on adversarially augmented refusal datasets. The trained model is thereby encouraged to “ignore” the prefill and base its outputs on the user/system context alone.

Given token indices $1, \ldots, n$, with $I \subset [n]$ the set of prefill positions, attention weights $a_{i \to j}^{(l,h)}$, and LLM parameters $\theta$, the PRESTO loss is:

$$\mathrm{PRESTO}(\theta) = \mathbb{E}_{(x, \mathrm{pre}) \sim D} \sum_{l=1}^{L} \sum_{h=1}^{H} \sum_{i=1}^{n} \left[ \sum_{j \in I} a_{i \to j}^{(l,h)}(\theta) \;-\; \sum_{j \notin I} a_{i \to j}^{(l,h)}(\theta) \right]$$

Minimizing this loss redistributes attention away from adversarial prefill tokens.
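
As a concrete reading of the loss, the following PyTorch sketch computes the bracketed term for a single example, given an attention tensor and a boolean mask of prefill positions. The tensor layout is an assumption for illustration, not a detail fixed by the paper.

```python
import torch

def presto_loss(attn: torch.Tensor, prefill_mask: torch.Tensor) -> torch.Tensor:
    """PRESTO penalty for one example (sketch of the displayed loss).

    attn         : [L, H, n, n] attention weights a_{i->j}^{(l,h)}
    prefill_mask : [n] bool, True at prefill positions j in I
    Returns the sum over layers, heads, and query positions of
    (attention mass on prefill tokens) - (attention mass on all other tokens).
    """
    on_prefill = attn[..., prefill_mask].sum(dim=-1)    # [L, H, n]
    off_prefill = attn[..., ~prefill_mask].sum(dim=-1)  # [L, H, n]
    return (on_prefill - off_prefill).sum()
```

Since each attention row sums to 1, minimizing this penalty directly pushes probability mass off the prefill columns and onto the rest of the context.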

3. Theoretical Properties and Guarantees

Linear Predictor PRESTO

Under independence and boundedness of feature contributions, PRESTO provably reduces average feature usage from $n$ (full evaluation) to

$$\mathbb{E}[\#\text{features}] = O\left( \sqrt{n \log(1/\sqrt{\delta})} \right)$$

for error budget $\delta$, with an explicit boundary threshold of magnitude

$$\tau = \sqrt{\operatorname{Var}(S_n)}\,\sqrt{\ln(1/\sqrt{\delta})}$$

Key lemmas establish that the stopping error probability is controlled via Brownian bridge theory, and that the mean stopping time is $O(\sqrt{n \ln(1/\sqrt{\delta})})$ by Wald’s identity.
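
For intuition, plugging illustrative numbers into these formulas shows how strongly the bound compresses evaluation. The values below are assumptions chosen for the arithmetic, not experimental settings from the paper.

```python
import numpy as np

n = 784        # feature count, as in the MNIST experiments
delta = 0.10   # user-supplied stop-error budget
var_Sn = 1.0   # Var(S_n); in practice estimated on held-out data

# Boundary magnitude: tau = sqrt(Var(S_n)) * sqrt(ln(1/sqrt(delta)))
tau = np.sqrt(var_Sn) * np.sqrt(np.log(1.0 / np.sqrt(delta)))   # ~1.07

# Mean stopping-time scale: sqrt(n * ln(1/sqrt(delta)))
stop_scale = np.sqrt(n * np.log(1.0 / np.sqrt(delta)))          # ~30 features
print(f"tau = {tau:.2f}; stopping scale = {stop_scale:.0f} of {n} features")
```

A stopping scale of roughly 30 features out of 784 is consistent in order of magnitude with the ~49 features per instance reported for MNIST in the table below.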

LLM PRESTO

PRESTO does not introduce new theoretical error bounds on safety; rather, it mechanistically discourages the model’s forward attention from being swayed by adversarial prefilling. This yields significant empirical gains in RAP resistance with minimal loss in standard utility metrics.

4. Empirical Performance

Linear Predictors

Benchmarks on MNIST, Real-sim, Gisette, and synthetic datasets demonstrate that, for $\delta \approx 10\%$, PRESTO achieves the following efficiency and accuracy:

| Dataset | $n$ | Full Eval Accuracy | Avg Features (PRESTO) | Speedup | Accuracy Loss |
|---|---|---|---|---|---|
| Gisette (lin SVM) | 5,000 | 97.2% | ~2,500 (50%) | n/a | <0.2% |
| MNIST 2 vs 5 (RBF SVM) | 784 | 98.1% | ~49 (6%) | ~16× | <0.1% |
| MNIST 3 vs 10 (RBF SVM) | 784 | 97.5% | ~72 (9%) | ~11× | <0.2% |
| Real-sim (lin SVM) | 20,958 | 95.3% | ~7,000 (33%) | n/a | <0.3% |
| Synthetic (online Boost) | 100 | 94.8% | updates on 1% of examples | 100× | n/a |

The precision–recall curves for PRESTO closely match those of full evaluation, unlike fixed-budget baselines.

LLM Alignment

On Llama 2 7B Chat, Qwen 3 8B, and Gemma 3 12B-IT, PRESTO reduces mean StrongREJECT scores under RAP attacks by 3–4.8× (e.g., from 0.539 to 0.113 mean score under automated RAP), with utility on MT-Bench/GSM-8K dropping by at most 0.2 points or 1–2%, within experimental noise (Vega et al., 5 Dec 2025).

5. Integration and Practical Considerations

Linear Predictors

  • Integration: Only test-time logic is changed; the training procedure is unchanged. PRESTO is transparent to the base predictor.
  • Parameter Estimation: Setting $p$ (prefill length) and estimating $\operatorname{Var}(S_n)$ may require held-out validation or cross-validation; see the sketch after this list. Sorting features by $|w_i|$ can further decrease stopping time in practice. Non-parametric variants (Hoeffding-style stopping) are possible.
  • Limitations: Independence assumptions are crucial for theoretical guarantees; strong feature correlation may deteriorate performance.
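
A minimal sketch of the parameter-estimation step, assuming held-out data is available; the helper name `fit_presto_params` is hypothetical.

```python
import numpy as np

def fit_presto_params(w, X_val):
    """Estimate PRESTO's inputs from held-out data (illustrative sketch).

    Orders features by |w_i| (largest first), so the partial margin
    accumulates evidence fastest, and estimates Var(S_n) from the full
    margins of the validation examples.
    """
    order = np.argsort(-np.abs(w))   # evaluate heavy features first
    margins = X_val @ w              # full margin S_n per validation example
    return order, margins.var()     # feature order and Var(S_n) estimate
```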

LLMs

  • Integration: The PRESTO attention regularizer can be incorporated via a standard SFT loop, adding the computed PRESTO penalty to the SFT cross-entropy loss term (see the sketch after this list); the default weight $\lambda = 1.0$ is effective out of the box.
  • Overhead: Training time and memory increase marginally (<10% with cached attention weights).
  • Limitations: PRESTO targets only prefilling-based, top-k rank-based attacks; it is not a comprehensive defense against all jailbreak styles or multi-turn exploits.
  • Extensions: Layer/head weighting (Vega et al., 5 Dec 2025), explicit rank correlation penalties, and forward n-gram path regularization are outlined as promising future directions.
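
The following sketch shows how the penalty could sit inside a standard SFT step. It assumes a Hugging Face-style causal LM whose forward pass returns `.loss` (cross-entropy over labels in the batch) and `.attentions` (a tuple of per-layer `[batch, heads, n, n]` tensors) when called with `output_attentions=True`, and reuses the `presto_loss` sketch from Section 2; the wiring is an assumption, not the paper's released code.

```python
import torch

def sft_step(model, batch, prefill_mask, optimizer, lam=1.0):
    """One SFT step with the PRESTO penalty added to cross-entropy (sketch).

    batch        : model inputs including labels, so out.loss is defined
    prefill_mask : [n] bool, True at adversarial prefill positions
    lam          : penalty weight; the paper's default lambda = 1.0
    """
    out = model(**batch, output_attentions=True)
    attn = torch.stack(out.attentions, dim=1)   # [batch, L, H, n, n]
    penalty = torch.stack(
        [presto_loss(a, prefill_mask) for a in attn]
    ).mean()                                    # average over the batch
    loss = out.loss + lam * penalty             # cross-entropy + PRESTO term
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.detach()
```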

6. Comparison and Distinctive Features

  • Adaptivity: Both instantiations of PRESTO offer adaptive resource allocation—the linear predictor version allocates compute to ambiguous instances; the LLM version selectively demotes attention to adversarial prefixes.
  • Guarantees: The linear predictor PRESTO provides explicit probabilistic error and efficiency guarantees absent in static budget approaches. In LLMs, PRESTO yields significant empirical resistance gains against RAP attacks without sacrificing general model utility.
  • Implementation Simplicity: Both variants are non-invasive, wrapping basic prediction or training loops; the linear-predictor version requires no retraining at all, and the LLM version requires only retraining with an augmented loss. This facilitates practical deployment.
  • Current Drawbacks: Linear predictor PRESTO requires independence (or near-independence) of features to achieve theoretical speedups, and LLM PRESTO is limited in attack generality and is specific to attention mechanisms.

7. Broader Implications and Future Directions

PRESTO represents a class of attention- or information-allocation mechanisms that optimize performance by dynamically focusing computational or representational resources where most needed. In linear models, this addresses efficiency–accuracy trade-offs; in modern LLMs, it addresses safety–robustness against prompt manipulations. A plausible implication is that similar adaptive or attention-based stopping and regularization principles may generalize to other architectures or modalities, especially as adversarial vulnerabilities and efficiency constraints continue to shape applied machine learning research (Pelossof et al., 2012, Vega et al., 5 Dec 2025).
