Adaptive GoGI-Skip: Dynamic Compression Framework

Updated 18 December 2025
  • Adaptive GoGI-Skip is a neural framework that applies goal-gradient metrics to identify critical tokens, enabling dynamic compression of reasoning traces.
  • It combines entropy-driven retention and adaptive skipping to maintain output coherence while significantly reducing token redundancy.
  • Empirical results show up to 2× inference speedups with minimal accuracy loss, demonstrating its effectiveness in chain-of-thought compression and related vision-model adaptations.

Adaptive GoGI-Skip refers to a family of neural computation frameworks that combine goal-oriented, gradient-based token (or submodule) importance metrics with dynamic, uncertainty-aware skipping mechanisms to achieve efficient, adaptive sequence compression and/or computation allocation, particularly in LLMs engaged in complex reasoning tasks. The methodology is epitomized in Chain-of-Thought (CoT) reasoning compression but is extensible to depth adaptation in vision models and dynamic skip-intervals in temporal models. This article provides a comprehensive treatment based on the state-of-the-art instantiation in "Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping" (Zhuang et al., 13 May 2025) as well as related advances in adaptive computation.

1. Conceptual Foundations

Adaptive GoGI-Skip realizes selective compression and adaptive inference by identifying functionally critical tokens or network components using a goal-gradient metric, then dynamically skipping or pruning unimportant elements while maintaining end-to-end consistency. The approach integrates supervised fine-tuning on compressed data, thereby allowing models to internalize concise reasoning traces and to generate compact, yet semantically coherent outputs at inference with zero runtime pruning overhead.

The approach is distinguished from prior static or generic importance-based compression schemes by:

  • Explicitly measuring the gradient influence of intermediate representations on the task loss (goal-gradient).
  • Dynamically adjusting compression/skipping rates using model uncertainty and local constraints (adaptive skipping).
  • Ensuring local output coherence by enforcing adaptive constraints on maximal consecutive skips.

2. Goal-Gradient Importance (GoGI) Metric

GoGI targets the identification of tokens whose internal representations most directly influence the model's success on the designated task.

Let $c = (x_1, \dots, x_m)$ denote the CoT token sequence and $A = (a_1, \dots, a_k)$ the target answer. The answer-prediction loss is

$$L_{\mathrm{ans}}(A \mid c; \theta) = -\sum_{j=1}^{k} \log P_\theta(a_j \mid c, a_{<j}).$$

For each token $x_t$, the GoGI score at a reference layer $l^*$ is

$$G_t := \left\| \frac{\partial L_{\mathrm{ans}}(A \mid c; \theta)}{\partial h_t^{l^*}} \right\|_1,$$

where $h_t^{l^*}$ is the hidden state for $x_t$ at layer $l^*$. This $\ell_1$-norm aggregates the total sensitivity of the final answer loss to infinitesimal perturbations in $h_t^{l^*}$. Token-type weighting can be applied: $G_{w,t} = w_t G_t$ with learnable $w_t > 0$ for discrete token-type classes.
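
The following is a minimal PyTorch sketch of how such scores could be computed for a Hugging Face causal LM, assuming the CoT and answer tokens are concatenated in a single sequence; the function name, layer indexing, and label-masking convention are illustrative rather than taken from the paper's released implementation.

```python
import torch

def gogi_scores(model, input_ids, answer_start, layer_idx):
    """Illustrative per-token GoGI scores: L1 norm of the gradient of the
    answer-prediction loss with respect to the hidden states at layer l*.
    `input_ids` holds the concatenated CoT and answer tokens; `answer_start`
    marks where the answer A begins."""
    labels = input_ids.clone()
    labels[:, :answer_start] = -100               # loss only over answer tokens
    outputs = model(input_ids, labels=labels, output_hidden_states=True)

    h = outputs.hidden_states[layer_idx]          # (batch, seq, dim) at layer l*
    h.retain_grad()                               # keep grads for this non-leaf tensor
    outputs.loss.backward()                       # HF loss is the mean over answer
                                                  # tokens (sum up to a constant factor)
    return h.grad.abs().sum(dim=-1)               # G_t for every position t
```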

For pruning, a dynamic threshold $\tau_t$ is set to retain a local fraction $\gamma_t$ of tokens in a validity window $I_{\mathrm{valid}}$:

$$\tau_t = Q_{(1-\gamma_t)\times 100}\left(\{\, G'_j \mid j \in I_{\mathrm{valid}} \,\}\right),$$

where $G'_t$ denotes the importance score used downstream for pruning (e.g., the token-type-weighted GoGI score $G_{w,t}$).
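
A small sketch of the dynamic quantile threshold under the same assumptions; `torch.quantile` stands in for whatever empirical quantile estimator the authors actually use.

```python
import torch

def dynamic_threshold(scores, valid_mask, gamma_t):
    """Quantile threshold tau_t over the validity window: retain roughly a
    fraction gamma_t of the tokens with the highest importance scores."""
    valid_scores = scores[valid_mask]             # {G'_j : j in I_valid}
    return torch.quantile(valid_scores, 1.0 - gamma_t)
```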

3. Adaptive Dynamic Skipping (ADS)

ADS governs how aggressively compression or skipping is applied, adapting in real-time to local uncertainty and enforcing coherence constraints.

3.1 Entropy-Driven Rate (EDR)

At position $t$, the entropy of the model's predictive distribution is

$$H_t = -\sum_{v \in V} p_{t,v} \log p_{t,v}, \qquad p_{t,v} = P_\theta(X_{t+1} = v \mid x_{\leq t}).$$

A learned normalization maps $H_t$ to $\hat{H}_t \in [0, 1]$. The local retention rate is then

$$\gamma_t = \operatorname{clip}_{[\gamma_{\mathrm{abs\,min}},\, \gamma_{\mathrm{abs\,max}}]}\left(\gamma_{\min} + (\gamma_{\max} - \gamma_{\min}) \cdot \hat{H}_t\right).$$
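
A sketch of the entropy-driven rate under illustrative assumptions: a simple min–max normalization stands in for the learned mapping from $H_t$ to $\hat{H}_t$, and all bound values are placeholders.

```python
import torch
import torch.nn.functional as F

def retention_rates(logits, gamma_min=0.3, gamma_max=0.9,
                    gamma_abs_min=0.2, gamma_abs_max=0.95):
    """Map per-position predictive entropy H_t to local retention rates gamma_t.
    Min-max normalization is a stand-in for the learned mapping to H_hat_t."""
    log_p = F.log_softmax(logits, dim=-1)                # (seq, vocab)
    entropy = -(log_p.exp() * log_p).sum(dim=-1)         # H_t per position

    h_hat = (entropy - entropy.min()) / (entropy.max() - entropy.min() + 1e-8)
    gamma = gamma_min + (gamma_max - gamma_min) * h_hat
    return gamma.clamp(gamma_abs_min, gamma_abs_max)     # clip to absolute bounds
```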

3.2 Adaptive N-Constraint (ANC)

To prevent loss of local coherence from excessive consecutive skips, a windowed mean entropy $\bar{H}_{t,W}$ over $W$ tokens is normalized as $\hat{\bar{H}}_{t,W}$. The adaptive skip constraint is then

$$N_t = \left\lfloor \operatorname{clip}_{[N_{\min},\, N_{\max}]}\left(N_{\min} + (N_{\max} - N_{\min}) \cdot \bigl(1 - \hat{\bar{H}}_{t,W}\bigr)\right) + 0.5 \right\rfloor,$$

with tighter constraints (smaller $N_t$) under high uncertainty.
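
A sketch of the adaptive N-constraint under the same illustrative assumptions; the causal moving average and the bound values are placeholders for the paper's windowed normalization.

```python
import torch
import torch.nn.functional as F

def adaptive_skip_limits(entropy, window=8, n_min=2, n_max=8):
    """Adaptive N-constraint: allow fewer consecutive skips where the windowed
    mean entropy (local uncertainty) is high. Window size and bounds are
    illustrative placeholders."""
    # Causal moving average of entropy over the last `window` positions
    kernel = torch.ones(1, 1, window) / window
    padded = F.pad(entropy.view(1, 1, -1), (window - 1, 0))
    mean_h = F.conv1d(padded, kernel).view(-1)

    h_hat = (mean_h - mean_h.min()) / (mean_h.max() - mean_h.min() + 1e-8)
    n_t = n_min + (n_max - n_min) * (1.0 - h_hat)
    return torch.floor(n_t.clamp(n_min, n_max) + 0.5).long()   # floor(. + 0.5)
```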

3.3 Integrated Pruning Decision

A running counter $C_{t-1}$ counts consecutive previous prunes. Token $x_t$ is retained if $G'_t \geq \tau_t$, or if required by the ANC:

$$K_t = \mathbb{1}\bigl(G'_t \geq \tau_t\bigr) \;\vee\; \Bigl[\mathbb{1}\bigl(G'_t < \tau_t\bigr) \wedge \mathbb{1}\bigl(C_{t-1} + 1 \geq N_t\bigr)\Bigr],$$

where $C_t$ is incremented if $K_t = 0$ (the token is pruned) and reset to zero when the token is retained.
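
Putting the pieces together, a hedged sketch of the sequential keep/prune decision; resetting the counter when a token is retained follows from its definition as a count of consecutive prunes.

```python
def pruning_decision(scores, thresholds, skip_limits):
    """Sequential keep/prune decision K_t with a consecutive-prune counter C_t.
    A token is kept if its importance clears the threshold, or if the ANC
    would otherwise be violated by one more consecutive prune."""
    keep, consecutive_prunes = [], 0
    for g, tau, n_t in zip(scores, thresholds, skip_limits):
        k = (g >= tau) or (consecutive_prunes + 1 >= n_t)
        keep.append(bool(k))
        consecutive_prunes = 0 if k else consecutive_prunes + 1
    return keep
```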

4. Model Training and Integration

Compressed traces $c'_{\mathrm{comp}} = (x_t \mid K_t = 1)$ are produced offline by ADS. The base LLM $M_\theta$ is then fine-tuned on (problem, $c'_{\mathrm{comp}}$, answer) triplets using the same answer loss $L_{\mathrm{ans}}(A \mid c'_{\mathrm{comp}}; \theta)$. No explicit regularization beyond standard weight decay is required. At inference, the model produces concise traces matching the training compression distribution, eliminating runtime pruning overhead (Zhuang et al., 13 May 2025).

LoRA fine-tuning with rank $r = 16$ or $32$, a learning rate of $2 \times 10^{-5}$, and BF16 precision is effective for both the Gemma-3-Instruct and Qwen2.5-Instruct model families.
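
As a hedged illustration, a LoRA fine-tuning configuration with these hyperparameters might look as follows using the Hugging Face peft and transformers libraries; the model ID, target modules, epoch count, and batch size are assumptions, not settings reported above.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# Reported settings: LoRA rank 16 (or 32), learning rate 2e-5, BF16 precision.
# Everything else below is an assumption for the sake of a runnable sketch.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct", torch_dtype="bfloat16"
)
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="gogi-skip-sft",      # hypothetical output path
    learning_rate=2e-5,
    bf16=True,
    num_train_epochs=3,              # assumption
    per_device_train_batch_size=4,   # assumption
)
```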

5. Empirical Performance and Ablations

Evaluated on AIME (2024/2025), GPQA, and GSM8K:

  • Token reduction: 45–60% (token ratio ≈ 0.4–0.6)
  • Inference speedups: 1.6–2.0× (single RTX 4090, greedy decoding)
  • Reasoning accuracy drop: ≤ 0.9 percentage points (AIME25); accuracy increase (+0.3 pp) on GSM8K.
  • Outperforms static baselines (TokenSkip, SpiritFT, C3oT) in the efficiency–accuracy trade-off.

Ablation studies show:

  • Removing ANC or EDR moderately reduces accuracy and, in some easy tasks, increases speedup.
  • Removing dynamic skipping (static γ ≈ 0.5) increases speedup but with larger accuracy losses.
  • Omitting GoGI in favor of generic importance metrics further degrades accuracy, establishing GoGI's utility.

Token-class retention patterns reveal >90% retention for numerals/operators but only ~23% for general-language tokens (vs. 75% and 31% under static TokenSkip), highlighting the functional targeting enabled by GoGI.

Cross-model generalization is robust: both small (1B) and large (12B) Gemma-3 models and Qwen2.5-Instruct (3B, 7B) achieve 1.5–2.0× speedups with ≤1 pp accuracy drop.

6. Relation to Other Adaptive Computation Frameworks

Adaptive GoGI-Skip can be situated within a wider ecosystem of adaptive computation and skip-based frameworks:

  • Language Modeling Skip Mechanisms: SkipLayer applies a per-token, per-layer binary router based on input-dependent statistics, regulated by an auxiliary loss that controls the average activation rate and made differentiable via Gumbel-Softmax (Zeng et al., 2023). Loosely, SkipLayer is a special case of GoGI-Skip with binary gating, a fixed global budget, and layer-wise routing; by generalizing routing to token groups and allowing cost-sensitive, input-dependent gating, the full GoGI-Skip approach subsumes SkipLayer-style mechanisms.
  • Depth-Adapted Architectures: Adaptive Depth Networks with Skippable Sub-Paths decompose residual stages into always-on and refinable branches, leveraging self-distillation for fast and robust sub-network selection (Kang et al., 2023). These frameworks anchor the theoretical rationale for skip selection with bounds on loss increase via Taylor expansion around the skipped representation and demonstrate empirical gains in accuracy-efficiency Pareto frontiers.
  • Temporal Abstraction: Adaptive Skip Intervals (ASI) for recurrent models employ variable skip intervals determined by the task and the ease of prediction (Neitz et al., 2018), closely paralleling dynamic local compression in GoGI-Skip and showing 2–5× computational savings with improved accuracy in synthetic sequential domains.

7. Limitations, Variants, and Future Directions

Offline computation of GoGI scores is expensive, requiring ~28 GPU-hours for 7.5K samples on a 4B model (Zhuang et al., 13 May 2025). Selecting the reference layer for gradient evaluation requires preliminary tuning, although results are robust across a moderate range of layers.

Potential advances include:

  • End-to-end RL-based learning of skipping policies integrated with answer generation, where the reward is task success penalized by output length.
  • Application to LLM-generated (rather than only human/labeled) CoT traces.
  • Extension to multimodal settings, where richer local coherence constraints may require advanced metrics or skipping mechanisms.

A plausible implication is that, as LLMs and vision architectures scale, further unification of goal-gradient token/module scoring with adaptive skip/policy controllers under a single optimization regime will continue to improve efficiency-accuracy trade-offs.

8. Summary Table: Distinguishing Features of Representative Adaptive GoGI-Skip Methods

| Framework | Importance Signal | Skipping Policy | Compression Control |
|---|---|---|---|
| GoGI-Skip (CoT) (Zhuang et al., 13 May 2025) | Gradient w.r.t. answer loss | Entropy-driven dynamic skip | Adaptive local retention, ANC |
| SkipLayer (Zeng et al., 2023) | Input-dependent router | Binary Gumbel-Softmax gate | Global skip-rate target P |
| ADN (Vision) (Kang et al., 2023) | Stage grouping, distillation | Static/inferred per stage | Sub-path enumeration |
| ASI (Temporal) (Neitz et al., 2018) | Prediction error, task loss | Skip-interval policy π | Dynamic replay horizon |

This synthesis delineates Adaptive GoGI-Skip as the current apex of functionally grounded, uncertainty- and structure-aware dynamic compression, achieving state-of-the-art efficiency gains in CoT LLMs and holding general promise across model architectures and reasoning domains.
