Adaptive GoGI-Skip: Dynamic Compression Framework
- Adaptive GoGI-Skip is a neural framework that applies goal-gradient metrics to identify critical tokens, enabling dynamic compression of reasoning traces.
- It combines entropy-driven retention and adaptive skipping to maintain output coherence while significantly reducing token redundancy.
- Empirical results show up to 2× speedup with minimal accuracy loss, demonstrating its effectiveness in chain-of-thought reasoning, with extensions to vision and temporal models.
Adaptive GoGI-Skip refers to a family of neural computation frameworks that combine goal-oriented, gradient-based token (or submodule) importance metrics with dynamic, uncertainty-aware skipping mechanisms to achieve efficient, adaptive sequence compression and/or computation allocation, particularly in LLMs engaged in complex reasoning tasks. The methodology is epitomized in Chain-of-Thought (CoT) reasoning compression but is extensible to depth adaptation in vision models and dynamic skip-intervals in temporal models. This article provides a comprehensive treatment based on the state-of-the-art instantiation in "Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping" (Zhuang et al., 13 May 2025) as well as related advances in adaptive computation.
1. Conceptual Foundations
Adaptive GoGI-Skip realizes selective compression and adaptive inference by identifying functionally critical tokens or network components using a goal-gradient metric, then dynamically skipping or pruning unimportant elements while maintaining end-to-end consistency. The approach integrates supervised fine-tuning on compressed data, thereby allowing models to internalize concise reasoning traces and to generate compact, yet semantically coherent outputs at inference with zero runtime pruning overhead.
The approach is distinguished from prior static or generic importance-based compression schemes by:
- Explicitly measuring the gradient influence of intermediate representations on the task loss (goal-gradient).
- Dynamically adjusting compression/skipping rates using model uncertainty and local constraints (adaptive skipping).
- Ensuring local output coherence by enforcing adaptive constraints on maximal consecutive skips.
2. Goal-Gradient Importance (GoGI) Metric
GoGI targets the identification of tokens whose internal representations most directly influence the model's success on the designated task.
Let $x_{1:T}$ denote the CoT token sequence, $q$ the problem, and $y$ the target answer. The answer-prediction loss is $\mathcal{L}_{\mathrm{ans}} = -\log p_\theta(y \mid q, x_{1:T})$.
For each token $x_t$, the GoGI score at a reference layer $\ell$ is $\mathrm{GoGI}_t = \bigl\|\nabla_{h_t^{(\ell)}} \mathcal{L}_{\mathrm{ans}}\bigr\|_1$,
where $h_t^{(\ell)}$ is the hidden state for $x_t$ at layer $\ell$. This $\ell_1$-norm aggregates the total sensitivity of the final answer loss to infinitesimal perturbations in $h_t^{(\ell)}$. Token-type weighting can be applied, $\widetilde{\mathrm{GoGI}}_t = w_{c(x_t)}\,\mathrm{GoGI}_t$, with learnable weights $w_c$ for discrete token-type classes.
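As a concrete illustration, the following PyTorch sketch computes per-token GoGI scores for a Hugging Face-style causal LM; the reference-layer choice, the answer-mask construction, and the omission of token-type weighting are assumptions about the setup rather than the authors' released code.

```python
import torch
import torch.nn.functional as F

def gogi_scores(model, input_ids, answer_mask, ref_layer=-8):
    """Per-token GoGI: L1 norm of the gradient of the answer loss w.r.t. the
    hidden state at a reference layer.

    input_ids:   (1, T) problem + CoT + answer tokens
    answer_mask: (1, T) bool, True only at answer positions (loss restricted there)
    """
    model.zero_grad()
    out = model(input_ids, output_hidden_states=True)
    hidden = out.hidden_states[ref_layer]              # (1, T, d), part of the autograd graph
    hidden.retain_grad()                               # keep the gradient of this non-leaf tensor

    # Answer-prediction loss: next-token cross-entropy on answer positions only.
    logits, targets = out.logits[:, :-1], input_ids[:, 1:]
    token_loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1), reduction="none"
    ).view_as(targets)
    mask = answer_mask[:, 1:].float()
    loss = (token_loss * mask).sum() / mask.sum()

    loss.backward()
    return hidden.grad.abs().sum(dim=-1).squeeze(0)    # (T,) GoGI score per token
```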
For pruning, a dynamic threshold $\tau_t$ is set to retain a local fraction $\gamma_t$ of tokens in a validity window $W_t$: $\tau_t = \mathrm{Quantile}_{1-\gamma_t}\bigl(\{\widetilde{\mathrm{GoGI}}_j : j \in W_t\}\bigr)$,
where $\widetilde{\mathrm{GoGI}}_j$ denotes the downstream importance score; token $x_t$ is retained only if $\widetilde{\mathrm{GoGI}}_t \ge \tau_t$.
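A minimal sketch of the windowed-quantile thresholding step follows; the window size and quantile convention are illustrative assumptions.

```python
import torch

def local_keep_mask(scores, gamma, window=64):
    """Keep token i if its importance score is at or above the (1 - gamma_i)-quantile
    of the scores inside its local validity window."""
    T = scores.numel()
    keep = torch.zeros(T, dtype=torch.bool)
    for i in range(T):
        lo, hi = max(0, i - window // 2), min(T, i + window // 2 + 1)
        g = gamma[i] if torch.is_tensor(gamma) else gamma          # scalar or per-position rate
        tau = torch.quantile(scores[lo:hi], 1.0 - float(g))        # dynamic local threshold
        keep[i] = bool(scores[i] >= tau)
    return keep
```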
3. Adaptive Dynamic Skipping (ADS)
ADS governs how aggressively compression or skipping is applied, adapting in real-time to local uncertainty and enforcing coherence constraints.
3.1 Entropy-Driven Rate (EDR)
At position $t$, the entropy of the model's predictive distribution $p_\theta(\cdot \mid x_{<t})$ is $H_t = -\sum_{v \in \mathcal{V}} p_\theta(v \mid x_{<t}) \log p_\theta(v \mid x_{<t})$.
A learned normalization maps $H_t$ to $\hat{H}_t \in [0,1]$. The local retention rate $\gamma_t$ then increases with uncertainty, e.g. $\gamma_t = \gamma_{\min} + (\gamma_{\max} - \gamma_{\min})\,\hat{H}_t$, so that more tokens are retained where the model is less certain.
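A minimal sketch of the entropy-driven rate, assuming a simple min–max normalization and illustrative bounds $\gamma_{\min}, \gamma_{\max}$ in place of the paper's learned normalization:

```python
import torch

def retention_rates(logits, gamma_min=0.3, gamma_max=0.9):
    """Entropy-driven retention: higher predictive uncertainty -> keep more tokens."""
    p = torch.softmax(logits, dim=-1)                            # (T, V) predictive distributions
    H = -(p * torch.log(p.clamp_min(1e-12))).sum(dim=-1)         # per-position entropy H_t
    H_hat = (H - H.min()) / (H.max() - H.min() + 1e-8)           # crude stand-in for the learned normalization
    return gamma_min + (gamma_max - gamma_min) * H_hat           # gamma_t in [gamma_min, gamma_max]
```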
3.2 Adaptive N-Constraint (ANC)
To prevent loss of local coherence from excessive consecutive skips, a windowed mean entropy $\bar{H}_t$ over the preceding $w$ tokens is normalized as $\hat{\bar{H}}_t \in [0,1]$. Then the adaptive skip constraint is, e.g., $N_t = N_{\max} - \bigl\lfloor (N_{\max} - N_{\min})\,\hat{\bar{H}}_t \bigr\rfloor$,
with tighter constraints (smaller $N_t$) under high uncertainty.
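A sketch of the adaptive N-constraint; the window length, the bounds $N_{\min}, N_{\max}$, and the normalization are illustrative assumptions.

```python
import torch

def adaptive_n_constraint(H, n_min=2, n_max=8, window=16):
    """Windowed mean entropy -> per-position cap N_t on consecutive skips.
    Higher local uncertainty yields a tighter (smaller) N_t."""
    T = H.numel()
    N = torch.empty(T, dtype=torch.long)
    h_max = H.max() + 1e-8
    for t in range(T):
        h_bar = H[max(0, t - window + 1): t + 1].mean()          # windowed mean entropy
        h_hat = float((h_bar / h_max).clamp(0.0, 1.0))           # normalized to [0, 1]
        N[t] = n_max - int((n_max - n_min) * h_hat)              # floor via int()
    return N
```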
3.3 Integrated Pruning Decision
A running counter $c_t$ counts consecutive previous prunes. Token $x_t$ is retained if its importance clears the dynamic threshold, or if retention is forced by the ANC: $\mathrm{keep}(x_t) = \mathbb{1}\bigl[\widetilde{\mathrm{GoGI}}_t \ge \tau_t\bigr] \lor \mathbb{1}\bigl[c_t \ge N_t\bigr]$.
$c_t$ is reset to zero when $x_t$ is retained and is incremented if $x_t$ is pruned.
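Putting the pieces together, a sketch of the integrated offline pruning pass; function names and the window convention carry over from the sketches above and are assumptions.

```python
import torch

def ads_prune(scores, gamma, N, window=64):
    """Integrated GoGI-Skip decision: keep token i if its GoGI score clears the local
    quantile threshold for retention rate gamma[i], or if the run of consecutive
    prunes has already reached the adaptive cap N[i] (ANC override)."""
    T = scores.numel()
    keep = torch.zeros(T, dtype=torch.bool)
    run = 0                                                       # consecutive prunes so far
    for i in range(T):
        lo, hi = max(0, i - window // 2), min(T, i + window // 2 + 1)
        tau = torch.quantile(scores[lo:hi], 1.0 - float(gamma[i]))
        keep[i] = bool(scores[i] >= tau) or (run >= int(N[i]))
        run = 0 if keep[i] else run + 1                           # reset on keep, count prunes
    return keep
```

The retained tokens are then concatenated to form the compressed trace used for fine-tuning in Section 4.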
4. Model Training and Integration
Compressed traces are produced offline by GoGI scoring and ADS. The base LLM is then fine-tuned on triplets (problem, compressed trace, answer) using the same answer loss $\mathcal{L}_{\mathrm{ans}}$. No explicit regularization beyond standard weight decay is required. At inference, the model produces concise traces matching the training compression distribution, eliminating runtime pruning overhead (Zhuang et al., 13 May 2025).
LoRA fine-tuning (e.g., rank $32$) with standard learning rates and BF16 precision is effective for both the Gemma-3-Instruct and Qwen2.5-Instruct model families.
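For illustration, a hedged fine-tuning sketch using Hugging Face `transformers` and `peft`; the model name, target modules, hyperparameters, and the `compressed_triplets` variable (a list of problem/compressed-trace/answer dicts from the offline ADS pass) are assumptions, not the paper's exact configuration.

```python
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Trainer, TrainingArguments)

model_name = "Qwen/Qwen2.5-7B-Instruct"          # illustrative choice
tok = AutoTokenizer.from_pretrained(model_name)
tok.pad_token = tok.pad_token or tok.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Low-rank adapters on the attention projections (target modules are an assumption).
model = get_peft_model(model, LoraConfig(
    r=32, lora_alpha=64, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"]))

def to_features(ex):
    # Serialize one (problem, compressed trace, answer) triplet; supervise only the answer
    # tokens, mirroring the answer loss used for GoGI scoring.
    prompt = f"{ex['problem']}\n{ex['compressed_cot']}\n"
    ids = tok(prompt + ex["answer"] + tok.eos_token, truncation=True, max_length=2048)["input_ids"]
    n_prompt = min(len(tok(prompt)["input_ids"]), len(ids))
    return {"input_ids": ids, "labels": [-100] * n_prompt + ids[n_prompt:]}

# compressed_triplets: assumed to be produced offline by the ADS pass described above.
train = Dataset.from_list(compressed_triplets).map(to_features)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="gogi-skip-sft", bf16=True, num_train_epochs=3,
                           per_device_train_batch_size=4, learning_rate=1e-4,
                           weight_decay=0.01, logging_steps=20),
    train_dataset=train,
    data_collator=DataCollatorForSeq2Seq(tok, padding=True, label_pad_token_id=-100),
).train()
```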
5. Empirical Performance and Ablations
Evaluated on AIME (2024/2025), GPQA, and GSM8K:
- Token reduction: 45–60% (token ratio 0.4–0.6)
- Inference speedups: 1.6–2.0× (single RTX 4090, greedy decoding)
- Reasoning accuracy: minimal degradation on AIME25; slight accuracy increase (+0.3pp) on GSM8K.
- Outperforms static baselines (TokenSkip, SpiritFT, C3oT) in the efficiency–accuracy trade-off.
Ablation studies show:
- Removing ANC or EDR moderately reduces accuracy and, in some easy tasks, increases speedup.
- Removing dynamic skipping (using a static retention rate) increases speedup but incurs larger accuracy losses.
- Replacing GoGI with generic importance metrics further degrades accuracy, establishing GoGI's utility.
Token class retention patterns reveal >90% retention for numerals/operators but only 23% for general-language tokens (vs. 75% and 31% under static TokenSkip), highlighting the functional targeting enabled by GoGI.
Cross-model generalization is robust: both small (1B) and large (12B) models, as well as Qwen2.5-Instruct (3B, 7B), achieve 1.5–2.0× speedups with roughly a 1pp accuracy drop.
6. Comparative Perspectives and Related Work
Adaptive GoGI-Skip can be situated among a wider ecosystem of adaptive computation and skip-based frameworks:
- Language Modeling Skip Mechanisms: SkipLayer applies a per-token, per-layer binary router based on input-dependent statistics, regulated by an auxiliary loss that controls the average activation rate and made differentiable via Gumbel-Softmax (Zeng et al., 2023); a minimal sketch of such a gate follows this list. Loosely, SkipLayer is a special case of GoGI-Skip with binary gating, a fixed global budget, and layer-wise routing. By generalizing routing to token groups and allowing cost-sensitive, input-dependent gating, the full GoGI-Skip approach subsumes SkipLayer mechanisms.
- Depth-Adapted Architectures: Adaptive Depth Networks with Skippable Sub-Paths decompose residual stages into always-on and refinable branches, leveraging self-distillation for fast and robust sub-network selection (Kang et al., 2023). These frameworks anchor the theoretical rationale for skip selection with bounds on loss increase via Taylor expansion around the skipped representation and demonstrate empirical gains in accuracy-efficiency Pareto frontiers.
- Temporal Abstraction: Adaptive Skip Intervals (ASI) for recurrent models employ variable skip intervals determined by task and prediction easiness (Neitz et al., 2018), closely paralleling dynamic local compression in GoGI-Skip and showing 2–5× computational savings with improved accuracy in synthetic sequential domains.
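To make the SkipLayer-style routing concrete, here is a minimal PyTorch sketch of a per-token execute/skip gate with a Gumbel-Softmax relaxation and an average-activation auxiliary loss; class and parameter names are illustrative and this is not the implementation of Zeng et al. (2023).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenSkipGate(nn.Module):
    """Per-token binary execute/skip router in the spirit of SkipLayer: an
    input-dependent head emits skip/execute logits, sampled with a hard
    Gumbel-Softmax so the routing decision remains differentiable."""

    def __init__(self, d_model, target_exec_rate=0.5, tau=1.0):
        super().__init__()
        self.router = nn.Linear(d_model, 2)      # logits for [skip, execute]
        self.target = target_exec_rate           # global budget on average activation
        self.tau = tau

    def forward(self, h, layer_fn):
        # h: (B, T, d); layer_fn: the transformer layer wrapped by this gate.
        logits = self.router(h)
        gate = F.gumbel_softmax(logits, tau=self.tau, hard=True)[..., 1:2]   # straight-through 0/1
        out = gate * layer_fn(h) + (1.0 - gate) * h                          # skipping = identity
        # Auxiliary loss keeps the expected execution rate near the target budget.
        aux = (torch.softmax(logits, dim=-1)[..., 1].mean() - self.target) ** 2
        return out, aux
```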
7. Limitations, Variants, and Future Directions
Offline computation of GoGI scores is expensive, with 28 GPU-hours required for 7.5K samples on a 4B model (Zhuang et al., 13 May 2025). Layer selection for evaluating gradients requires preliminary tuning, but results are robust within a moderate range of layers.
Potential advances include:
- End-to-end RL-based learning of skipping policies integrated with answer generation, where the reward is task success penalized by output length.
- Application to LLM-generated (rather than only human/labeled) CoT traces.
- Extension to multimodal settings, where richer local coherence constraints may require advanced metrics or skipping mechanisms.
A plausible implication is that, as LLMs and vision architectures scale, further unification of goal-gradient token/module scoring with adaptive skip/policy controllers under a single optimization regime will continue to improve efficiency-accuracy trade-offs.
8. Summary Table: Distinguishing Features of Representative Adaptive GoGI-Skip Methods
| Framework | Importance Signal | Skipping Policy | Compression Control |
|---|---|---|---|
| GoGI-Skip (CoT) (Zhuang et al., 13 May 2025) | Gradient wrt answer loss | Entropy-driven dynamic skip | Adaptive local retention, ANC |
| SkipLayer (Zeng et al., 2023) | Input-dependent router | Binary Gumbel-Softmax gate | Global skip-rate target P |
| ADN (Vision) (Kang et al., 2023) | Stage grouping, distillation | Static/inferred per stage | Sub-path enumeration |
| ASI (Temporal) (Neitz et al., 2018) | Prediction error, task loss | Skip-interval policy π | Dynamic replay horizon |
This synthesis delineates Adaptive GoGI-Skip as the current apex of functionally grounded, uncertainty- and structure-aware dynamic compression, achieving state-of-the-art efficiency gains in CoT LLMs and holding general promise across model architectures and reasoning domains.