RankTuner: Adaptive Token Reweighting
- RankTuner is a token-level reweighting method that fuses token probability with predictive entropy via the Relative Rank Indicator (RRI) for adaptive supervised fine-tuning.
- It selectively emphasizes under-learned tokens and attenuates over-penalization in uncertain contexts, leading to marked improvements in math reasoning and code generation.
- The method leverages a probability–entropy calibration approach that maintains computational efficiency and outperforms traditional single-factor reweighting schemes.
RankTuner is a token-level reweighting method for adaptive supervised fine-tuning of LLMs. It introduces a probability–entropy calibration signal, the Relative Rank Indicator (RRI), which fuses ground-truth token probability with predictive entropy to provide an elastic, context-sensitive scaling factor for loss reweighting. By leveraging both probability and entropy, RankTuner selectively emphasizes under-learned tokens while attenuating over-penalization at intrinsically uncertain sequence positions. The method yields consistent improvements on mathematical reasoning and code generation benchmarks, outperforming both probability-only and entropy-only token reweighting schemes (Yu et al., 2 Feb 2026).
1. Foundations: Relative Rank Indicator and Scale
The core construct underlying RankTuner is the Relative Rank Indicator (RRI), which measures the deviation of the ground-truth token’s probability rank from its expectation under the model’s predictive distribution at each decoding step $t$:
- Let $p_t(v)$ denote the model’s predicted probability for token $v$ in the vocabulary $V$.
- The ground-truth token probability is $p_t(y_t)$, with $y_t$ the target token.
- The realized rank is
$$r_t = \sum_{v \in V} \mathbb{1}\left[p_t(v) \ge p_t(y_t)\right],$$
where lower rank means higher model confidence in the correct token.
- The expected rank is
$$\mathbb{E}[r_t] = \sum_{k=1}^{|V|} k \, p_t^{(k)},$$
with $p_t^{(k)}$ the $k$-th largest probability.
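The two rank quantities defined above can be illustrated with a minimal sketch in plain Python. The function names are hypothetical, and the tie-handling convention (a tie with the ground-truth token counts against it) is an assumption rather than something the source specifies.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def realized_rank(probs, target):
    """1-based rank of the target token: how many tokens have probability
    at least as large as the target's (assumed tie convention)."""
    p_y = probs[target]
    return sum(1 for p in probs if p >= p_y)

def expected_rank(probs):
    """Expected rank under the model's own distribution:
    sum over k of k times the k-th largest probability."""
    ordered = sorted(probs, reverse=True)
    return sum(k * p for k, p in enumerate(ordered, start=1))

# Toy 4-token vocabulary; the ground-truth token is index 2.
probs = softmax([2.0, 1.0, 0.5, -1.0])
print("realized rank:", realized_rank(probs, target=2))
print("expected rank:", round(expected_rank(probs), 3))
```

A realized rank well above the expected rank marks a token the model ranks worse than its own distribution would suggest on average.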
Ranks are mapped through a concave decay and re-exponentiated: By the Cauchy mean-value theorem, there exists such that
The Relative Scale (), used for loss reweighting, is the inverse of : where is the entropy, and is an analytic lower bound on based on entropy: with .
2. Probability–Entropy Fusion Mechanism
Traditional reweighting relies on either probability or entropy in isolation:
- Probability-only (prob-dominant) schemes reflect model alignment with ground-truth targets, but ignore structural or linguistically flexible regions in the output, over-penalizing inherently uncertain predictions.
- Entropy-only (entropy-dominant) approaches may misclassify noisy or easily replaceable tokens (such as fillers) as critical, failing to capture downstream task alignment.
RRI addresses both shortcomings by comparing the realized rank of the ground-truth token to its expectation under the model’s predictive distribution. This comparison up-weights tokens that are both incorrectly predicted (high realized rank) and appear in low-uncertainty contexts (low expected rank), and systematically down-weights tokens in positions of high model uncertainty or flexibility. This dual calibration suppresses error reinforcement for ambiguous or noisy contexts and focuses updates on genuinely misaligned predictions.
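As a rough numeric illustration of this comparison, the sketch below evaluates two made-up five-token distributions in which the ground-truth token has the same realized rank but the surrounding context differs in uncertainty. The distributions and the helper `rank_stats` are hypothetical, and the printed gap (realized minus expected rank) is only a raw diagnostic, not the RRI or the Relative Scale itself.

```python
import math

def rank_stats(probs, target):
    """Return (realized rank, expected rank, entropy in nats) for one position."""
    p_y = probs[target]
    realized = sum(1 for p in probs if p >= p_y)
    ordered = sorted(probs, reverse=True)
    expected = sum(k * p for k, p in enumerate(ordered, start=1))
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return realized, expected, entropy

# Hypothetical distributions; the ground-truth token is index 2 in both.
confident = [0.90, 0.06, 0.03, 0.005, 0.005]  # peaked: low-uncertainty context
uncertain = [0.30, 0.25, 0.20, 0.15, 0.10]    # flat: high-uncertainty context

for name, probs in [("confident", confident), ("uncertain", uncertain)]:
    realized, expected, entropy = rank_stats(probs, target=2)
    print(f"{name}: realized={realized} expected={expected:.3f} "
          f"entropy={entropy:.3f} gap={realized - expected:.3f}")
```

Both positions place the target third, yet the peaked context has a much lower expected rank and entropy, so the same miss deviates further from what the model's own distribution anticipates.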
3. Derivation and Implementation of the Scaling Algorithm
RankTuner starts from the weighted cross-entropy loss: It adapts the token weight to , where is the Relative Scale and is typically set to for math reasoning and $1$ for general tasks.
The algorithm comprises:
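- Compute logits $z_t$, apply softmax to obtain probabilities $p_t(\cdot)$.
- For each token position $t$:
- Obtain the ground-truth probability $p_t(y_t)$
- Determine the realized rank $r_t$ (number of logits with $z_{t,v} \ge z_{t,y_t}$)
- Compute entropy $H_t = -\sum_{v} p_t(v) \log p_t(v)$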
- Calculate
- Set , and
- Compute
- Form
- Final objective: , followed by gradient update.
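The overall shape of this procedure can be sketched as a weighted cross-entropy with a pluggable per-token weight. This is an illustration under stated assumptions, not the authors' implementation: the Relative Scale formula is not reproduced here, so `toy_weight` is a clearly hypothetical stand-in, the loss is averaged over tokens (an assumption), and plain Python is used for readability where a real implementation would use vectorized tensor operations.

```python
import math
from typing import Callable, List, Sequence, Tuple

def token_stats(logits: Sequence[float], target: int) -> Tuple[float, int, float]:
    """Return (log-prob of target, realized rank, entropy) for one position."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    log_p_y = (logits[target] - m) - math.log(total)       # stable log-softmax
    rank = sum(1 for z in logits if z >= logits[target])   # assumed tie rule
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return log_p_y, rank, entropy

WeightFn = Callable[[float, int, float], float]

def weighted_cross_entropy(batch_logits: List[Sequence[float]],
                           targets: Sequence[int],
                           weight_fn: WeightFn) -> float:
    """Mean over tokens of weight * negative log-likelihood."""
    total = 0.0
    for logits, y in zip(batch_logits, targets):
        log_p_y, rank, entropy = token_stats(logits, y)
        w = weight_fn(math.exp(log_p_y), rank, entropy)
        total += w * (-log_p_y)
    return total / len(targets)

def uniform_weight(p_y: float, rank: int, entropy: float) -> float:
    """Weight 1 everywhere: recovers standard cross-entropy."""
    return 1.0

def toy_weight(p_y: float, rank: int, entropy: float) -> float:
    """HYPOTHETICAL stand-in, NOT the Relative Scale: grows with realized
    rank and shrinks with entropy, purely to exercise the interface."""
    return rank / (1.0 + entropy)

demo_logits = [[2.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
demo_targets = [0, 2]
print("standard CE:", weighted_cross_entropy(demo_logits, demo_targets, uniform_weight))
print("toy weighted:", weighted_cross_entropy(demo_logits, demo_targets, toy_weight))
```

Keeping the weight behind a function boundary makes it straightforward to swap in the actual Relative Scale once its closed form is taken from the paper.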
The overall computational overhead matches that of standard cross-entropy loss, requiring $O(|V|)$ work per token, since ranks, entropy, and the maximum are all computed with vectorized operations over the vocabulary.
4. Experimental Evaluation and Benchmarking
RankTuner was evaluated on mathematical reasoning (NuminaMath-CoT-10k, various math benchmarks) and code generation (Evol-Instruct-Code-80k, HumanEval). Backbones included Qwen2.5-Math (1.5B, 7B), Qwen3 (4B, 8B), Llama-3.1-8B, and Qwen2.5-Coder-3B/7B.
Performance relative to standard and established baselines (SFT, EAFT, OverTone, DFT, TALR) is summarized below:
| Task/Model | Baseline Pass@1 | RankTuner Pass@1 | Δ (Absolute Improvement) |
|---|---|---|---|
| MATH-OAI (Qwen2.5-7B) | 31.8 | 68.6 | +36.8 |
| Minerva Math (Qwen2.5-7B) | 7.6 | 33.3 | +25.7 |
| OlympiadBench (Qwen2.5-7B) | 9.5 | 32.9 | +23.4 |
| AMC23 (Qwen2.5-7B) | 20.5 | 44.5 | +24.0 |
| ARC-C (Qwen2.5-7B, OOD) | 13.5 | 53.6 | +40.1 |
| HumanEval (Qwen2.5-Coder-7B) | 61.1 | 62.7 | +1.6 |
Additional findings:
- RankTuner maintains or improves multi-sample accuracy (e.g., Pass@16) across benchmarks.
- Out-of-distribution generalization is enhanced, as evidenced by performance on ARC-C and GPQA.
- For smaller code models (3B), RankTuner best preserves original performance during fine-tuning.
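For readers unfamiliar with the Pass@k metric mentioned above, the sketch below shows the widely used unbiased estimator introduced alongside HumanEval. The source does not state how Pass@1 or Pass@16 were computed, so applying this estimator here is an assumption, and the function name is illustrative.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimate for one problem.

    n: number of samples generated, c: number of correct samples,
    k: sample budget. Computes 1 - C(n - c, k) / C(n, k).
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# 16 samples, 4 of them correct.
print(pass_at_k(16, 4, 1))   # single-sample accuracy
print(pass_at_k(16, 4, 16))  # at least one correct among all 16
```

The benchmark-level score is then the mean of this quantity over all problems.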
5. Comparison to Previous Reweighting Approaches
Previous reweighting methods are dominated by probability-only (e.g., OverTone, DFT, TALR) or entropy-only (e.g., EAFT) strategies. These methods do not calibrate per-token weight based on combined uncertainty and target alignment and can misallocate training emphasis:
- Probability-only methods risk overfitting to noisy positions by treating all low-probability tokens as equivalently important.
- Entropy-only weighting erroneously highlights tokens with high prediction uncertainty, irrespective of their downstream significance.
RankTuner’s joint calibration via RRI and Relative Scale facilitates a balanced, elastic reweighting. Empirical results consistently show that this blended signal delivers higher accuracy for math reasoning, better transfer to out-of-distribution (OOD) contexts, and robust code generation stability compared to both probability-only and entropy-only reweighting (Yu et al., 2 Feb 2026).
6. Significance and Applications
RankTuner constitutes a principled adaptive fine-tuning scheme for high-uncertainty generation tasks requiring precise per-token supervision. Its approach is particularly advantageous for:
- Math and logic reasoning, where token-wise criticality is context-dependent and uncertainty-aware adjustment prevents error amplification in solution steps.
- Code generation tasks demanding high performance stability under supervised task adaptation.
A plausible implication is that probability–entropy calibration, as formalized in the relative rank framework, forms a foundation for more expressive sample-adaptive loss modulation across generative modeling domains.
7. Limitations and Open Questions
While RankTuner maintains computational parity with standard cross-entropy, it relies on the full softmax distribution for calculating ranks and entropy, which may become a bottleneck for ultra-large vocabularies. The method’s calibration parameters are analytic and do not require additional meta-optimization; however, its effectiveness outside of mathematics, code, or similarly structured domains remains to be systematically characterized.
Interpretation of the Relative Scale’s elasticity and its interaction with curriculum or progressive training strategies could be explored in further research. The generality of the approach for semantic-heavy or noisy natural language domains is an open question.
For a detailed derivation and further empirical results, see "Probability-Entropy Calibration: An Elastic Indicator for Adaptive Fine-tuning" (Yu et al., 2 Feb 2026).