Relative Convergence Scores (RCS)
- The acronym RCS denotes two unrelated metrics: the Relative Convergence Score, used for dynamic task weighting in multi-task LLM fine-tuning, and the Radial Consensus Score, used to select semantically central candidate answers during inference.
- In multi-task training, the Relative Convergence Score uses validation-loss slopes and softmax normalization to recalibrate task weights dynamically, aiming to prevent overfitting and keep convergence rates balanced across tasks.
- In inference-time aggregation, the Radial Consensus Score computes the Euclidean distance from each candidate's embedding to a semantic center, favoring responses that best reflect the collective consensus.
The acronym RCS refers to two distinct, independently named metrics in contemporary machine learning literature, each tailored to a specific domain: (1) the Relative Convergence Score, for dynamic task weighting during multi-task fine-tuning of LLMs (Gong et al., 2024), and (2) the Radial Consensus Score, for aggregating LLM candidate responses via geometric consensus during inference-time answer selection (Nguyen et al., 14 Apr 2026). Although they share an acronym, their formal definitions, computational methods, and use cases are unrelated.
1. Definition and Overview
In multitask LLM fine-tuning, the Relative Convergence Score is a dynamically computed, validation-loss-based quantity used by CoBa (Convergence Balancer) to balance the convergence rates of distinct tasks. At each validation checkpoint, it measures how fast each task is converging relative to the other tasks and guides the reallocation of training-loss weights so that learning progresses evenly across all tasks (Gong et al., 2024).
In inference settings, particularly best-of-N answer selection, the Radial Consensus Score is a geometric metric. Given a collection of candidate answer embeddings, it is the Euclidean distance from a candidate to the weighted Fréchet mean ("semantic center") of all answer embeddings, so that responses that are semantically central to the group are prioritized (Nguyen et al., 14 Apr 2026).
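In Euclidean space the weighted Fréchet mean has a closed form. Writing $e_1, \dots, e_N$ for the candidate embeddings and $w_1, \dots, w_N$ for nonnegative weights with $\sum_i w_i = 1$ (notation used here for illustration), setting the gradient of the weighted sum of squared distances to zero gives the minimizer:

```latex
F(z) = \sum_{i=1}^{N} w_i \,\lVert e_i - z \rVert_2^2, \qquad
\nabla_z F(z) = -2 \sum_{i=1}^{N} w_i \,(e_i - z) = 0
\;\Longrightarrow\;
z^{\star} = \frac{\sum_{i=1}^{N} w_i e_i}{\sum_{i=1}^{N} w_i} = \sum_{i=1}^{N} w_i e_i .
```

Because $F$ is strictly convex (its Hessian is $2I$), $z^{\star}$ is the unique minimizer, so the semantic center is simply the weighted average of the embeddings.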
2. Formal Definitions
CoBa’s Relative Convergence Score in Multitask Finetuning
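Let $K$ denote the number of tasks, $L_i^{\mathrm{val}}(t)$ the validation loss of task $i$ at iteration $t$, and $\bar{L}_i^{\mathrm{val}}(t) = L_i^{\mathrm{val}}(t) / L_i^{\mathrm{val}}(0)$ the loss ratio relative to initialization. The $M$-step windowed history of $\bar{L}_i^{\mathrm{val}}$ is linearly regressed to produce the per-task slope $\alpha_i(t)$. RCS is then defined as:
$\mathrm{RCS}_i(t) = \mathrm{softmax}_i \left( \frac{K \cdot \alpha_i(t)}{\sum_{j=1}^K|\alpha_j(t)|} \right)$
with the softmax taken over the task index $i \in \{1, \dots, K\}$, ensuring $\sum_{i=1}^K \mathrm{RCS}_i(t) = 1$ (Gong et al., 2024).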
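A minimal NumPy sketch of the formula above, assuming the per-task slopes $\alpha_i(t)$ have already been estimated. The function name and the all-zero-slope fallback are illustrative assumptions, not part of CoBa as published:

```python
import numpy as np


def relative_convergence_score(slopes: np.ndarray) -> np.ndarray:
    """Sketch of RCS_i(t) = softmax_i(K * alpha_i / sum_j |alpha_j|).

    `slopes` holds the per-task slopes alpha_i(t) of the windowed
    validation-loss ratios (negative = loss still decreasing).
    """
    slopes = np.asarray(slopes, dtype=float)
    k = slopes.size
    denom = np.abs(slopes).sum()
    if denom == 0.0:
        # Assumed guard (hypothetical): all slopes are zero, so return uniform weights.
        return np.full(k, 1.0 / k)
    logits = k * slopes / denom
    logits -= logits.max()  # shift for numerical stability; softmax is shift-invariant
    exp = np.exp(logits)
    return exp / exp.sum()


# Task 0 converges fastest (most negative slope); task 2 has nearly stalled.
alphas = np.array([-0.05, -0.02, -0.001])
weights = relative_convergence_score(alphas)
print(weights.round(3))
```

The nearly stalled task receives the largest weight and the fastest-converging task the smallest, matching the pacing behavior described for CoBa.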
Radial Consensus Score for Best-of-N Candidate Aggregation
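Given $N$ candidate responses $y_1, \dots, y_N$ with embeddings $e_1, \dots, e_N \in \mathbb{R}^d$ and normalized weights $w_1, \dots, w_N$ ($w_i \ge 0$, $\sum_{i=1}^N w_i = 1$), the semantic center is
$c = \arg\min_{z \in \mathbb{R}^d} \sum_{i=1}^N w_i \lVert e_i - z \rVert_2^2 = \sum_{i=1}^N w_i e_i$
The Radial Consensus Score for candidate $y_i$ is
$\mathrm{RCS}(y_i) = \lVert e_i - c \rVert_2$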
Lower scores correspond to answers that are more semantically central to the population (Nguyen et al., 14 Apr 2026).
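The two formulas translate directly into a few lines of NumPy. This is a sketch assuming the embeddings are already available as the rows of an array; the function names and the toy vectors are hypothetical:

```python
import numpy as np


def semantic_center(embeddings: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """c = sum_i w_i e_i, with the weights renormalized to sum to one."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ np.asarray(embeddings, dtype=float)


def radial_consensus_scores(embeddings: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """RCS(y_i) = ||e_i - c||_2 for every candidate; lower means more central."""
    e = np.asarray(embeddings, dtype=float)
    c = semantic_center(e, weights)
    return np.linalg.norm(e - c, axis=1)


# Toy 2-D embeddings: three roughly agreeing candidates and one outlier.
E = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [-1.0, 0.0]])
scores = radial_consensus_scores(E, np.ones(len(E)))
best = int(np.argmin(scores))
print(best, scores.round(3))
```

The outlier at `[-1.0, 0.0]` gets by far the largest score, and the selected candidate is one of the three agreeing ones.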
3. Algorithmic Procedure
RCS in CoBa Training Loop
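- At each validation step $t$:
  - For each task $i$, update the fixed-length window holding its last $M$ validation-loss ratios.
  - Fit a least-squares line to each window to obtain the slope $\alpha_i(t)$.
  - Normalize the slopes across tasks and apply the softmax as in the formal definition to obtain $\mathrm{RCS}_i(t)$.
  - Combine RCS with the Absolute Convergence Score (ACS) and the Divergence Factor (DF) to compose the final task weights:
$\omega_i(t) = \mathrm{DF}(t)\,\mathrm{RCS}_i(t) + \bigl(1 - \mathrm{DF}(t)\bigr)\,\mathrm{ACS}_i(t)$
A warm-up phase assigns uniform task weights, $\omega_i(t) = 1/K$, for the first $W$ validation steps.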
See pseudocode in [(Gong et al., 2024), Algorithm 1].
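The loop can be sketched as a small stateful helper. This is an illustrative reconstruction, not the published Algorithm 1: the class name `ConvergenceBalancerSketch` is hypothetical, ACS and DF are supplied by the caller because their formulas are not reproduced here, and the slope is fitted with `numpy.polyfit` over the last `window` loss ratios:

```python
from collections import deque

import numpy as np


class ConvergenceBalancerSketch:
    """Hypothetical helper: tracks per-task loss ratios and mixes RCS with caller-supplied ACS/DF."""

    def __init__(self, num_tasks: int, window: int, warmup: int):
        self.k = num_tasks
        self.warmup = warmup  # W: validation steps that use uniform weights
        self.initial = None   # per-task validation loss at the first checkpoint
        self.history = [deque(maxlen=window) for _ in range(num_tasks)]  # last M ratios
        self.step = 0

    def update(self, val_losses) -> None:
        """Record one validation checkpoint (one loss per task)."""
        losses = np.asarray(val_losses, dtype=float)
        if self.initial is None:
            self.initial = losses.copy()
        for i, ratio in enumerate(losses / self.initial):
            self.history[i].append(float(ratio))
        self.step += 1

    def slopes(self) -> np.ndarray:
        """Least-squares slope alpha_i of each task's windowed loss-ratio history."""
        return np.array([
            np.polyfit(np.arange(len(h), dtype=float), np.array(h), deg=1)[0]
            for h in self.history
        ])

    def weights(self, acs, df: float) -> np.ndarray:
        """omega_i = DF * RCS_i + (1 - DF) * ACS_i, or uniform during warm-up."""
        if self.step <= self.warmup or len(self.history[0]) < 2:
            return np.full(self.k, 1.0 / self.k)
        alpha = self.slopes()
        denom = max(float(np.abs(alpha).sum()), 1e-12)  # assumed guard against all-zero slopes
        logits = self.k * alpha / denom
        rcs = np.exp(logits - logits.max())
        rcs /= rcs.sum()
        return df * rcs + (1.0 - df) * np.asarray(acs, dtype=float)


# Usage: task 0's loss ratio decays as 0.9**t (fast), task 1's as 0.99**t (slow).
bal = ConvergenceBalancerSketch(num_tasks=2, window=5, warmup=3)
for t in range(10):
    bal.update([2.0 * 0.9 ** t, 3.0 * 0.99 ** t])
w = bal.weights(acs=np.array([0.5, 0.5]), df=1.0)  # df=1.0 -> pure RCS
print(w)
```

With `df=1.0` the weights reduce to RCS alone, so the slowly converging task receives the larger share; lowering `df` shifts weight toward whatever ACS values the caller provides.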
Radial Consensus Score for Answer Selection
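- Sample $N$ answers $y_1, \dots, y_N$ from an LLM for input $x$.
- Compute embeddings $e_i = \phi(y_i)$ with an embedding model $\phi$.
- Select weights $w_i$, normalized so that $\sum_{i=1}^N w_i = 1$, using one of three schemes:
  - Uniform: $w_i = 1/N$.
  - Frequency: $w_i \propto n(y_i)$, with $n(y_i)$ the number of sampled answers identical to $y_i$.
  - Probability: $w_i \propto p(y_i \mid x)$, the model's probability of generating $y_i$.
- Compute the semantic center $c = \sum_{i=1}^N w_i e_i$.
- For each $i$, calculate $\mathrm{RCS}(y_i) = \lVert e_i - c \rVert_2$.
- Select $y^\star = \arg\min_i \mathrm{RCS}(y_i)$ as the best candidate (Nguyen et al., 14 Apr 2026).
- The cost is dominated by the $N$ embedding calls; computing the center and all distances adds only $O(Nd)$ operations for $d$-dimensional embeddings.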
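The full selection procedure, including the three weighting schemes, can be sketched as follows. Assumptions: embeddings are precomputed by some sentence-embedding model, frequency counts use exact string matches (a real pipeline would likely normalize answers first), and probability weights come from per-answer log-probabilities. All function names are hypothetical:

```python
from collections import Counter

import numpy as np


def candidate_weights(answers, scheme="uniform", logprobs=None) -> np.ndarray:
    """Normalized weights w_i under the uniform, frequency, or probability scheme."""
    if scheme == "uniform":
        w = np.ones(len(answers))
    elif scheme == "frequency":
        counts = Counter(answers)  # n(y_i) via exact string match (an assumption)
        w = np.array([counts[a] for a in answers], dtype=float)
    elif scheme == "probability":
        lp = np.asarray(logprobs, dtype=float)
        w = np.exp(lp - lp.max())  # proportional to p(y_i | x), computed stably
    else:
        raise ValueError(f"unknown weighting scheme: {scheme!r}")
    return w / w.sum()


def select_by_rcs(answers, embeddings, scheme="uniform", logprobs=None):
    """Return the candidate with the smallest ||e_i - c||_2, plus all scores."""
    e = np.asarray(embeddings, dtype=float)
    w = candidate_weights(answers, scheme, logprobs)
    center = w @ e  # semantic center c = sum_i w_i e_i
    scores = np.linalg.norm(e - center, axis=1)
    return answers[int(np.argmin(scores))], scores


# Usage with toy, hand-made embeddings (not produced by a real model).
answers = ["42", "42", "forty-two", "17"]
embeddings = np.array([[1.0, 0.0], [1.0, 0.0], [0.95, -0.05], [0.0, 1.0]])
best, scores = select_by_rcs(answers, embeddings, scheme="frequency")
print(best, scores.round(3))
```

Only the weighting step differs between the variants; the center and distance computation are shared, which keeps the post-embedding cost at $O(Nd)$.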
4. Intuition, Interpretation, and Theoretical Rationale
In multitask fine-tuning, RCS suppresses the training weight of rapidly converging ("fast") tasks and boosts the weights of lagging ("slow") ones, pushing all validation losses toward similar descent rates and reducing the risk of task "collapse" or overfitting. Normalization and the softmax make this balancing robust: dividing by $\sum_j |\alpha_j(t)|$ removes dependence on the overall magnitude of the slopes, and multiplying by $K$ keeps the softmax inputs at an average magnitude of one regardless of how many tasks there are (Gong et al., 2024).
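The insensitivity to scale and to the number of tasks can be checked directly from the definition of RCS. Rescaling every slope by a common factor $\lambda > 0$ (for example, by changing the step unit used in the regression) leaves the softmax inputs unchanged, and their mean absolute value is always one:

```latex
\frac{K\,\lambda\alpha_i(t)}{\sum_{j=1}^{K} \lvert\lambda\alpha_j(t)\rvert}
  = \frac{K\,\alpha_i(t)}{\sum_{j=1}^{K} \lvert\alpha_j(t)\rvert},
\qquad
\frac{1}{K}\sum_{i=1}^{K} \left\lvert \frac{K\,\alpha_i(t)}{\sum_{j=1}^{K} \lvert\alpha_j(t)\rvert} \right\rvert = 1 .
```

The second identity is why the factor $K$ appears: without it, the inputs would shrink as $1/K$ and the softmax would flatten toward uniform weights as tasks are added.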
In answer aggregation, the Radial Consensus Score emphasizes semantic similarity geometrically: candidates close to the semantic center, which correspond to high-consensus answers even when no single surface form holds a plurality, are preferred. Majority voting counts exact surface forms and is therefore brittle to lexical diversity. With RCS, paraphrases and near-duplicates lie close together in embedding space and jointly pull the center toward them, while geometric outliers receive large distances and are ranked last, which makes selection more robust to noise and sampling artifacts (Nguyen et al., 14 Apr 2026).
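A toy comparison illustrates the difference. The answers and 2-D embeddings below are invented for illustration (a real pipeline would obtain embeddings from a sentence-embedding model): three differently phrased copies of one answer and two identical copies of another.

```python
from collections import Counter

import numpy as np

# Toy data (assumed): three paraphrases of "12" and two verbatim copies of "13".
answers = ["The answer is 12.", "It is 12.", "12", "13", "13"]
embeddings = np.array([
    [1.00, 0.00],
    [0.98, 0.05],
    [0.97, -0.04],
    [0.00, 1.00],
    [0.00, 1.00],
])

# Majority voting over exact strings: "13" is the only string that repeats.
majority = Counter(answers).most_common(1)[0][0]

# Radial consensus with uniform weights: distance to the mean embedding.
center = embeddings.mean(axis=0)
scores = np.linalg.norm(embeddings - center, axis=1)
rcs_choice = answers[int(np.argmin(scores))]

print("majority vote:", majority)
print("radial consensus:", rcs_choice)
```

Exact-string voting picks "13", whereas the radial score picks one of the "12" phrasings, because the three paraphrases together pull the center toward their cluster.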
5. Experimental Characterization and Ablative Insights
CoBa with RCS in Multi-task LLM Finetuning
- On code completion benchmarks, the full RCS+ACS+DF weighting in CoBa achieves a Pass@1 of 29.4%, versus 28.7% without RCS and 28.1% without ACS; all three components are needed for the best result.
- Before divergence sets in, RCS assigns the highest scores to the slowest-converging tasks, keeping convergence rates balanced across tasks [(Gong et al., 2024), Table 8, Figure 1(c)].
Radial Consensus Score in Best-of-N Selection
- Across seven benchmarks, RCS (especially the uniformly weighted variant) consistently outperforms majority voting and probability-only approaches, with gains growing as the sampling budget $N$ increases.
- On Qwen2.5-3B with $N = 10$, an RCS variant achieves 57.0% selection accuracy versus 51.6% for self-consistency voting.
- RCS methods continue to improve as $N$ grows, whereas majority voting can plateau or degrade.
- RCS also applies to black-box API models, and in multi-agent debate settings it measurably improves consensus selection accuracy (Nguyen et al., 14 Apr 2026).
6. Limitations and Practical Considerations
For CoBa’s RCS:
- Early-stage slope estimates are noisy; a warm-up period and a sufficiently large slope window mitigate this.
- Linear regression on validation-loss ratios assumes quasi-linear convergence within each window; strongly nonlinear or stationary losses can make the slope $\alpha_i(t)$ a less reliable indicator of convergence.
- The DF temperature $\tau$, the slope window length $M$, and the warm-up length $W$ are hyperparameters; robust defaults exist, but they may require empirical tuning for new task distributions.
- RCS does not resolve deeper task-level gradient conflicts or address multi-modality; it functions primarily as a pacing/heavy-tail suppression mechanism (Gong et al., 2024).
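The sensitivity of the slope estimate to the window length can be checked with a small simulation. The exponential loss-ratio curve, noise level, and window lengths below are assumptions chosen for illustration, not values from CoBa:

```python
import numpy as np

rng = np.random.default_rng(0)


def slope_spread(window: int, noise: float = 0.01, trials: int = 500) -> float:
    """Standard deviation of the least-squares slope over `trials` noisy windows.

    Assumed toy model: loss ratio r(t) = exp(-0.02 t) plus Gaussian noise.
    """
    t = np.arange(window, dtype=float)
    slopes = []
    for _ in range(trials):
        r = np.exp(-0.02 * t) + rng.normal(0.0, noise, size=window)
        slopes.append(np.polyfit(t, r, deg=1)[0])
    return float(np.std(slopes))


for m in (3, 10, 30):
    print(f"window={m:2d}  slope std={slope_spread(m):.5f}")
```

Under this toy model the spread shrinks roughly as $M^{-3/2}$ (the ordinary least-squares slope variance is $12\sigma^2 / (M^3 - M)$), which is why very short windows early in training yield slopes dominated by noise.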
For Radial Consensus Score:
- In principle, the quality of the embedding model affects how accurately the semantic center reflects consensus; in practice, results are stable across widely used embedding models (MiniLM, RoBERTa).
- Embedding entire reasoning trajectories rather than final answers can collapse the semantic separation between candidates and degrade the consensus ranking.
- For black-box (API) use, the cost or feasibility of obtaining embeddings for large $N$ can be limiting; batched encoding is typically used to reduce overhead (Nguyen et al., 14 Apr 2026).
7. Comparative Position and Broader Implications
Both metrics abbreviated RCS, the Relative Convergence Score in CoBa and the Radial Consensus Score for answer selection, offer robust alternatives to conventional heuristic or frequency-based schemes, in multitask training and answer aggregation pipelines respectively.
| Domain | Metric Name | Definition Basis | Core Function |
|---|---|---|---|
| Multi-task Finetuning | Relative Convergence Score (CoBa) | Normalized validation-loss slopes + softmax | Convergence pacing, dynamic task weight adjustment |
| Inference Aggregation | Radial Consensus Score (Answer selection) | Embedding distance from semantic center | Geometric consensus, robust answer selection |
In both settings, the shift from surface-form or scale-unaware methods to convergence-normalized or semantically aware metrics addresses known pathologies: task starvation and divergence in multitask LLMs (Gong et al., 2024), and brittleness and majority failure in LLM inference aggregation (Nguyen et al., 14 Apr 2026). A plausible implication is that geometric and relative-normalization-based scoring may become standard components of balancing and aggregation methods in future LLM workflows.