Relative Convergence Scores (RCS)

Updated 16 May 2026

Relative Convergence Scores (RCS) is an acronym shared by two distinct metrics: the Relative Convergence Score, used to set dynamic task weights in multi-task LLM fine-tuning, and the Radial Consensus Score, used to select semantically central candidate answers during inference.
In multi-task training, the Relative Convergence Score uses validation-loss slopes and softmax normalization to recalibrate task weights dynamically, with the aim of preventing overfitting and balancing convergence rates across tasks.
For inference aggregation, the Radial Consensus Score computes Euclidean distances from candidate embeddings to a semantic center, favoring responses that best reflect the collective consensus.

The acronym RCS refers to two distinct, independently named metrics in recent machine learning literature, each tailored to a specific domain: (1) the Relative Convergence Score for dynamic task weighting during multi-task fine-tuning of LLMs (Gong et al., 2024), and (2) the Radial Consensus Score for aggregating LLM candidate responses via geometric consensus during inference-time answer selection (Nguyen et al., 14 Apr 2026). Although they share an acronym, their formal definitions, computation, and use cases are unrelated.

1. Definition and Overview

In multitask LLM fine-tuning, the Relative Convergence Score is a dynamically computed, validation-loss-based quantity that CoBa (Convergence Balancer) uses to balance the convergence rates of distinct tasks. At each validation checkpoint it quantifies how fast each task converges relative to the others, and it informs the reallocation of training-loss weights so that learning progresses evenly across all tasks (Gong et al., 2024).

In inference settings, particularly best-of-$N$ answer selection, the Radial Consensus Score is a geometric metric. Given a collection of candidate answer embeddings, it is the Euclidean distance from a candidate to the weighted Fréchet mean ("semantic center") of all answer embeddings, so responses that are semantically central to the group are preferred (Nguyen et al., 14 Apr 2026).

2. Formal Definitions

CoBa’s Relative Convergence Score in Multitask Finetuning

Let $K$ denote the number of tasks, $\ell_i^{\mathrm{val}}(\theta;t)$ the validation loss of task $i$ at iteration $t$, and $\bar\ell_i^{\mathrm{val}}(\theta;t) = \ell_i^{\mathrm{val}}(\theta;t) / \ell_i^{\mathrm{val}}(\theta;0)$ the loss ratio relative to initialization. A linear regression over the $N$-step windowed history of $\bar\ell_i^{\mathrm{val}}(\theta;t)$ yields the per-task slope $\alpha_i(t)$. RCS is then defined as:

$\mathrm{RCS}_i(t) = \mathrm{softmax}_i \left( \frac{K \, \alpha_i(t)}{\sum_{j=1}^{K} \lvert \alpha_j(t) \rvert} \right)$

with the softmax taken over the task index $i$, so that $\sum_{i=1}^{K} \mathrm{RCS}_i(t) = 1$ (Gong et al., 2024).
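A minimal NumPy sketch of the formula above, taking the $K$ regressed slopes $\alpha_i(t)$ as input. The function name and the uniform fallback when every slope is zero (where the formula is undefined) are assumptions for illustration, not part of CoBa's published code.

```python
import numpy as np


def relative_convergence_score(slopes: np.ndarray) -> np.ndarray:
    """RCS_i = softmax_i(K * alpha_i / sum_j |alpha_j|) over K task slopes."""
    slopes = np.asarray(slopes, dtype=float)
    k = slopes.shape[0]
    denom = np.abs(slopes).sum()
    if denom == 0.0:
        # Assumption: the formula is undefined when every slope is zero,
        # so fall back to uniform weights.
        return np.full(k, 1.0 / k)
    logits = k * slopes / denom
    # Subtracting the max gives a numerically stable softmax without changing the result.
    exp_logits = np.exp(logits - logits.max())
    return exp_logits / exp_logits.sum()


# Task 0 is converging fastest (most negative slope of its loss ratio).
weights = relative_convergence_score(np.array([-0.05, -0.01, 0.002]))
print(weights.round(3))
```

Scaling all slopes by the same positive constant leaves the scores unchanged, and the most negative slope (the fastest-converging task) receives the smallest weight.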

Radial Consensus Score for Best-of-$N$ Candidate Aggregation

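Given $N$ candidate responses $a_1, \dots, a_N$ with embeddings $e_1, \dots, e_N \in \mathbb{R}^d$, and normalized weights $w_1, \dots, w_N$ ($w_i \ge 0$, $\sum_{i=1}^{N} w_i = 1$), the semantic center is

$c = \arg\min_{z \in \mathbb{R}^d} \sum_{i=1}^{N} w_i \lVert e_i - z \rVert_2^2 = \sum_{i=1}^{N} w_i e_i$

The Radial Consensus Score for candidate $a_i$ is

$\mathrm{RCS}(a_i) = \lVert e_i - c \rVert_2$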

Lower scores correspond to answers that are more semantically central to the population (Nguyen et al., 14 Apr 2026).
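The following sketch, assuming the embeddings are already available as a NumPy array, computes the semantic center and the Radial Consensus Scores. The helper names are hypothetical; the weighted mean is used because it minimizes the weighted squared Euclidean distance, which makes it the Euclidean weighted Fréchet mean.

```python
import numpy as np


def semantic_center(embeddings: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted mean c = sum_i w_i e_i, the Euclidean weighted Frechet mean."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # enforce sum_i w_i = 1
    return w @ np.asarray(embeddings, dtype=float)  # (N,) @ (N, d) -> (d,)


def radial_consensus_scores(embeddings: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """RCS(a_i) = ||e_i - c||_2 for every candidate; lower means more central."""
    e = np.asarray(embeddings, dtype=float)
    c = semantic_center(e, weights)
    return np.linalg.norm(e - c, axis=1)


# Three toy 2-D embeddings; the second lies on the segment between the others, nearer the first.
E = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
scores = radial_consensus_scores(E, np.ones(3))
print(int(np.argmin(scores)))  # -> 1
```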

3. Algorithmic Procedure

RCS in CoBa Training Loop

For each validation step $t$ and each task $i$:
- Append the current loss ratio $\bar\ell_i^{\mathrm{val}}(\theta;t)$ to the fixed-length window of the last $N$ ratios.
- Fit a least-squares line to the window to obtain the slope $\alpha_i(t)$.
- Normalize the slopes and apply the softmax as in the formal definition to yield task-wise scores $\mathrm{RCS}_i(t)$.
- Combine them with the Absolute Convergence Score (ACS) and the Divergence Factor (DF) to compose the final weights:
$\omega_i(t) = \mathrm{DF}(t) \, \mathrm{RCS}_i(t) + \bigl(1 - \mathrm{DF}(t)\bigr) \, \mathrm{ACS}_i(t)$
A warm-up phase assigns uniform task weights during the first validation steps, before the slope estimates become reliable.
See pseudocode in [(Gong et al., 2024), Algorithm 1].
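A minimal sketch of the per-checkpoint weighting loop described above, assuming the first call receives the validation losses at initialization. `CoBaWeighter` and its parameters are hypothetical names; ACS and DF are passed in precomputed because their formulas are not given here, and the DF-gated convex combination is an assumed form, not CoBa's reference implementation.

```python
from collections import deque

import numpy as np


def window_slope(ratios) -> float:
    """Least-squares slope of the windowed loss ratios against their position."""
    y = np.asarray(ratios, dtype=float)
    x = np.arange(len(y), dtype=float)
    slope, _intercept = np.polyfit(x, y, 1)
    return float(slope)


class CoBaWeighter:
    """Hypothetical helper that turns per-checkpoint validation losses into task weights."""

    def __init__(self, num_tasks: int, window: int, warmup_steps: int):
        self.k = num_tasks
        self.warmup_steps = warmup_steps
        self.init_losses = None  # validation losses at initialization (assumed first call)
        self.windows = [deque(maxlen=window) for _ in range(num_tasks)]
        self.step = 0

    def update(self, val_losses, acs=None, df=None) -> np.ndarray:
        losses = np.asarray(val_losses, dtype=float)
        if self.init_losses is None:
            self.init_losses = losses.copy()
        # Loss ratio relative to initialization, appended to each task's fixed-length window.
        for win, ratio in zip(self.windows, losses / self.init_losses):
            win.append(ratio)
        self.step += 1

        uniform = np.full(self.k, 1.0 / self.k)
        # Warm-up: uniform weights until at least two ratios exist and the phase has ended.
        if self.step <= self.warmup_steps or len(self.windows[0]) < 2:
            return uniform

        slopes = np.array([window_slope(list(win)) for win in self.windows])
        denom = np.abs(slopes).sum()
        if denom == 0.0:
            rcs = uniform  # assumption: perfectly flat losses give uniform RCS
        else:
            logits = self.k * slopes / denom
            rcs = np.exp(logits - logits.max())
            rcs = rcs / rcs.sum()

        if acs is None or df is None:
            return rcs
        # Assumed form: DF in [0, 1] mixes RCS and ACS as a convex combination.
        return df * rcs + (1.0 - df) * np.asarray(acs, dtype=float)


weighter = CoBaWeighter(num_tasks=2, window=3, warmup_steps=1)
weighter.update([2.0, 2.0])          # initialization: uniform weights
weighter.update([1.0, 1.9])
print(weighter.update([0.5, 1.8]))   # task 0 (fast) receives the smaller weight
```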

Radial Consensus Score for Answer Selection

Sample $N$ answers $a_1, \dots, a_N$ from an LLM for input $x$.
Compute embeddings $e_1, \dots, e_N$ with a sentence-embedding model.
Choose weights $w_i$ (normalized to sum to 1) using one of three schemes:
- Uniform: $w_i = 1/N$.
- Frequency: $w_i \propto n_i$, with $n_i$ the count of answer $a_i$ among the $N$ samples.
- Probability: $w_i \propto p(a_i \mid x)$, the model's probability of generating $a_i$.
Compute the semantic center $c = \sum_{i=1}^{N} w_i e_i$.
For each $a_i$, calculate $\mathrm{RCS}(a_i) = \lVert e_i - c \rVert_2$.
Select $a^\ast = \arg\min_i \mathrm{RCS}(a_i)$ as the best candidate (Nguyen et al., 14 Apr 2026).
Beyond the $N$ embedding calls, computing the center and the distances costs $O(Nd)$ for $d$-dimensional embeddings, so the total cost is dominated by the embedding step.
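The selection procedure can be sketched end to end as follows, assuming embeddings come from any sentence encoder (for example a MiniLM model) and, for the probability scheme, that per-answer sequence log-probabilities are available. The function name, the exact-string counting used for the frequency scheme, and the per-sample treatment of duplicate answers are assumptions for illustration.

```python
from collections import Counter

import numpy as np


def select_best_of_n(answers, embeddings, scheme="uniform", logprobs=None):
    """Pick the candidate closest to the weighted semantic center.

    answers:    list of N candidate strings
    embeddings: (N, d) array from any sentence encoder (assumed precomputed)
    logprobs:   per-answer sequence log-probabilities, required for scheme="probability"
    """
    e = np.asarray(embeddings, dtype=float)
    n = len(answers)
    if scheme == "uniform":
        w = np.ones(n)
    elif scheme == "frequency":
        counts = Counter(answers)  # assumption: exact-string matching defines n_i
        w = np.array([counts[a] for a in answers], dtype=float)
    elif scheme == "probability":
        if logprobs is None:
            raise ValueError("scheme='probability' needs logprobs")
        lp = np.asarray(logprobs, dtype=float)
        w = np.exp(lp - lp.max())  # proportional to p(a_i | x), computed stably
    else:
        raise ValueError(f"unknown weighting scheme: {scheme!r}")
    w = w / w.sum()                              # normalize so sum_i w_i = 1
    center = w @ e                               # O(N d)
    scores = np.linalg.norm(e - center, axis=1)  # O(N d)
    return answers[int(np.argmin(scores))], scores


answers = ["x", "x", "y"]
emb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
best, _ = select_best_of_n(answers, emb, scheme="frequency")
print(best)  # -> x
```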

4. Intuition, Interpretation, and Theoretical Rationale

In multitask supervision, RCS suppresses the training weight of rapidly converging ("fast") tasks, whose loss-ratio slopes are most negative, and boosts the weight of lagging ("slow") ones, pushing validation losses toward uniform descent rates and reducing the risk of task "collapse" or overfitting. Dividing by $\sum_j \lvert \alpha_j(t) \rvert$ removes the absolute scale of the slopes, and the factor $K$ keeps the softmax inputs from shrinking as tasks are added, which makes the method largely insensitive to slope scale and task count (Gong et al., 2024).

In answer aggregation, the Radial Consensus Score emphasizes semantic similarity geometrically: candidates close to the semantic center, which correspond to high-consensus answers even when no single surface form holds a plurality, are preferred. Majority voting over exact strings is brittle to lexical variation, whereas RCS places near-duplicate phrasings close together and ranks geometric outliers last, improving robustness to noise and sampling artifacts (Nguyen et al., 14 Apr 2026).
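To illustrate the contrast with majority voting, this toy example uses hand-made 2-D vectors in place of a real encoder (they are illustrative assumptions, not model outputs). Three lexical variants of the same answer split the vote, so exact-string majority voting picks the repeated minority answer, while the uniform-weight Radial Consensus Score picks a variant from the semantically dominant cluster.

```python
from collections import Counter

import numpy as np


def majority_vote(answers):
    """Exact-string plurality vote (self-consistency style)."""
    return Counter(answers).most_common(1)[0][0]


def rcs_select(answers, embeddings):
    """Uniform-weight Radial Consensus Score: pick the answer nearest the mean embedding."""
    e = np.asarray(embeddings, dtype=float)
    center = e.mean(axis=0)
    return answers[int(np.argmin(np.linalg.norm(e - center, axis=1)))]


answers = ["42", "forty-two", "The answer is 42.", "17", "17"]
# Hand-made 2-D vectors standing in for a sentence encoder (illustrative only):
# the three "42" phrasings sit close together, the two "17" samples coincide.
embeddings = [[1.0, 0.0], [0.95, 0.05], [0.9, 0.1], [0.0, 1.0], [0.0, 1.0]]

print(majority_vote(answers))            # -> 17
print(rcs_select(answers, embeddings))   # -> The answer is 42.
```

The vote treats the three phrasings of 42 as separate answers, whereas the geometric center lies nearer the 42 cluster.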

5. Experimental Characterization and Ablative Insights

CoBa with RCS in Multi-task LLM Finetuning

On code completion benchmarks, the combined RCS+ACS+DF weighting in CoBa achieves a Pass@1 of 29.4%, versus 28.7% without RCS and 28.1% without ACS. The reported ablations indicate that each component contributes to the best result.
RCS dynamically assigns higher scores to the slowest-converging tasks before any task begins to diverge, maintaining convergence equilibrium across tasks [(Gong et al., 2024), Table 8, Figure 1(c)].

Radial Consensus Score in Best-of-$N$ Selection

Across seven benchmarks, RCS (especially the uniformly weighted variant) consistently outperforms majority voting and probability-only approaches, with gains that grow with the sampling budget $N$.
On Qwen2.5-3B with $N = 10$, an RCS variant achieves 57.0% selection accuracy versus 51.6% for self-consistency voting.
RCS methods continue to improve as $N$ grows, whereas majority voting can plateau or degrade.
RCS also applies to black-box API models, and in multi-agent debate settings it yields measurable gains in consensus selection accuracy (Nguyen et al., 14 Apr 2026).

6. Limitations and Practical Considerations

For CoBa’s RCS:

Early-stage slope estimates are noisy; a warm-up period and sufficiently large window sizes mitigate this.
Fitting a linear regression to validation-loss ratios assumes quasi-linear convergence within the window; strongly nonlinear or flat (stationary) losses can make $\alpha_i(t)$ less informative.
The DF temperature, the slope window $N$, and the warm-up length are hyperparameters; robust defaults exist, but new task distributions may require empirical tuning.
RCS does not resolve deeper task-level gradient conflicts or address multi-modality; it functions primarily as a pacing/heavy-tail suppression mechanism (Gong et al., 2024).

For Radial Consensus Score:

In principle, the fidelity of the embedding model affects how accurately the semantic center is located; in practice, results are stable across widely used encoders such as MiniLM and RoBERTa.
Using embeddings of entire reasoning trajectories rather than final answers can collapse semantic separation and degrade consensus ranking.
For black-box (API) models, the cost or feasibility of obtaining embeddings for large $N$ can limit applicability; batched encoding is typically used to reduce overhead (Nguyen et al., 14 Apr 2026).

7. Comparative Position and Broader Implications

The two metrics abbreviated RCS, CoBa's Relative Convergence Score and the Radial Consensus Score, offer robust alternatives to conventional heuristic or frequency-based methods for task balancing in multitask training and for answer aggregation at inference time.

Domain	Metric Name	Definition Basis	Core Function
Multi-task Finetuning	Relative Convergence Score (CoBa)	Normalized validation-loss slopes + softmax	Convergence pacing, dynamic task weight adjustment
Inference Aggregation	Radial Consensus Score (Answer selection)	Embedding distance from semantic center	Geometric consensus, robust answer selection

In both settings, the shift from surface-form or scale-unaware methods to convergence-normalized or semantically aware metrics addresses known pathologies: task starvation and divergence in multitask LLMs (Gong et al., 2024), and the brittleness and failure modes of majority voting in LLM inference aggregation (Nguyen et al., 14 Apr 2026). A plausible implication is that geometric and relative-normalization-based scoring may become standard components of balancing and aggregation methods in future LLM workflows.

References

Gong et al. (2024). CoBa: Convergence Balancer for Multitask Finetuning of Large Language Models.

Nguyen et al. (2026). Beyond Majority Voting: Efficient Best-Of-N with Radial Consensus Score.
