Relative Convergence Scores (RCS)

Updated 16 May 2026
  • Relative Convergence Scores (RCS) are dual-purpose metrics used to balance dynamic task weighting in multi-task LLM fine-tuning and to select semantically central candidate answers during inference.
  • In multi-task training, RCS leverages validation-loss slopes and softmax normalization to dynamically recalibrate task weights, preventing overfitting and ensuring balanced convergence rates.
  • For inference aggregation, RCS computes Euclidean distances from candidate embeddings to a semantic center, favoring responses that best reflect collective consensus.

Relative Convergence Scores (RCS) refers to two distinct, independently introduced metrics in the recent machine learning literature that share an acronym, each tailored to a specific setting: (1) the Relative Convergence Score, used for dynamic task weighting during multi-task fine-tuning of LLMs (Gong et al., 2024), and (2) the Radial Consensus Score, used to aggregate LLM candidate responses via geometric consensus during inference-time answer selection (Nguyen et al., 14 Apr 2026). Beyond the acronym, the two differ in formal definition, computation, and use case.

1. Definition and Overview

In multitask LLM fine-tuning, Relative Convergence Score denotes a dynamically computed, validation-loss-based quantity used by CoBa (Convergence Balancer) to balance convergence rates of distinct tasks. It quantifies, for each task at each validation checkpoint, the relative speed of convergence with respect to other tasks and informs the reallocation of training-loss weights to harmonize learning progress across all tasks (Gong et al., 2024).

In inference settings, particularly for best-of-$N$ answer selection, the Radial Consensus Score is a geometric metric. Given a collection of candidate answer embeddings, it represents the Euclidean distance from a candidate to the weighted Fréchet mean ("semantic center") of all answer embeddings, prioritizing responses that are semantically central to the group (Nguyen et al., 14 Apr 2026).

2. Formal Definitions

CoBa’s Relative Convergence Score in Multitask Finetuning

Let $K$ denote the number of tasks, $\ell_i^{\mathrm{val}}(\theta;t)$ the validation loss of task $i$ at iteration $t$, and $\bar\ell_i^{\mathrm{val}}(\theta;t)$ the loss ratio relative to initialization. The $N$-step windowed history of $\bar\ell_i^{\mathrm{val}}(\theta;t)$ is linearly regressed to produce the per-task slope $\alpha_i(t)$. RCS is then defined as:

$\mathrm{RCS}_i(t) = \mathrm{softmax}_i \left( \frac{K \cdot \alpha_i(t)}{\sum_{j=1}^K|\alpha_j(t)|} \right)$

with the softmax taken over the task index $i$, ensuring $\sum_{i=1}^{K} \mathrm{RCS}_i(t) = 1$ (Gong et al., 2024).
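As a rough illustration of the formula above, the following sketch computes RCS from a list of already-fitted slopes. The function name `relative_convergence_scores`, the uniform fallback when every slope is zero, and the example slope values are assumptions made for this example; they are not taken from Gong et al.

```python
import math

def relative_convergence_scores(slopes):
    """Softmax over tasks of K * alpha_i / sum_j |alpha_j| (hypothetical helper)."""
    k = len(slopes)
    denom = sum(abs(a) for a in slopes)
    if denom == 0.0:
        # Assumption: with all slopes exactly zero the ratio is undefined,
        # so this sketch falls back to uniform scores.
        return [1.0 / k] * k
    logits = [k * a / denom for a in slopes]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical slopes: task 0 converges fast (steep negative slope),
# task 1 converges slowly, task 2 has stalled.
slopes = [-0.30, -0.10, 0.00]
scores = relative_convergence_scores(slopes)
print(scores)
```

Because a steeper negative slope yields a smaller logit, the fastest-converging task receives the smallest score and the stalled task the largest, which matches the balancing behaviour described in the text.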

Radial Consensus Score for Best-of-$N$ Candidate Aggregation

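Given $N$ candidate responses $y_1, \dots, y_N$ with embeddings $e_1, \dots, e_N \in \mathbb{R}^d$, and normalized weights $w_i \ge 0$ ($\sum_{i=1}^{N} w_i = 1$), the semantic center is

$c = \sum_{i=1}^{N} w_i \, e_i$

The Radial Consensus Score for candidate $y_i$ is

$\mathrm{RCS}_i = \lVert e_i - c \rVert_2$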

Lower scores correspond to answers that are more semantically central to the population (Nguyen et al., 14 Apr 2026).
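A small worked example may help make the geometry concrete. It assumes three hypothetical two-dimensional embeddings $e_1, e_2, e_3$ with uniform weights $w_i = 1/3$, writes $c$ for the semantic center, and takes the score to be the Euclidean distance to $c$; the symbols and numbers are chosen for illustration only.

```latex
e_1 = (0, 0), \quad e_2 = (1, 0), \quad e_3 = (5, 3), \qquad w_1 = w_2 = w_3 = \tfrac{1}{3}

c = \tfrac{1}{3}\,(e_1 + e_2 + e_3) = (2, 1)

\mathrm{RCS}_1 = \lVert (0,0) - (2,1) \rVert_2 = \sqrt{5} \approx 2.236
\mathrm{RCS}_2 = \lVert (1,0) - (2,1) \rVert_2 = \sqrt{2} \approx 1.414
\mathrm{RCS}_3 = \lVert (5,3) - (2,1) \rVert_2 = \sqrt{13} \approx 3.606
```

Candidate 2 has the lowest score and would be selected, while the outlying candidate 3 is ranked last.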

3. Algorithmic Procedure

RCS in CoBa Training Loop

  • For each step $t$ and each task $i$:
    • Update the fixed-length window of $\bar\ell_i^{\mathrm{val}}(\theta;t)$ ratios.
    • Fit a least-squares line to compute slope $\alpha_i(t)$.
    • Normalize slopes and apply softmax as in the formal definition to yield taskwise weights.
    • Integrate with Absolute Convergence Score (ACS) and Divergence Factor (DF) to compose final weights:

    $\omega_i(t) = \mathrm{DF}(t) \cdot \mathrm{RCS}_i(t) + (1 - \mathrm{DF}(t)) \cdot \mathrm{ACS}_i(t)$

  • A warm-up phase assigns uniform task weights for the first $W$ steps.

  • See pseudocode in [(Gong et al., 2024), Algorithm 1].
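The windowing, slope-fitting, and warm-up steps above can be sketched as follows. This is a minimal illustration that covers only the RCS part of the loop: ACS and DF are deliberately left out, and the class name `RcsTracker`, its parameters, and the toy loss values are all hypothetical rather than taken from Algorithm 1 of Gong et al. It also assumes strictly positive initial validation losses.

```python
import math
from collections import deque

def least_squares_slope(ys):
    """Slope of the ordinary least-squares line through (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    x_mean = (n - 1) / 2.0
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

class RcsTracker:
    """Hypothetical helper: per-task loss-ratio windows with a uniform warm-up."""

    def __init__(self, num_tasks, window, warmup):
        self.k = num_tasks
        self.warmup = warmup
        self.initial = None  # validation losses seen at the first update
        self.history = [deque(maxlen=window) for _ in range(num_tasks)]
        self.step = 0

    def update(self, val_losses):
        if self.initial is None:
            self.initial = list(val_losses)
        # Store each loss as a ratio relative to its initial value.
        for hist, loss, init in zip(self.history, val_losses, self.initial):
            hist.append(loss / init)
        self.step += 1
        uniform = [1.0 / self.k] * self.k
        # Uniform weights during warm-up, or while a slope cannot be fitted yet.
        if self.step <= self.warmup or len(self.history[0]) < 2:
            return uniform
        slopes = [least_squares_slope(list(h)) for h in self.history]
        denom = sum(abs(a) for a in slopes)
        if denom == 0.0:
            return uniform  # assumption: flat losses fall back to uniform
        return softmax([self.k * a / denom for a in slopes])

# Toy run: task 0 converges quickly, task 1 slowly.
tracker = RcsTracker(num_tasks=2, window=4, warmup=2)
weights_log = [tracker.update(losses)
               for losses in ([2.0, 1.0], [1.6, 0.98], [1.2, 0.96], [0.8, 0.94])]
print(weights_log)
```

In this toy run the first two updates return uniform weights, after which the slowly converging task 1 receives the larger score.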

Radial Consensus Score for Answer Selection

  • Sample $N$ answers $y_1, \dots, y_N$ from an LLM for input $x$.
  • Compute embeddings $e_i = \phi(y_i)$.
  • Select weights $w_i$ by either uniform, frequency, or probability scheme:
    • Uniform: $w_i = 1/N$.
    • Frequency: $w_i \propto n_i$, with $n_i$ the count of $y_i$.
    • Probability: $w_i$ proportional to $p(y_i \mid x)$.
  • Compute semantic center $c = \sum_{i=1}^{N} w_i \, e_i$.
  • For each $i$, calculate $\mathrm{RCS}_i = \lVert e_i - c \rVert_2$.
  • Select $y_{i^*}$ with $i^* = \arg\min_i \mathrm{RCS}_i$ as the best candidate (Nguyen et al., 14 Apr 2026).
  • Total cost is linear in $N$ ($N$ encoder calls plus $O(Nd)$ arithmetic for embedding dimension $d$), dominated by the embedding step.
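The selection procedure above can be sketched end to end as follows. The embeddings are hand-made two-dimensional vectors standing in for the output of a real sentence encoder, and the function names `make_weights` and `select_by_radial_consensus`, as well as the choice to normalize frequency weights by the total count, are assumptions made for this example rather than details from Nguyen et al.

```python
import math
from collections import Counter

def make_weights(answers, scheme="uniform", probs=None):
    """Return normalized weights under one of the three schemes (hypothetical helper)."""
    if scheme == "uniform":
        raw = [1.0] * len(answers)
    elif scheme == "frequency":
        counts = Counter(answers)  # how often each exact answer string occurs
        raw = [float(counts[a]) for a in answers]
    elif scheme == "probability":
        if probs is None:
            raise ValueError("probability scheme needs per-answer probabilities")
        raw = list(probs)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    total = sum(raw)
    return [r / total for r in raw]

def select_by_radial_consensus(embeddings, weights):
    """Return (index of the most central candidate, list of distances to the center)."""
    dim = len(embeddings[0])
    # Semantic center: weighted mean of the embeddings.
    center = [sum(w * e[d] for w, e in zip(weights, embeddings)) for d in range(dim)]
    scores = [math.dist(e, center) for e in embeddings]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Toy data: three paraphrases of "4" and one outlier "5".
answers = ["4", "four", "4", "5"]
embeddings = [(1.0, 0.0), (0.9, 0.1), (1.0, 0.0), (0.0, 1.0)]  # stand-in vectors

uniform_w = make_weights(answers, "uniform")
freq_w = make_weights(answers, "frequency")
best_uniform, scores_uniform = select_by_radial_consensus(embeddings, uniform_w)
best_freq, scores_freq = select_by_radial_consensus(embeddings, freq_w)
print(answers[best_uniform], answers[best_freq])
```

Under both weighting schemes the selected candidate comes from the cluster of "4" paraphrases, and the outlier "5" receives the largest distance to the center.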

4. Intuition, Interpretation, and Theoretical Rationale

In multitask supervision, RCS suppresses the training weight of rapidly converging ("fast") tasks and boosts weights for lagging ("slow") ones, enforcing uniform descent rates across validation losses and minimizing the risk of task "collapse" or overfitting. This balancing is made robust by normalization and softmax, rendering the method insensitive to scale and number of tasks (Gong et al., 2024).

In answer aggregation, Radial Consensus Score geometrically emphasizes semantic similarity: candidates close to the semantic center—corresponding to high-consensus answers regardless of surface-form plurality—are preferred. Unlike majority voting, which is brittle to lexical diversity, RCS naturally clusters near-duplicates and excludes geometric outliers, achieving robustness in the presence of noise and sampling artifacts (Nguyen et al., 14 Apr 2026).

5. Experimental Characterization and Ablative Insights

CoBa with RCS in Multi-task LLM Finetuning

  • On code completion benchmarks, the combined RCS+ACS+DF weighting in CoBa achieves a Pass@1 of 29.4% versus 28.7% and 28.1% with ablations (removing RCS or ACS, respectively). All three components are necessary for optimal performance.
  • RCS dynamically assigns higher scores to slowest-converging tasks prior to divergence, maintaining convergence equilibrium across tasks [(Gong et al., 2024), Table 8, Figure 1(c)].

Radial Consensus Score in Best-of-$N$ Selection

  • Across seven benchmarks, RCS (especially the uniform weighted variant) consistently outperforms majority voting and probability-only approaches, with gains increasing at higher $N$ (sampling budget).
  • On Qwen2.5-3B ($N=10$), RCS achieves 57.0% selection accuracy versus 51.6% for self-consistency voting.
  • RCS methods continue to improve as $N$ scales, whereas majority voting can plateau or degrade.
  • Black-box applicability is demonstrated on API models, and in multi-agent debate settings, RCS yields quantifiable increases in consensus selection accuracy (Nguyen et al., 14 Apr 2026).

6. Limitations and Practical Considerations

For CoBa’s RCS:

  • Early-stage slope estimates are susceptible to noise, mitigated via a warm-up period and sufficiently large window sizes.
  • Linear regression on validation-loss ratios assumes quasi-linear convergence behavior; strong nonlinearity or stationary losses can degrade the diagnostic value of the slope $\alpha_i(t)$.
  • Temperature $\tau$ (in DF), slope window $N$, and warm-up $W$ are hyperparameters; robust defaults exist but may require empirical tuning for new task distributions.
  • RCS does not resolve deeper task-level gradient conflicts or address multi-modality; it functions primarily as a pacing/heavy-tail suppression mechanism (Gong et al., 2024).

For Radial Consensus Score:

  • Embedding model fidelity can, in principle, impact semantic center accuracy; in practice, choice of embedding across widely used models (MiniLM, RoBERTa) yields stable results.
  • Using embeddings for entire reasoning trajectories rather than final answers can collapse semantic separation and degrade consensus ranking.
  • Black-box (API) applicability can be limited by the cost or feasibility of obtaining embeddings for large $N$; batched encoding is typically used to limit overhead (Nguyen et al., 14 Apr 2026).

7. Comparative Position and Broader Implications

Both metrics, CoBa's Relative Convergence Score and the Radial Consensus Score, offer robust alternatives to conventional heuristic or frequency-based approaches: the former for task balancing in multitask training, the latter for answer aggregation at inference time.

| Domain | Metric Name | Definition Basis | Core Function |
|---|---|---|---|
| Multi-task Finetuning | Relative Convergence Score (CoBa) | Normalized validation-loss slopes + softmax | Convergence pacing, dynamic task weight adjustment |
| Inference Aggregation | Radial Consensus Score (Answer selection) | Embedding distance from semantic center | Geometric consensus, robust answer selection |

In both settings, the shift from surface-form or scale-unaware methods to either convergence-normalized or semantically-aware metrics addresses known pathologies: task starvation and divergence in multitask LLMs (Gong et al., 2024); brittleness and majority failure in LLM inference aggregation (Nguyen et al., 14 Apr 2026). A plausible implication is that geometric and relative-normalization-based scoring may become standard components for balancing and aggregation methodologies in future LLM-oriented workflows.
