Role Divergence Index (RDI) Analysis

Updated 27 May 2026

RDI is an ordinal metric that quantifies deviations from assigned epistemic roles, measuring severity from mild role abandonment to full polarity reversal.
It weights each misclassified output in a multi-agent LLM context by its ordinal distance from the assigned role, using the labels CRITICAL, BALANCED, and CHARITABLE, and normalizes by the maximum possible severity.
RDI aids in diagnosing model failure modes by distinguishing between minor errors and critical role inversion, thereby guiding system tuning.

The Role Divergence Index (RDI) is an ordinal metric devised to quantify the extent and severity of deviations from assigned epistemic roles in multi-agent pipelines, particularly in LLM-mediated assessment of political statements. It was introduced in systematic analyses of epistemic role fidelity in LLM-based democratic discourse systems (Dietrich, 29 Apr 2026), and is also independently motivated by the aggregation of role-diversity measures in multi-agent reinforcement learning (MARL) (Hu et al., 2022). RDI serves as a compact indicator of drift severity: it shows whether role misclassification events reflect mild abandonment or full polarity reversal.

1. Formal Definition and Interpretation

In the LLM-based discourse analysis context, RDI evaluates the misclassification severity of a charitable-advocate agent. Given $N$ outputs generated under the charitable prompt, each output is classified by an epistemic stance classifier as one of three roles: CRITICAL ($-1$), BALANCED ($0$), or CHARITABLE ($+1$). Only outputs whose true role is CHARITABLE and which are misclassified (as either BALANCED or CRITICAL) are considered. Define:

$n_{\text{crit}}$ : Number of charitable runs misclassified as CRITICAL
$n_{\text{bal}}$ : Number of charitable runs misclassified as BALANCED
$n_{\text{wrong}} = n_{\text{crit}} + n_{\text{bal}}$

The RDI is computed as:

$\mathrm{RDI} = \frac{2\,n_{\text{crit}} + n_{\text{bal}}}{2\,n_{\text{wrong}}}$

This normalization ensures that if every error is a two-step drift to CRITICAL, $\mathrm{RDI} = 1.0$; if all are one-step drift to BALANCED, $\mathrm{RDI} = 0.5$. Resulting values are restricted to $[0.5, 1.0]$.

Interpretively, RDI quantifies the average ordinal severity of misclassifications relative to the maximally severe error. Because $\mathrm{RDI} = \tfrac{1}{2}\left(1 + n_{\text{crit}}/n_{\text{wrong}}\right)$, the proportion of errors that correspond to full polarity reversal is $2\,\mathrm{RDI} - 1$, not RDI itself. High RDI (near $1$) indicates role inversion (CHARITABLE $\to$ CRITICAL); low RDI (near $0.5$) indicates mere abandonment (CHARITABLE $\to$ BALANCED).
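As a minimal sketch of the formula above, the computation can be written in a few lines of Python. The function names `rdi` and `reversal_share` are illustrative and do not come from the cited work; the second helper relies on the identity $\mathrm{RDI} = \tfrac{1}{2} + n_{\text{crit}}/(2\,n_{\text{wrong}})$, which follows directly from the definition.

```python
from typing import Optional


def rdi(n_crit: int, n_bal: int) -> Optional[float]:
    """Role Divergence Index for runs whose assigned role is CHARITABLE.

    n_crit: runs classified as CRITICAL (two-step drift)
    n_bal:  runs classified as BALANCED (one-step drift)
    Returns None when there are no errors, because RDI is then undefined.
    """
    if n_crit < 0 or n_bal < 0:
        raise ValueError("counts must be non-negative")
    n_wrong = n_crit + n_bal
    if n_wrong == 0:
        return None
    return (2 * n_crit + n_bal) / (2 * n_wrong)


def reversal_share(rdi_value: float) -> float:
    """Fraction of errors that are CRITICAL, recovered as 2 * RDI - 1."""
    return 2 * rdi_value - 1
```

For instance, three CRITICAL and one BALANCED misclassification give `rdi(3, 1) == 0.875`, which corresponds to a reversal share of 0.75.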

2. Computation Methodology in LLM Pipelines

Data Acquisition

For each test statement, the charitable-advocate model is run multiple times (e.g., five), generating $N$ reasoning texts per configuration (model, language, fact-check provider).

Role Classification

A role classifier (e.g., Mistral Large) assesses each reasoning text, labeling its epistemic stance as CRITICAL, BALANCED, or CHARITABLE.

Error Tally and RDI Calculation

Focusing on cases where the ground truth is CHARITABLE, misclassifications are tallied into $n_{\text{crit}}$ and $n_{\text{bal}}$, and the RDI formula is applied. If $n_{\text{wrong}} = 0$, RDI is undefined (perfect fidelity; "no drift" is reported).
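The tally step can be sketched as follows, assuming the role classifier's outputs for one configuration are already available as a list of label strings. The function name `rdi_from_labels` and the example labels are hypothetical; the classifier call itself is outside the scope of this sketch.

```python
from collections import Counter
from typing import Iterable, Optional

VALID_LABELS = {"CRITICAL", "BALANCED", "CHARITABLE"}


def rdi_from_labels(predicted: Iterable[str]) -> Optional[float]:
    """Compute RDI from classifier labels of runs whose assigned role is CHARITABLE."""
    counts = Counter(predicted)
    unknown = set(counts) - VALID_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    n_crit = counts["CRITICAL"]   # two-step drift
    n_bal = counts["BALANCED"]    # one-step drift
    n_wrong = n_crit + n_bal
    if n_wrong == 0:
        return None  # perfect fidelity: report "no drift"
    return (2 * n_crit + n_bal) / (2 * n_wrong)


# Hypothetical labels for five runs of one configuration.
labels = ["CHARITABLE", "CRITICAL", "BALANCED", "CRITICAL", "CHARITABLE"]
print(rdi_from_labels(labels))
```

Returning `None` rather than `0.0` keeps the undefined case distinct from any value in the metric's range.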

3. Empirical Values and Comparative Table

Empirical measurements demonstrate substantial differences in RDI across models, languages, and fact-check configurations. The following table summarizes representative statistics from (Dietrich, 29 Apr 2026):

Model	Language	Fact-check provider	RDI	Accuracy (charitable)	EDD	DDI	ERS
Claude	EN	Gemini	0.810	39%	0.993	–0.993	1.075
Claude	DE	Gemini	0.679	48%	0.707	–0.707	1.032
Mistral	EN	Gemini	0.656	67%	0.438	–0.438	0.844
Mistral	DE	Gemini	0.615	68%	0.393	–0.393	0.799

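Because the formula gives $\mathrm{RDI} = \tfrac{1}{2}\left(1 + n_{\text{crit}}/n_{\text{wrong}}\right)$, Claude's English RDI of 0.810 indicates that about 62% of its errors are full polarity reversals ($2 \times 0.810 - 1 = 0.62$); Mistral's German RDI of 0.615 reflects a milder error distribution, with about 23% of errors being reversals.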
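The tabulated values can be cross-checked with a short script. Under the RDI formula, the share of errors classified as CRITICAL is $2\,\mathrm{RDI} - 1$. If EDD is computed over the same charitable runs (an assumption of this sketch), then $\mathrm{EDD} = (n_{\text{bal}} + 2\,n_{\text{crit}})/N = 2\,\mathrm{RDI}\,(1 - \text{accuracy})$. The helper names are illustrative, and because the accuracies are rounded to whole percentages, only approximate agreement should be expected.

```python
# (model, language, RDI, charitable accuracy, reported EDD), copied from the table
ROWS = [
    ("Claude", "EN", 0.810, 0.39, 0.993),
    ("Claude", "DE", 0.679, 0.48, 0.707),
    ("Mistral", "EN", 0.656, 0.67, 0.438),
    ("Mistral", "DE", 0.615, 0.68, 0.393),
]


def reversal_share(rdi: float) -> float:
    """Fraction of errors that are full polarity reversals (CRITICAL)."""
    return 2 * rdi - 1


def implied_edd(rdi: float, accuracy: float) -> float:
    """EDD implied by RDI and accuracy, assuming both use the same charitable runs."""
    return 2 * rdi * (1 - accuracy)


for model, lang, rdi, acc, edd in ROWS:
    print(f"{model} {lang}: reversal share {reversal_share(rdi):.2f}, "
          f"implied EDD {implied_edd(rdi, acc):.3f} vs reported {edd:.3f}")
```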

4. Relationship to Other Drift and Diversity Metrics

RDI is one of four principal drift metrics in (Dietrich, 29 Apr 2026), alongside:

Expected Drift Distance (EDD): Mean absolute ordinal distance between predicted and true roles; symmetric, ranges over $[0, 2]$.
Directional Drift Index (DDI): Mean signed ordinal difference; negative values reflect drift toward CRITICAL.
Entropy-based Role Stability (ERS): Shannon entropy of predicted role distribution; measures consistency, with maximum $\ln 3 \approx 1.099$.
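The three companion metrics can be sketched from their verbal definitions above. This is an illustrative reading, not the reference implementation: the function name `drift_metrics` is hypothetical, and the natural logarithm for ERS is an assumption.

```python
import math
from collections import Counter
from typing import Dict, Sequence

ROLE_VALUE = {"CRITICAL": -1, "BALANCED": 0, "CHARITABLE": 1}


def drift_metrics(predicted: Sequence[str], true_role: str = "CHARITABLE") -> Dict[str, float]:
    """Return EDD, DDI and ERS for runs that all share one assigned role."""
    if not predicted:
        raise ValueError("need at least one prediction")
    target = ROLE_VALUE[true_role]
    diffs = [ROLE_VALUE[p] - target for p in predicted]
    n = len(diffs)
    edd = sum(abs(d) for d in diffs) / n   # mean absolute ordinal distance
    ddi = sum(diffs) / n                   # mean signed ordinal difference
    counts = Counter(predicted)
    # Shannon entropy of the predicted-role distribution (natural log: assumption)
    ers = sum(-(c / n) * math.log(c / n) for c in counts.values())
    return {"EDD": edd, "DDI": ddi, "ERS": ers}
```

When the assigned role is CHARITABLE, every error moves toward lower ordinal values, so DDI equals the negative of EDD, as in the table's rows.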

Key properties distinguishing RDI:

Role-specific focus on CHARITABLE misclassification
Ordinal severity encoding: two-step (CRITICAL) errors count double
Normalization relative to the error set, enabling interpretable severity ratio

In cooperative MARL, (Hu et al., 2022) defines three separate role-diversity scores: action-based, trajectory-based, and contribution-based. While a scalar aggregation of the three scores is mathematically plausible, no unified meta-index is used in practice. Each component reflects distinct aspects of agent divergence relevant for policy design.

5. Diagnostic Utility and Failure Mode Analysis

RDI enables precise identification of failure modes in multi-agent LLM pipelines:

Role abandonment: Predominant misclassification as BALANCED (RDI $\approx 0.5$)
Polarity reversal: Predominant misclassification as CRITICAL (RDI $\approx 1.0$)
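One way to turn the two failure modes into a label is to split at the midpoint of the RDI range. From the formula, $\mathrm{RDI} > 0.75$ holds exactly when $n_{\text{crit}} > n_{\text{bal}}$, so 0.75 separates "mostly CRITICAL" from "mostly BALANCED" errors. The cut at 0.75, the label strings, and the function name are illustrative choices of this sketch, not part of the cited work.

```python
def failure_mode(n_crit: int, n_bal: int) -> str:
    """Label the dominant failure mode of a charitable-advocate configuration."""
    n_wrong = n_crit + n_bal
    if n_wrong == 0:
        return "no drift"
    rdi = (2 * n_crit + n_bal) / (2 * n_wrong)
    if rdi > 0.75:   # more CRITICAL than BALANCED errors
        return "polarity reversal"
    if rdi < 0.75:   # more BALANCED than CRITICAL errors
        return "role abandonment"
    return "mixed"   # equal counts of both error types
```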

In (Dietrich, 29 Apr 2026), two mechanisms, the Epistemic Floor Effect (EFE) and Role-Prior Conflict (RPC), are captured by RDI trends. For instance, when the fact-check strictly contradicts the assigned CHARITABLE stance (EFE), models such as Claude yield RDI approaching $1.0$: they never produce output classified as CHARITABLE, and their errors fall exclusively into the CRITICAL class.

Operationally, RDI supports thresholding for system validation: configurations whose RDI exceeds a chosen cutoff are flagged as “unacceptably prone to polarity reversal” and may warrant re-tuning or exclusion.
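A validation gate along these lines might look like the sketch below. The cutoff is a caller-supplied parameter, the strict comparison is an assumption, and the name `flag_configs` is hypothetical. The 0.7 used in the example is an arbitrary illustration applied to the tabulated RDI values.

```python
from typing import Dict, List, Optional, Tuple

Config = Tuple[str, str]  # (model, language)


def flag_configs(rdi_by_config: Dict[Config, Optional[float]],
                 threshold: float) -> List[Config]:
    """Return configurations whose RDI exceeds the threshold, sorted by name.

    Configurations with undefined RDI (None, i.e. no errors) are never flagged.
    """
    if not 0.5 <= threshold <= 1.0:
        raise ValueError("threshold must lie in the RDI range [0.5, 1.0]")
    return sorted(cfg for cfg, value in rdi_by_config.items()
                  if value is not None and value > threshold)


observed = {
    ("Claude", "EN"): 0.810,
    ("Claude", "DE"): 0.679,
    ("Mistral", "EN"): 0.656,
    ("Mistral", "DE"): 0.615,
}
print(flag_configs(observed, threshold=0.7))  # 0.7 is an arbitrary example cutoff
```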

6. RDI in Theoretical and Practical Contexts

In MARL, role divergence is empirically and theoretically linked to policy performance and credit assignment (Hu et al., 2022). Action-, trajectory-, and contribution-based scores inform architecture choices (parameter sharing, communication mechanism selection, credit assignment method).

While RDI in the LLM setting measures epistemic fidelity, in MARL, the diversity scores underpin theoretical error bounds. For example, Theorem 1 in (Hu et al., 2022) connects the contribution-diversity term directly to excess joint Q-value error.

7. Significance and Applications

RDI provides a compact, interpretable measure of how severely an agent deviates from its designated role. In political statement analysis with LLMs, it diagnoses model-specific failure modes, supports systematic protocol validation, and anchors thresholds for practical deployment risk. In multi-agent systems more broadly, role-divergence metrics guide both empirical policy selection and theoretical guarantees of system robustness.

By distinguishing “mild abandonment” from “active inversion,” RDI fills a gap in epistemic validation, complementing holistic drift and stability metrics. Its role-specific, ordinal, and normalized construction gives practitioners and theorists a fine-grained diagnostic for role fidelity and diversity across domains (Dietrich, 29 Apr 2026; Hu et al., 2022).

References (2)

When Roles Fail: Epistemic Constraints on Advocate Role Fidelity in LLM-Based Political Statement Analysis (2026)

Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL (2022)
