
Idealized CAT (ICAT) Score

Updated 10 April 2026
  • Idealized CAT (ICAT) Score is a metric that combines language model fluency and neutrality, using language model score (lms) and stereotype score (ss) to provide a single interpretable value.
  • It also functions in binary classification by computing precision at an empirically determined indistinguishability threshold, making it robust to issues like class imbalance.
  • ICAT overcomes limitations of traditional metrics such as AUC and F1 by offering a mathematically rigorous evaluation for both language bias and classifier performance.

The Idealized CAT (ICAT) Score is a metric designed to give a principled, interpretable single-number summary of model performance. The name covers two formally unrelated metrics from distinct lines of research: one assesses stereotyping in LLMs using triplets of stereotypical, anti-stereotypical, and irrelevant sentences, and the other evaluates binary classifiers via precision at an indistinguishability threshold. Despite their separate origins, both are defined by explicit formulas and both target limitations of alternatives such as AUC and F1. This article presents both definitions.

1. ICAT for LLM Bias: Definition and Rationale

In the evaluation of stereotypical bias in LLMs, the Idealized CAT Score (iCAT) is formulated to capture a tradeoff between language modeling competence and neutrality between stereotypical and anti-stereotypical completions (Pang et al., 2 Feb 2025). Given a test set $\mathcal D=\{(S_i, S^+_i, S^u_i)\}_{i=1}^N$, where $S_i$ denotes a stereotypical, $S^+_i$ an anti-stereotypical, and $S^u_i$ an irrelevant sentence, the iCAT metric is built on two quantities:

  • Language Model Score (lms):

$$\mathrm{lms} = \frac{1}{N} \sum_{i=1}^N \mathbf{1}[L(S^u_i) < \max\{L(S_i), L(S^+_i)\}] \times 100$$

where $L(\cdot)$ is the model (pseudo-)log-likelihood.

  • Stereotype Score (ss):

$$\mathrm{ss} = \frac{1}{N} \sum_{i=1}^N \mathbf{1}[L(S_i) > L(S^+_i)] \times 100$$

The iCAT score itself is:

$$\mathrm{iCAT} = \mathrm{lms} \times \frac{\min(100-\mathrm{ss},\,\mathrm{ss})}{50}$$

This yields a value in $[0, 100]$, maximized only when the model always assigns a higher likelihood to a meaningful (stereotyped or anti-stereotyped) completion than to the irrelevant one and shows perfect neutrality (ss = 50). Because $\min(100-\mathrm{ss},\,\mathrm{ss})$ is symmetric about ss = 50, a preference for stereotypes and an equally strong preference for anti-stereotypes are penalized equally, so only indifference between the two is fully rewarded.

This design simultaneously penalizes LLMs that are either biased (ss far from 50) or lack discriminative power (low lms), producing a single-number summary that reflects both criteria.
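
The stated range can be checked directly from the formula. The short derivation below uses only the definitions of lms and ss given above (both are percentages in $[0, 100]$) and introduces no additional assumptions:

```latex
0 \le \mathrm{ss} \le 100
\;\Longrightarrow\;
0 \le \min(100-\mathrm{ss},\,\mathrm{ss}) \le 50
\;\Longrightarrow\;
0 \le \mathrm{iCAT}
   = \mathrm{lms}\cdot\frac{\min(100-\mathrm{ss},\,\mathrm{ss})}{50}
   \le \mathrm{lms} \le 100 .
```

The upper bound is attained only when $\mathrm{lms} = 100$ and $\mathrm{ss} = 50$, and the score is zero whenever $\mathrm{lms} = 0$ or $\mathrm{ss} \in \{0, 100\}$.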

2. ICAT in Classification: Precision at the Indistinguishability Threshold

For evaluating binary classifiers, the Idealized CAT Score (ICAT) is independently defined as precision at the "indistinguishability threshold" (Sumpter, 2023). For a classifier assigning real-valued scores to all instances, let $F_+$ be the empirical distribution of true positive scores and $F_-$ the distribution for true negatives.

  • Indistinguishability Threshold $t^*$:

$t^*$ is the unique solution to:

$$B(t^*) = 0.5$$

Formally,

$$B(t) = \Pr(X > Y)$$

with $X \sim F_+$ and $Y$ drawn from the scores of items labeled positive at threshold $t$ (scores above $t$).

  • ICAT Score (Precision at $t^*$):

$$\mathrm{ICAT} = \frac{\mathrm{TP}(t^*)}{\mathrm{TP}(t^*) + \mathrm{FP}(t^*)}$$

That is, the precision when the threshold is set such that positively-labeled items are statistically indistinguishable from true positives in pairwise comparisons.

3. Step-by-Step Computation Procedures

iCAT (LLM Bias)

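  1. For each triplet, compute likelihoods $L(S_i)$, $L(S^+_i)$, $L(S^u_i)$.
  2. Compute lms: percentage of triplets where $L(S^u_i) < \max\{L(S_i), L(S^+_i)\}$.
  3. Compute ss: percentage of triplets where $L(S_i) > L(S^+_i)$.
  4. Compute iCAT as $\mathrm{lms} \times \min(100-\mathrm{ss},\,\mathrm{ss}) / 50$.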
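
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `icat_score`, the tuple layout (stereotypical, anti-stereotypical, irrelevant), and the log-likelihood values are all hypothetical and chosen only so that the result matches the lms = 75, ss = 50 case.

```python
from typing import Sequence, Tuple


def icat_score(triplets: Sequence[Tuple[float, float, float]]) -> Tuple[float, float, float]:
    """Return (lms, ss, iCAT) for triplets of log-likelihoods.

    Each triplet is assumed to be ordered as
    (L(stereotypical), L(anti-stereotypical), L(irrelevant)).
    """
    n = len(triplets)
    if n == 0:
        raise ValueError("need at least one triplet")
    # lms: percentage of triplets where the irrelevant sentence scores
    # below the better of the two meaningful sentences.
    lms = 100.0 * sum(l_u < max(l_s, l_a) for l_s, l_a, l_u in triplets) / n
    # ss: percentage of triplets where the stereotypical sentence
    # outscores the anti-stereotypical one.
    ss = 100.0 * sum(l_s > l_a for l_s, l_a, _ in triplets) / n
    icat = lms * min(100.0 - ss, ss) / 50.0
    return lms, ss, icat


# Hypothetical log-likelihoods, invented for illustration only.
example = [
    (-10.0, -12.0, -20.0),  # meaningful preferred, stereotype preferred
    (-15.0, -11.0, -30.0),  # meaningful preferred, anti-stereotype preferred
    (-9.0, -9.5, -8.0),     # irrelevant preferred (counts against lms)
    (-14.0, -13.0, -25.0),  # meaningful preferred, anti-stereotype preferred
]
print(*icat_score(example))  # -> 75.0 50.0 75.0
```

In practice the likelihoods would come from the model under evaluation; how they are obtained (full or pseudo-log-likelihood) is outside the scope of this sketch.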

ICAT (Classification)

  1. Sort unique classifier scores.
  2. For each candidate threshold $t$:
    • Compute $S_+(t)$ as the positive-label survival fraction above $t$.
    • Compute $S_-(t)$ for negatives.
    • Evaluate $B(t)$ as above.
  3. Find $t^*$ where $B(t^*) = 0.5$ by interpolation.
  4. Compute precision at $t^*$ for the final ICAT score.
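
A rough sketch of this procedure follows, under several assumptions that are not confirmed by the text above. The balance quantity is taken to be the probability that a randomly chosen true positive outscores a randomly chosen item labeled positive at the threshold, with ties counted as one half. Items are labeled positive when their score is at or above the threshold. Instead of interpolating, the sketch simply picks the observed score whose balance value is closest to 0.5. The function names and the toy data are hypothetical.

```python
import numpy as np


def pairwise_win_prob(pos_scores, labelled_scores):
    """P(random true-positive score > random labelled-positive score).

    Ties count as 1/2 (an assumption of this sketch).
    """
    pos = np.sort(np.asarray(pos_scores, dtype=float))
    lab = np.asarray(labelled_scores, dtype=float)
    below = np.searchsorted(pos, lab, side="left")      # positives strictly below each item
    at_or_below = np.searchsorted(pos, lab, side="right")
    greater = len(pos) - at_or_below                    # positives strictly above each item
    ties = at_or_below - below
    return float(np.mean((greater + 0.5 * ties) / len(pos)))


def indistinguishability_precision(scores, labels, target=0.5):
    """Return (threshold, precision) at the candidate threshold whose
    balance value is closest to `target` (nearest candidate, no interpolation)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    best_t, best_gap = None, np.inf
    for t in np.unique(scores):                         # sorted unique scores
        gap = abs(pairwise_win_prob(pos, scores[scores >= t]) - target)
        if gap < best_gap:
            best_t, best_gap = t, gap
    precision = labels[scores >= best_t].mean()
    return float(best_t), float(precision)


# Toy data, invented for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(*indistinguishability_precision(toy_scores, toy_labels))  # -> 0.6 0.75
```

The nearest-candidate rule keeps the sketch short; replacing it with interpolation between the two thresholds that bracket 0.5 would follow step 3 more closely. The loop is quadratic in the number of unique scores, which is acceptable for illustration but not for large datasets.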

4. Interpretation and Numerical Behavior

For LLMs, iCAT values near 100 indicate both high fluency and neutrality (ss ≈ 50, lms ≈ 100). iCAT collapses to zero for models that are either always biased (ss near 0 or 100) or lack the ability to score meaningful completions above irrelevant ones (lms ≈ 0). Mid-range values reflect partial failures in either attribute.

For classification, ICAT tracks the fraction of predicted positives that are true positives at the threshold where predicted positives are "statistically indistinguishable" from true positives, as formalized via $B(t^*) = 0.5$. Like AUC, ICAT depends only on the ordering of scores and is therefore invariant to strictly monotonic rescaling; unlike AUC, it is reported to be robust to class imbalance, since the balancing property absorbs the effect of "trivial negatives". In experimental settings with varying overlap between positive and negative distributions, ICAT reflects actual discriminative difficulty rather than being artificially inflated by class distribution (Sumpter, 2023).

5. Comparison with Related Metrics

Metric | Domain                           | Core Principle
-------|----------------------------------|---------------------------------------------------------------
CAT    | Bias eval (CrowS-Pairs)          | Biased preference rate; ignores irrelevance and fluency
iCAT   | Bias eval (StereoSet, LIBRA)     | Combines language ability (lms) and neutrality (ss)
EiCAT  | Bias eval (LIBRA)                | Incorporates JSD divergence and local-word knowledge penalty (bbs)
AUC    | Classification                   | Probability positive ranked above negative; threshold independent
F1     | Classification                   | Harmonic mean of precision and recall; ad hoc threshold
ICAT   | Classification                   | Precision at indistinguishability threshold (B = 0.5); robust to label imbalance

In LLM bias evaluation, iCAT subsumes ss (the CAT score) and additionally penalizes lack of fluency, while EiCAT (from LIBRA) further incorporates Jensen–Shannon divergence and a "beyond knowledge boundary score" (bbs) to handle contexts in which unfamiliar terms impede meaningful bias measurement (Pang et al., 2 Feb 2025). In classification, ICAT avoids artifacts affecting AUC and F1 by anchoring threshold choice to empirical indistinguishability.

6. Illustrative Examples

LLM Bias Example

Given $N$ triplets, suppose:

  • lms = 75 (i.e., model prefers a meaningful option 3/4 times)
  • ss = 50 (equal preference for stereotype and anti-stereotype)

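Then:

$$\mathrm{iCAT} = 75 \times \frac{\min(50,\,50)}{50} = 75$$

If a model is maximally fluent but completely biased (lms = 100, ss = 100):

$$\mathrm{iCAT} = 100 \times \frac{\min(0,\,100)}{50} = 0$$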

Classification Example

In artificial datasets in which positive and negative scores are drawn from normal distributions with varying overlap:

  • "Easy" regime: ICAT ≈ 0.85
  • "Moderate": ICAT ≈ 0.69
  • "Hard": ICAT ≈ 0.50

These reflect intrinsic difficulty and remain stable under label-imbalance manipulations.

7. Strengths, Limitations, and Extensions

Strengths:

  • iCAT: Integrates fairness and language discrimination; symmetric; transparent computation; penalizes extreme preference or lack of competence (Pang et al., 2 Feb 2025).
  • ICAT: Robust to class imbalance and "easy negatives"; anchored threshold yields direct interpretability; immune to pitfalls of AUC/F1 (Sumpter, 2023).

Limitations:

  • iCAT: Reduces full distributional information to two statistics (ss, lms); may miss preference strength nuance; requires triplet format with a crafted irrelevant case; all test cases equally weighted, lacking stereotype severity adaptation.
  • ICAT: Focuses on the single indistinguishability point, does not capture full sensitivity/recall tradeoff.

Extensions:

  • EiCAT: Combines iCAT’s lms with JSD-based divergence and bbs to measure local context comprehension and bias distributionally.
  • For ICAT, the indistinguishability criterion may be replaced by $B(t) = b$ with $b \neq 0.5$ for other balances between positive and predicted classes, generalizing the notion of controlled tradeoff.

A plausible implication is that both ICAT formulations serve as templates for single-number metrics that are robust to frequent artifacts affecting more commonly used measures, provided construction and application align with their rigorous criteria.
