Idealized CAT (ICAT) Score

Updated 10 April 2026

The Idealized CAT (ICAT) Score names two unrelated metrics. In language model bias evaluation, iCAT combines the language model score (lms) and the stereotype score (ss) into a single interpretable value that rewards both fluency and neutrality.
In binary classification, ICAT is the precision at an empirically determined indistinguishability threshold, a construction intended to be robust to issues such as class imbalance.
The classification variant is positioned as an alternative to traditional metrics such as AUC and F1, while the bias variant extends the plain stereotype score with a fluency term.

The term "ICAT" covers two formally unrelated metrics from distinct lines of research: one for assessing LLM stereotyping via triplets of stereotypical, anti-stereotypical, and irrelevant sentences, and another for classifier evaluation via precision at a principled indistinguishability threshold. Both aim to give a principled, interpretable single-number summary of model performance. Despite their separate origins, both have explicit mathematical definitions, and the classification variant in particular is motivated by limitations of alternatives such as AUC or F1. This article presents both definitions.

1. ICAT for LLM Bias: Definition and Rationale

In evaluation of stereotypical bias in LLMs, the Idealized CAT Score (iCAT) is formulated to capture a tradeoff between language modeling competence and neutrality between stereotypical and anti-stereotypical completions (Pang et al., 2 Feb 2025). Given a test set $\mathcal D=\{(S_i, S^+_i, S^u_i)\}_{i=1}^N$ —where $S_i$ denotes a stereotypical, $S^+_i$ an anti-stereotypical, and $S^u_i$ an irrelevant sentence—the iCAT metric is built on two quantities:

Language Model Score (lms):

$\mathrm{lms} = \frac{1}{N} \sum_{i=1}^N \mathbf{1}[L(S^u_i) < \max\{L(S_i), L(S^+_i)\}] \times 100$

$L(\cdot)$ is the model (pseudo-)log-likelihood.

Stereotype Score (ss):

$\mathrm{ss} = \frac{1}{N} \sum_{i=1}^N \mathbf{1}[L(S_i) > L(S^+_i)] \times 100$

The iCAT score itself is:

$\mathrm{iCAT} = \mathrm{lms} \times \frac{\min(100-\mathrm{ss},\,\mathrm{ss})}{50}$

This yields a value in $[0, 100]$, maximized only when the model always ranks a meaningful (either stereotyped or anti-stereotyped) completion above the irrelevant one (lms = 100) and shows perfect neutrality (ss = 50). The symmetry inherent in $\min(100-\mathrm{ss}, \mathrm{ss})$ means that a preference for stereotyped completions is penalized exactly as much as an equally strong preference for anti-stereotyped ones.

This design simultaneously penalizes LLMs that are either biased (ss far from 50) or lack discriminative power (low lms), producing a single-number summary that reflects both criteria.
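
The stated range and the location of the maximum follow directly from the definition. The short derivation below is a sketch that assumes only that lms and ss are percentages in $[0, 100]$:

```latex
\begin{align*}
0 \le \mathrm{ss} \le 100
  &\;\Longrightarrow\; 0 \le \min(100-\mathrm{ss},\,\mathrm{ss}) \le 50
     \quad\text{(the upper bound is attained only at } \mathrm{ss}=50\text{)}, \\
  &\;\Longrightarrow\; 0 \le \frac{\min(100-\mathrm{ss},\,\mathrm{ss})}{50} \le 1, \\
  &\;\Longrightarrow\; 0 \le \mathrm{iCAT} \le \mathrm{lms} \le 100 .
\end{align*}
```

Hence iCAT = 100 requires both lms = 100 and ss = 50, and replacing ss by 100 − ss leaves the score unchanged.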

2. ICAT in Classification: Precision at the Indistinguishability Threshold

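For evaluating binary classifiers, the Idealized CAT Score (ICAT) is independently defined as precision at the "indistinguishability threshold" (Sumpter, 2023). For a classifier assigning real-valued scores to all instances, let $F_+$ be the empirical distribution of true positive scores and $F_-$ the distribution for true negatives.

Indistinguishability Threshold $t^*$:

$t^*$ is the unique solution to:

$B(t^*) = 0.5$

Formally,

$B(t) = \Pr[X > Y_t]$

with $X \sim F_+$ the score of a randomly drawn true positive and $Y_t$ the score of an instance drawn at random from all instances (drawn from $F_+$ or $F_-$) that are labelled positive at threshold $t$, i.e. that score above $t$.

ICAT Score (Precision at $t^*$):

$\mathrm{ICAT} = \mathrm{Precision}(t^*) = \frac{TP(t^*)}{TP(t^*) + FP(t^*)}$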

That is, the precision when the threshold is set such that positively-labeled items are statistically indistinguishable from true positives in pairwise comparisons.
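
One way to unpack the pairwise condition is sketched below; it is an illustration, not notation taken from the paper. Assume scores are continuous (no ties), let $\pi$ be the fraction of positive instances, let $S_+(t)$ and $S_-(t)$ be the fractions of positives and negatives scoring above $t$, and write $B(t)$ for the probability that a randomly drawn true positive outscores a randomly drawn instance labelled positive at threshold $t$. The symbols $\pi$, $w_\pm$, $X$, $X'$ and $Z$ are introduced here for illustration only. The labelled-positive set is then a mixture of true positives and negatives above $t$, and $B(t)$ splits accordingly:

```latex
\begin{align*}
w_+(t) &= \frac{\pi\, S_+(t)}{\pi\, S_+(t) + (1-\pi)\, S_-(t)},
  \qquad w_-(t) = 1 - w_+(t), \\
B(t) &= w_+(t)\,\Pr[X > X' \mid X' > t] \;+\; w_-(t)\,\Pr[X > Z \mid Z > t], \\
\mathrm{Precision}(t) &= w_+(t),
\end{align*}
% X, X' : independent scores of true positives;  Z : score of a true negative.
```

Under these assumptions the ICAT score is $w_+(t^*)$ at the threshold where $B(t^*) = 0.5$. As a sanity check, with no threshold at all the two conditional probabilities reduce to $0.5$ and the AUC, so $B = 0.5\,\pi + (1-\pi)\,\mathrm{AUC}$, which exceeds $0.5$ whenever the classifier ranks better than chance.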

3. Step-by-Step Computation Procedures

iCAT (LLM Bias)

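For each triplet, compute likelihoods $L(S_i)$, $L(S^+_i)$, $L(S^u_i)$.
Compute lms: percentage of triplets where $L(S^u_i) < \max\{L(S_i), L(S^+_i)\}$.
Compute ss: percentage of triplets where $L(S_i) > L(S^+_i)$.
Compute iCAT as $\mathrm{lms} \times \min(100-\mathrm{ss},\,\mathrm{ss}) / 50$.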
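
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `icat_from_likelihoods`, the tuple layout, and the toy log-likelihood values are all assumptions made for the example, and obtaining the likelihoods from an actual model is out of scope.

```python
from typing import Iterable, Tuple

# Hypothetical layout: (L(S_i), L(S_i^+), L(S_i^u)) =
# (stereotypical, anti-stereotypical, irrelevant) log-likelihoods.
Triplet = Tuple[float, float, float]


def icat_from_likelihoods(triplets: Iterable[Triplet]) -> Tuple[float, float, float]:
    """Return (lms, ss, iCAT), each on a 0-100 scale."""
    items = list(triplets)
    n = len(items)
    if n == 0:
        raise ValueError("need at least one triplet")

    # lms: irrelevant sentence scored below the better meaningful sentence.
    meaningful = sum(1 for s, a, u in items if u < max(s, a))
    # ss: stereotypical sentence strictly preferred (ties count as not preferred).
    stereotyped = sum(1 for s, a, u in items if s > a)

    lms = 100.0 * meaningful / n
    ss = 100.0 * stereotyped / n
    icat = lms * min(100.0 - ss, ss) / 50.0
    return lms, ss, icat


# Toy log-likelihoods, invented for illustration only.
toy = [
    (-10.0, -12.0, -20.0),  # meaningful wins, stereotype preferred
    (-11.0, -9.0, -15.0),   # meaningful wins, anti-stereotype preferred
    (-8.0, -9.0, -7.0),     # irrelevant sentence wins
    (-14.0, -13.0, -30.0),  # meaningful wins, anti-stereotype preferred
]
print(icat_from_likelihoods(toy))  # (75.0, 50.0, 75.0)
```

On this toy input the model prefers a meaningful option in 3 of 4 triplets and splits evenly between stereotype and anti-stereotype, so neutrality costs nothing and iCAT equals lms.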

ICAT (Classification)

Sort unique classifier scores.
For each candidate threshold $t$:
- Compute $S_+(t)$ as the positive-label survival fraction above $t$.
- Compute $S_-(t)$ for negatives.
- Evaluate the indistinguishability function $B(t)$ from Section 2.
Find $t^*$ where $B(t^*) = 0.5$ by interpolation.
Compute precision at $t^*$ for the final ICAT score.
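
A minimal Python sketch of this scan follows. It rests on several assumptions that are not fixed by the text: $B(t)$ is taken to be the probability that a randomly drawn true positive outscores a randomly drawn instance labelled positive at $t$ (ties counted as one half), an instance is labelled positive when its score is at least $t$, $B$ is evaluated by direct pairwise counting rather than via the survival fractions, and the crossing is located by linear interpolation between the first pair of adjacent thresholds that bracket 0.5. The names `b_value` and `icat_classification` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple


def b_value(true_pos: List[float], labelled: List[float]) -> float:
    """P(random true positive outscores random labelled positive), ties = 1/2.

    `labelled` must be sorted in ascending order.
    """
    wins = 0.0
    for p in true_pos:
        lo = bisect_left(labelled, p)   # labelled scores strictly below p
        hi = bisect_right(labelled, p)  # labelled scores at or below p
        wins += lo + 0.5 * (hi - lo)
    return wins / (len(true_pos) * len(labelled))


def icat_classification(
    scores: Sequence[float], labels: Sequence[int], target: float = 0.5
) -> Tuple[float, float]:
    """Return (t_star, precision at t_star) for 0/1 labels."""
    all_sorted = sorted(scores)
    true_pos = [s for s, y in zip(scores, labels) if y == 1]
    if not true_pos:
        raise ValueError("need at least one positive example")

    # Evaluate B(t) at every unique score used as a candidate threshold.
    curve = []
    for t in sorted(set(scores)):
        labelled = all_sorted[bisect_left(all_sorted, t):]  # scores >= t
        curve.append((t, b_value(true_pos, labelled)))

    # First adjacent pair of thresholds whose B values bracket the target.
    t_star = None
    for (t0, b0), (t1, b1) in zip(curve, curve[1:]):
        if b0 >= target >= b1 and b0 != b1:
            t_star = t0 + (b0 - target) / (b0 - b1) * (t1 - t0)
            break
    if t_star is None:
        raise ValueError("B(t) does not cross the target on these scores")

    predicted = [y for s, y in zip(scores, labels) if s >= t_star]
    return t_star, sum(predicted) / len(predicted)


# Toy data, invented for illustration only.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(icat_classification(scores, labels))
```

On the toy data, $B$ drops from about 0.54 at threshold 0.4 to about 0.44 at threshold 0.6, so the interpolated threshold falls between them and the resulting precision is 2/3. Because the empirical $B$ is a step function, the interpolation is a heuristic; with few samples the result is sensitive to that choice.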

4. Interpretation and Numerical Behavior

For LLMs, iCAT values near 100 indicate both high fluency and neutrality (ss ≈ 50, lms ≈ 100). iCAT falls toward zero for models that are either strongly biased (ss near 0 or 100) or unable to score meaningful completions above irrelevant ones (lms ≈ 0). Mid-range values reflect partial failures in either attribute.

For classification, ICAT tracks the fraction of predicted positives that are true positives at the threshold $t^*$ where predicted positives are "statistically indistinguishable" from true positives, as formalized via the condition $B(t^*) = 0.5$ on the indistinguishability function $B$. Like AUC, ICAT depends only on the ordering of scores and is therefore invariant to strictly monotonic rescaling; unlike AUC, it is described as robust to class imbalance, since the balancing property absorbs the effect of "trivial negatives". In experimental settings with varying overlap between positive and negative distributions, ICAT reflects actual discriminative difficulty rather than being artificially inflated by class distribution (Sumpter, 2023).

5. Comparison to Related Metrics

Metric	Domain	Core Principle
CAT	Bias eval (CrowS-Pairs)	Biased preference rate, ignores irrelevance & fluency
iCAT	Bias eval (StereoSet, LIBRA)	Combines language ability (lms) and neutrality (ss)
EiCAT	Bias eval (LIBRA)	Incorporates JSD divergence and local-word knowledge penalty (bbs)
AUC	Classification	Probability positive ranked above negative, threshold independent
F1	Classification	Harmonic mean of precision/recall, ad hoc threshold
ICAT	Classification	Precision at indistinguishability threshold (B=0.5), robust to label imbalance

In LLM bias evaluation, iCAT subsumes ss (the CAT score) and additionally penalizes lack of fluency, while EiCAT (from LIBRA) further incorporates Jensen–Shannon divergence and a "beyond knowledge boundary score" (bbs) to address contexts in which unfamiliar terms impede meaningful bias measurement (Pang et al., 2 Feb 2025). In classification, ICAT avoids artifacts affecting AUC and F1 by anchoring threshold choice to empirical indistinguishability.

6. Illustrative Examples

LLM Bias Example

Given $N$ triplets, suppose:

lms = 75 (i.e., model prefers a meaningful option 3/4 times)
ss = 50 (equal preference for stereotype and anti-stereotype)

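Then: $\mathrm{iCAT} = 75 \times \frac{\min(100-50,\,50)}{50} = 75$. If a model is maximally fluent but completely biased (lms = 100, ss = 100): $\mathrm{iCAT} = 100 \times \frac{\min(100-100,\,100)}{50} = 0$.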

Classification ICAT Example (from Sumpter, 2023)

In artificial datasets in which positive and negative scores are drawn from normal distributions with varying overlap:

"Easy" regime: ICAT ≈ 0.85
"Moderate": ICAT ≈ 0.69
"Hard": ICAT ≈ 0.50

These reflect intrinsic difficulty and remain stable under label-imbalance manipulations.

7. Strengths, Limitations, and Extensions

Strengths:

iCAT: Integrates fairness and language discrimination; symmetric; transparent computation; penalizes extreme preference or lack of competence (Pang et al., 2 Feb 2025).
ICAT: Robust to class imbalance and "easy negatives"; anchored threshold yields direct interpretability; immune to pitfalls of AUC/F1 (Sumpter, 2023).

Limitations:

iCAT: Reduces full distributional information to two statistics (ss, lms); may miss preference strength nuance; requires triplet format with a crafted irrelevant case; all test cases equally weighted, lacking stereotype severity adaptation.
ICAT: Focuses on the single indistinguishability point and does not capture the full precision/recall tradeoff across thresholds.

Extensions:

EiCAT: Combines iCAT’s lms with JSD-based divergence and bbs to measure local context comprehension and bias distributionally.
For ICAT, the indistinguishability criterion $B(t) = 0.5$ may be replaced by $B(t) = c$ with $c \neq 0.5$ for other balances between positive and predicted classes, generalizing the notion of controlled tradeoff.

A plausible implication is that both ICAT formulations serve as templates for single-number metrics that are robust to frequent artifacts affecting more commonly used measures, provided construction and application align with their rigorous criteria.


References

Pang et al. (2025). LIBRA: Measuring Bias of Large Language Model from a Local Context.

Sumpter (2023). Precision at the indistinguishability threshold: a method for evaluating classification algorithms.
