
Conditional Bias Scan (CBS)

Updated 9 April 2026
  • Conditional Bias Scan (CBS) is a flexible auditing framework that reveals intersectional and contextual biases by identifying subgroups with statistically significant deviations.
  • It recasts bias detection as a conditional independence test, leveraging efficient subset scanning and robust statistical calibration to quantify disparities.
  • Empirical case studies, such as COMPAS analyses, demonstrate CBS's superior capability in uncovering hidden false positive and calibration disparities compared to traditional fairness metrics.

Conditional Bias Scan (CBS) is a flexible auditing framework for detecting intersectional and contextual biases in classification models. Designed to reveal subgroup disparities not detectable by standard group fairness metrics, CBS systematically searches for subgroups within a protected class whose predicted outcomes—or realized errors on those predictions—differ significantly from their counterparts in the non-protected class, according to a broad range of fairness criteria. The method integrates efficient subset scanning, robust statistical calibration, and the capacity to audit both probabilistic and binarized model outcomes (Boxer et al., 2023).

1. Formalization of Intersectional Bias

Standard group-fairness metrics (e.g., equalized odds, calibration) compare rates between protected ($A=1$) and non-protected ($A=0$) groups in aggregate. However, aggregate comparisons can mask disparities that manifest only in subgroups defined by intersections of covariates, such as race $\wedge$ gender or age $\wedge$ criminal history.

Let $i = 1, \ldots, n$ index individuals, each with the following observed quantities:

  • Sensitive attribute $A_i \in \{0,1\}$ ($A_i=1$ denotes membership in the protected class)
  • Covariates $X_i$
  • Binary outcome $Y_i \in \{0,1\}$
  • Probabilistic prediction $P_i \in [0,1]$
  • Binary recommendation $P_{i,\mathrm{bin}} \in \{0,1\}$

An intersectional bias exists if there is a subgroup $S$, defined by a conjunction of covariate values, such that a fairness metric (e.g., false positive rate) for the protected-class members of $S$ differs from that for the non-protected-class members of $S$, even when group-level parity holds. For the false positive rate, this reads:

$$\Pr(P_{i,\mathrm{bin}}=1 \mid Y_i=0,\, A_i=1,\, X_i \in S) \neq \Pr(P_{i,\mathrm{bin}}=1 \mid Y_i=0,\, A_i=0,\, X_i \in S)$$

CBS operationalizes the discovery of $S$ and quantifies the statistical significance of the observed deviation.
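The masking effect described above can be illustrated with a small numerical sketch. All counts below are invented for illustration, and the record layout and helper names (`make`, `subset`, `false_positive_rate`) are assumptions rather than anything taken from Boxer et al. (2023). The example builds a population in which the protected and non-protected classes have identical aggregate false positive rates while the male and female subgroups diverge in opposite directions.

```python
def false_positive_rate(records):
    """Share of true negatives (Y == 0) that received a positive recommendation."""
    negatives = [r for r in records if r["Y"] == 0]
    return sum(r["P_bin"] for r in negatives) / len(negatives)


def make(A, sex, n_fp, n_tn):
    """Build n_fp false positives and n_tn true negatives for one (A, sex) cell."""
    return ([{"A": A, "sex": sex, "Y": 0, "P_bin": 1}] * n_fp
            + [{"A": A, "sex": sex, "Y": 0, "P_bin": 0}] * n_tn)


# Hypothetical counts: protected class has FPR 0.40 (M) and 0.10 (F);
# non-protected class has FPR 0.25 in both cells.
records = (
    make(1, "M", 40, 60) + make(1, "F", 10, 90)
    + make(0, "M", 25, 75) + make(0, "F", 25, 75)
)


def subset(A, sex=None):
    """Select records by protected-class label and, optionally, by sex."""
    return [r for r in records if r["A"] == A and (sex is None or r["sex"] == sex)]


overall_gap = false_positive_rate(subset(1)) - false_positive_rate(subset(0))
male_gap = false_positive_rate(subset(1, "M")) - false_positive_rate(subset(0, "M"))
female_gap = false_positive_rate(subset(1, "F")) - false_positive_rate(subset(0, "F"))

print(f"aggregate gap: {overall_gap:+.2f}")
print(f"male gap:      {male_gap:+.2f}")
print(f"female gap:    {female_gap:+.2f}")
```

An aggregate audit of this toy population would report parity, while a scan over the `sex` attribute would surface both subgroups.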

2. Mathematical Formulation and Scan Statistics

CBS recasts the audit for subgroup bias as a conditional independence test. For each individual, CBS defines an event variable $I_i$ (e.g., prediction, recommendation, or outcome), and tests the null hypothesis

$$H_0: I \perp A \mid (C, X)$$

where $C$ is a conditioning variable (e.g., $Y$ or $P$), as defined by the specific fairness criterion.

For each individual $i$ with $A_i = 1$, CBS estimates an expected “counterfactual” value:

$$\hat{I}_i = \mathbb{E}_{H_0}[I_i \mid C_i, X_i]$$

A scan statistic $F(S)$ aggregates the evidence for bias within each candidate subgroup $S$, with the log-likelihood ratio (LLR) distinguishing the observed $I_i$'s in $S$ from their expected $\hat{I}_i$'s.

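Table: CBS Scan Forms

| Scan Type | Event variable $I$, conditioning variable $C$ | $F(S)$ Formulation (for subgroup $S$) |
|---|---|---|
| Separation, binaries | $I = P_{\mathrm{bin}}$, $C = Y$ | Bernoulli LLR of observed $I_i$ versus expected $\hat{I}_i$ over $i \in S$ |
| Separation, continuous | $I = P$, $C = Y$ | LLR for a continuous event, comparing observed $I_i$ with expected $\hat{I}_i$ over $i \in S$ |
| Sufficiency, binaries | $I = Y$, $C = P_{\mathrm{bin}}$ | Bernoulli form above with $\hat{I}_i = \mathbb{E}_{H_0}[Y_i \mid P_{i,\mathrm{bin}}, X_i]$ |
| Sufficiency, binaries | $I = Y$, $C = P$ | As above, conditioned on $P_i$ |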

The scan maximizes $F(S)$ over all candidate subgroups $S$ to identify the highest-scoring, and hence most strongly biased, subgroup $S^* = \arg\max_S F(S)$.
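As a worked illustration of a Bernoulli scan score, the sketch below assumes the form that is standard in the bias-scan literature, $F(S) = \max_{q>0} \sum_{i \in S} [I_i \log q - \log(1 - \hat{I}_i + q\hat{I}_i)]$, where $q$ multiplies the odds of the event inside $S$. The exact expression used by Boxer et al. (2023) may differ in detail, and the function name `bernoulli_llr` and its parameters are hypothetical. Because the objective is concave in $\log q$, a bisection on its derivative is enough to find the maximum.

```python
import math


def bernoulli_llr(events, expected, theta_bounds=(-10.0, 10.0), iters=60):
    """Maximize sum_i [I_i*log q - log(1 - p_i + q*p_i)] over q = exp(theta).

    events   -- observed binary events I_i for the individuals in the subgroup
    expected -- expected values I_hat_i under the null, each strictly in (0, 1)
    Returns (score, q). The search is limited to theta_bounds (an assumption).
    """
    lo, hi = theta_bounds

    def gradient(theta):
        # Derivative of the objective with respect to theta = log q.
        q = math.exp(theta)
        return sum(I - q * p / (1.0 - p + q * p) for I, p in zip(events, expected))

    # The objective is concave in theta, so its gradient is decreasing:
    # bisect for the sign change.
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if gradient(mid) > 0:
            lo = mid
        else:
            hi = mid

    theta = (lo + hi) / 2.0
    q = math.exp(theta)
    score = sum(I * theta - math.log(1.0 - p + q * p) for I, p in zip(events, expected))
    return score, q


# Three positives out of four where one was expected: strong evidence, q > 1.
biased_score, biased_q = bernoulli_llr([1, 1, 1, 0], [0.25] * 4)
# One positive out of four where one was expected: no evidence, q close to 1.
null_score, null_q = bernoulli_llr([1, 0, 0, 0], [0.25] * 4)

print(f"biased subgroup: score={biased_score:.4f}, q={biased_q:.4f}")
print(f"calibrated subgroup: score={null_score:.4f}, q={null_q:.4f}")
```

A value of $q > 1$ indicates that events occur more often than expected in the subgroup, and $q < 1$ indicates fewer.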

3. Fairness Definitions Audited by CBS

CBS audits any group-fairness definition reducible to a conditional-independence test:

$$I \perp A \mid (C, X)$$

This unifies a range of commonly used criteria, spanning both “separation” (error-rate parity) and “sufficiency” (predictive value parity), and can be instantiated for both probabilistic and thresholded outcomes:

  • Separation-based fairness (condition on ground-truth $Y$):
    • Balance for the positive/negative class: $P \perp A \mid (Y=1, X)$, $P \perp A \mid (Y=0, X)$
    • True/false positive and negative rate parity (on $P_{\mathrm{bin}}$)
  • Sufficiency-based fairness (condition on model output $P$ or $P_{\mathrm{bin}}$):
    • Calibration/predictive parity: $Y \perp A \mid (P, X)$
    • Positive/negative predictive value parity

CBS can also conduct value-conditional scans, e.g., restricting to $Y = 0$ (with event $P_{\mathrm{bin}}$) to detect false positive rate disparities.
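One way to picture how a single procedure covers all of these criteria is to treat each scan as a choice of event column and conditioning column. The sketch below is only an organizational aid: the scan names, the `SCANS` table, the column names (`P`, `P_bin`, `Y`), and `select_scan_inputs` are all hypothetical and not part of any published CBS implementation.

```python
# Each scan type is a choice of event variable and conditioning variable.
SCANS = {
    "separation_predictions":      {"event": "P",     "condition": "Y"},
    "separation_recommendations":  {"event": "P_bin", "condition": "Y"},
    "sufficiency_predictions":     {"event": "Y",     "condition": "P"},
    "sufficiency_recommendations": {"event": "Y",     "condition": "P_bin"},
}


def select_scan_inputs(rows, scan, condition_value=None):
    """Return (event, condition) pairs for the chosen scan.

    If condition_value is given, keep only rows whose conditioning variable
    equals it (a value-conditional scan).
    """
    spec = SCANS[scan]
    pairs = []
    for row in rows:
        c = row[spec["condition"]]
        if condition_value is not None and c != condition_value:
            continue
        pairs.append((row[spec["event"]], c))
    return pairs


rows = [
    {"A": 1, "Y": 0, "P": 0.7, "P_bin": 1},
    {"A": 1, "Y": 1, "P": 0.9, "P_bin": 1},
    {"A": 0, "Y": 0, "P": 0.2, "P_bin": 0},
]

# False positive rate scan: event is the recommendation, restricted to Y == 0.
fpr_inputs = select_scan_inputs(rows, "separation_recommendations", condition_value=0)
# Calibration scan: event is the outcome, conditioned on the predicted probability.
calib_inputs = select_scan_inputs(rows, "sufficiency_predictions")
print(fpr_inputs)
print(calib_inputs)
```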

4. Algorithmic Implementation

The CBS algorithm integrates statistical estimation, subset scanning, and permutation inference.

  1. Expected Value Estimation: Train regression or logistic regression models (using $A_i = 0$ data), with inverse-propensity weights $w_i$, to estimate $\hat{I}_i$ for individuals with $A_i = 1$.
  2. Scan Statistic Construction: Compute $F(S)$ as specified for the chosen fairness definition and scan type.
  3. Iterative Subset Scan: For each of several random restarts:
    • Initialize a candidate subgroup $S$.
    • For each unscanned attribute $j$, relax $S$ on attribute $j$, evaluating $F(S)$ for possible value sets and constructing intervals over thresholds.
    • Select the interval/subgroup with maximal $F(S)$ as the current $S$.
    • Iterate until convergence, recording the maximal $F(S)$ and the corresponding subgroup $S^*$.
  4. Statistical Significance Assessment: Evaluate the significance of $F(S^*)$ via permutation testing (shuffling the protected-class labels $A_i$).
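The first step can be sketched in a deliberately simplified form. Instead of the propensity-weighted regression described above, the example below swaps in exact-cell averaging: the expected event for a protected individual is the mean event among non-protected individuals sharing the same conditioning value and covariates. This is only workable with a few discrete covariates, and within an exact cell any propensity weight is constant, so it drops out of the average. The function `expected_events` and the column names are hypothetical.

```python
from collections import defaultdict


def expected_events(rows, event, condition, covariates):
    """Estimate I_hat for each protected row (A == 1) from non-protected rows.

    Uses the mean of `event` among A == 0 rows in the same (condition, covariates)
    cell. Returns None for protected rows whose cell has no non-protected members.
    """
    def cell(row):
        return (row[condition],) + tuple(row[k] for k in covariates)

    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        if row["A"] == 0:
            sums[cell(row)] += row[event]
            counts[cell(row)] += 1

    estimates = []
    for row in rows:
        if row["A"] == 1:
            key = cell(row)
            estimates.append(sums[key] / counts[key] if key in counts else None)
    return estimates


rows = [
    {"A": 0, "Y": 0, "sex": "M", "P_bin": 1},
    {"A": 0, "Y": 0, "sex": "M", "P_bin": 0},
    {"A": 0, "Y": 0, "sex": "M", "P_bin": 0},
    {"A": 0, "Y": 0, "sex": "M", "P_bin": 0},
    {"A": 1, "Y": 0, "sex": "M", "P_bin": 1},
    {"A": 1, "Y": 0, "sex": "F", "P_bin": 1},  # no matching non-protected cell
]

i_hat = expected_events(rows, event="P_bin", condition="Y", covariates=["sex"])
print(i_hat)
```

The `None` entry shows why a model-based estimate is needed in practice: with many covariates, most protected individuals have no exact non-protected match.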

This approach exploits the Additive Linear-Time Subset Scanning (ALTSS) property to achieve near-linear time subset evaluation at each step (Boxer et al., 2023).
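The additive structure behind this efficiency can be sketched for a single attribute. Assuming the Bernoulli score $\sum_{i \in S} [I_i \log q - \log(1 - \hat{I}_i + q\hat{I}_i)]$ (a standard bias-scan form, which may differ in detail from the one in Boxer et al., 2023), the score for a fixed $q$ is a sum of per-value contributions, so the best subset of values is simply those with a positive contribution. The grid over $q$ is a simplification chosen for this sketch, and `scan_attribute` is a hypothetical name.

```python
import math
from collections import defaultdict


def scan_attribute(values, events, expected, q_grid=None):
    """Find the subset of attribute values maximizing an additive Bernoulli score.

    values   -- attribute value of each individual
    events   -- observed binary event I_i of each individual
    expected -- expected value I_hat_i of each individual, strictly in (0, 1)
    Returns (best_score, best_subset, best_q).
    """
    if q_grid is None:
        # q = exp(t / 10) for t in -40..40, skipping q = 1 (the null).
        q_grid = [math.exp(t / 10.0) for t in range(-40, 41) if t != 0]

    best_score, best_subset, best_q = 0.0, frozenset(), 1.0
    for q in q_grid:
        # Per-value contribution to the score at this q: one pass over the data.
        contrib = defaultdict(float)
        for v, I, p in zip(values, events, expected):
            contrib[v] += I * math.log(q) - math.log(1.0 - p + q * p)
        # With q fixed the score is additive, so keep every positive contribution.
        chosen = frozenset(v for v, g in contrib.items() if g > 0)
        score = sum(contrib[v] for v in chosen)
        if score > best_score:
            best_score, best_subset, best_q = score, chosen, q
    return best_score, best_subset, best_q


values = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
events = [1, 1, 1, 0] + [1, 0, 0, 0] + [0, 0, 0, 0]
expected = [0.25] * 12

best_score, best_subset, best_q = scan_attribute(values, events, expected)
print(sorted(best_subset), round(best_score, 3), round(best_q, 3))
```

Each value of $q$ costs one pass over the data and one pass over the attribute's values, rather than an enumeration of all subsets of values.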

5. Theoretical Foundations

CBS’s scan statistics satisfy the ALTSS property, ensuring each attribute scan requires evaluating $O(m)$ subsets rather than $O(2^m)$, where $m$ is the attribute's arity. Multiple random restarts and coordinate ascent enable effective search for the global optimum in practice. The use of permutation-based p-values corrects for multiple testing across subgroups, maintaining control of the family-wise error rate while retaining high detection power. Empirical evaluation demonstrates that CBS achieves higher detection accuracy (e.g., Jaccard index for true subgroup recovery) than GerryFair and Multiaccuracy Boost, particularly for small or subtle subgroups.
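The permutation-based p-value can be sketched generically. The helper below, `permutation_p_value`, is hypothetical: it shuffles the protected-class labels, recomputes a statistic on each shuffled copy, and reports the share of shuffles that match or exceed the observed value. In an actual audit the statistic would be the maximum scan score from a full re-run of the subgroup search, which is what accounts for the search over many subgroups. Here a simple rate gap stands in for it as a toy.

```python
import random


def permutation_p_value(labels, statistic, n_permutations=999, seed=0):
    """Permutation p-value for statistic(labels), shuffling the labels.

    Uses the (1 + exceedances) / (1 + permutations) estimator, so the
    p-value is never exactly zero.
    """
    rng = random.Random(seed)
    observed = statistic(labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if statistic(shuffled) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_permutations)


# Toy data: the event occurs for every protected individual and for no one else.
events = [1] * 20 + [0] * 20
labels = [1] * 20 + [0] * 20


def rate_gap(lab):
    """Toy statistic: absolute gap in event rate between A == 1 and A == 0."""
    prot = [e for e, a in zip(events, lab) if a == 1]
    rest = [e for e, a in zip(events, lab) if a == 0]
    return abs(sum(prot) / len(prot) - sum(rest) / len(rest))


p_value = permutation_p_value(labels, rate_gap)
# A statistic that ignores the labels is matched by every shuffle.
p_constant = permutation_p_value(labels, lambda lab: 0.0)
print(p_value, p_constant)
```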

6. Empirical Results: COMPAS Case Studies

CBS was evaluated using semi-synthetic and real-world analyses of the COMPAS pre-trial risk assessment tool:

  • Simulated Bias Experiments: With injected biases over COMPAS covariates,
    • Separation scans were most sensitive to artificial shifts in predicted log-odds.
    • Sufficiency scans best detected shifts in true log-odds.
    • CBS outperformed competing auditors in accuracy across varied subgroup sizes and bias magnitudes.
  • Real Data Analysis (COMPAS):
    • A significant false positive rate disparity was detected for Black males relative to non-Black males.
    • The separation scan for predictions flagged the “under-25 felony” subgroup, with a substantial disparity in average predicted risk.
    • Sufficiency analysis revealed calibration issues among older males within a specific range of prior offense counts.

7. Practical Considerations and Limitations

CBS’s reliability depends on accurate estimation of expected outcomes ($\hat{I}_i$), which in turn is sensitive to the specification of the propensity-score and outcome models. Doubly robust or targeted learning approaches may offer improved robustness. CBS addresses only group-level conditional-independence fairness, not individual fairness or counterfactual analyses. Detection power is reduced for very small subgroups or highly noisy data, especially in sufficiency scans with weak covariate signal. CBS identifies the single most significant subgroup per run; iterative re-scanning is required for disjoint subgroup discovery. Permutation-based significance computation is computationally intensive, though approximate null-distributions may offer future acceleration.
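The iterative re-scanning mentioned above can be sketched as a loop that removes each detected subgroup before scanning again. Everything here is an assumption made for illustration: `iterative_rescan` and `toy_scan` are hypothetical, the toy scan simply picks the group with the largest summed residual, and a real audit would stop on a significance threshold rather than on the raw score.

```python
def iterative_rescan(records, scan, max_rounds=3, min_score=0.0):
    """Repeatedly scan, record the top subgroup, and remove its members.

    scan -- callable taking a list of records, returning (score, members)
    Returns a list of (score, members) for each round above min_score.
    """
    remaining = list(records)
    found = []
    for _ in range(max_rounds):
        if not remaining:
            break
        score, members = scan(remaining)
        if score <= min_score or not members:
            break
        found.append((score, members))
        # Remove detected members by object identity so later subgroups are disjoint.
        member_ids = {id(m) for m in members}
        remaining = [r for r in remaining if id(r) not in member_ids]
    return found


def toy_scan(records):
    """Toy stand-in for a subgroup scan: group with the largest sum of I - I_hat."""
    totals = {}
    for r in records:
        totals[r["group"]] = totals.get(r["group"], 0.0) + (r["I"] - r["I_hat"])
    best = max(totals, key=totals.get)
    return totals[best], [r for r in records if r["group"] == best]


records = [
    {"group": "a", "I": 1, "I_hat": 0.25},
    {"group": "a", "I": 1, "I_hat": 0.25},
    {"group": "b", "I": 1, "I_hat": 0.4},
    {"group": "c", "I": 0, "I_hat": 0.5},
]

found = iterative_rescan(records, toy_scan)
found_groups = [members[0]["group"] for _, members in found]
print(found_groups)
```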

CBS provides a statistically principled and computationally efficient approach for discovering intersectional and contextual model biases under a broad range of group-fairness definitions, with demonstrated effectiveness both in synthetic settings and real-world deployments (Boxer et al., 2023).
