Counterfactual & Fairness Explanations

Updated 23 March 2026
  • Counterfactual and fairness-based explanations are frameworks that use minimal feature changes to identify actionable recourse and diagnose unfairness in machine learning models.
  • They employ diverse computational methodologies—including SAT-based methods, genetic algorithms, and reinforcement learning—to generate plausible and constrained counterfactuals.
  • These approaches integrate individual, group, and procedural fairness criteria to audit and mitigate algorithmic discrimination while ensuring actionable and equitable outcomes.

Counterfactual and fairness-based explanations are formal frameworks and computational methodologies that leverage "what-if" reasoning to elucidate model decisions and audit for unfairness in machine learning systems. Counterfactual explanations identify minimal changes to features that would alter a model's decision, thereby highlighting actionable recourse and diagnosing both instance- and group-level disparities. Fairness-based explanations extend these mechanisms to quantify and correct the treatment of individuals and groups, enforcing consistency, transparency, and non-discriminatory recourse. This article details the definitions, algorithmic advances, fairness criteria, and auditing methodologies, as well as procedural and group-level innovations shaping this field.

1. Formal Definitions and Theoretical Foundations

The core object of counterfactual explanation is a model $f:\mathcal{X}\subseteq\mathbb{R}^d \to \{0,1\}$ (or a general prediction function), an input $\mathbf{x}$ with $f(\mathbf{x}) = y$, and a target outcome $y'$. A counterfactual $\mathbf{x}'$ is sought such that $f(\mathbf{x}') = y'$ and the transformation from $\mathbf{x}$ to $\mathbf{x}'$ is minimal with respect to a chosen cost or distance metric (e.g., $\ell_1$, $\ell_2$, Gower, or custom user-defined penalties) (Wang et al., 2023, Artelt et al., 2022, Artelt et al., 2021).

Key definitions include:

  • Closest Counterfactual: $\mathbf{x}' = \arg\min_{z} d(\mathbf{x},z) \text{ s.t. } f(z)=y'$; the minimal change needed for a different prediction (Asher et al., 2020). A search sketch follows this list.
  • Fair Counterfactual: In addition to minimality, the explanation is constrained by fairness desiderata, e.g., same effort across protected groups, preserving immutable features (Artelt et al., 2022, Wang et al., 2023).
  • Group Counterfactuals: Sets of counterfactuals that collectively cover a group, ensuring that patterns of recourse and burden are evaluated and explained at a collective level (Fragkathoulas et al., 2024).
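
As a concrete illustration of the closest-counterfactual definition, the sketch below relaxes the hard constraint $f(z)=y'$ into a penalty and runs a gradient-free search around $\mathbf{x}$. The logistic model, the penalty weight `lam`, and the choice of Nelder-Mead are illustrative assumptions, not a method from the cited papers.

```python
# Minimal closest-counterfactual search sketch for a binary classifier.
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y)

def closest_counterfactual(x, target, lam=10.0):
    """Minimize d(x, z) plus a soft penalty when f(z) misses the target class."""
    def objective(z):
        p = clf.predict_proba(z.reshape(1, -1))[0, target]
        return np.linalg.norm(x - z) + lam * max(0.0, 0.5 - p)
    return minimize(objective, x.copy(), method="Nelder-Mead").x

x0 = X[clf.predict(X) == 0][0]          # an instance currently classified 0
z = closest_counterfactual(x0, target=1)
print(clf.predict([x0])[0], "->", clf.predict([z])[0],
      "at l2 cost", round(float(np.linalg.norm(x0 - z)), 3))
```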

Causal and optimal transport theory further distinguish between interventional counterfactuals (altering structural assignments in Pearlian SCMs) and backtracking counterfactuals (altering exogenous variables while holding protected attributes fixed), with the latter providing new paradigms for fairness diagnostics that sidestep incoherent demographic interventions (Bynum et al., 2024, Lara et al., 2021).

2. Algorithmic Methodologies for Generation and Auditing

Diverse algorithmic approaches have been advanced:

  • SAT-based Enumeration: The CEMSP framework formulates counterfactual generation as a minimal-satisfiable-subset Boolean satisfiability problem over abnormal features (those outside their normal range), enforcing that only those are changed and leveraging SAT solvers' efficiency and composability for actionability, causality, and domain constraints (Wang et al., 2023).
  • Black-Box Optimization and Genetic Algorithms: Model-agnostic methods like CERTIFAI use evolutionary algorithms to search for diverse, nearest counterfactuals, accommodating categorical, continuous, or image data and permitting easy specification of fairness constraints such as muting immutable features (Sharma et al., 2019); a minimal sketch of this style of search follows the list.
  • Latent-Space and Manifold-Based Search: Latent-CF leverages an autoencoder latent space to perform gradient-based counterfactual search, producing explanations that are more plausible and “in-distribution” than those based solely on raw feature space (Balasubramanian et al., 2020).
  • Reinforcement Learning: Hybrid and multi-objective RL formulations allow direct optimization for both individual- and group-fairness criteria, optimizing reward signals that combine proximity, plausibility, equal effectiveness, and balanced recourse choice (Ezzeddine et al., 28 Jan 2026, Wang et al., 2023).
  • Graph-Based Group Counterfactual Explanations: FGCE constructs group-level explanations using a data-manifold graph encoding feasibility constraints, cost, and coverage, and employs submodular/greedy or MIP optimization for coverage and burden auditing (Fragkathoulas et al., 2024).
  • Procedural Fairness via Integrated Gradients: GCIG regularizes trained models so that their feature attributions (via Integrated Gradients) remain invariant across protected groups conditioned on true labels, operationalizing procedural fairness as explanation stability (Popoola et al., 11 Mar 2026).
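
The evolutionary approach above can be sketched in a few lines: seed a population with broadly sampled points that already flip the label, then evolve it toward the query while keeping immutable features fixed. The seeding scheme, population sizes, and toy `predict` function are assumptions for illustration, not CERTIFAI's exact algorithm.

```python
# CERTIFAI-style evolutionary counterfactual search against a black box.
import numpy as np

def genetic_counterfactual(predict, x, target, n_pop=100, n_gen=50,
                           sigma=0.3, init_scale=3.0, frozen=()):
    """Evolve candidates classified as `target` toward x; `frozen` lists
    immutable feature indices. Assumes the target class is reachable."""
    rng = np.random.default_rng(0)
    frozen = list(frozen)
    # seed population with broad samples around x that already flip the label
    pop = np.empty((0, x.size))
    while len(pop) < n_pop:
        cand = x + rng.normal(0.0, init_scale, size=(n_pop, x.size))
        cand[:, frozen] = x[frozen]                     # mute immutables
        pop = np.vstack([pop, cand[predict(cand) == target]])
    pop = pop[:n_pop]
    for _ in range(n_gen):
        fitness = -np.linalg.norm(pop - x, axis=1)      # prefer nearby flips
        parents = pop[np.argsort(fitness)[-(n_pop // 2):]]
        children = parents + rng.normal(0.0, sigma, size=parents.shape)
        children[:, frozen] = x[frozen]                 # re-mute immutables
        keep = children[predict(children) == target]    # feasibility filter
        pop = np.vstack([parents, keep])
    return pop[np.argmin(np.linalg.norm(pop - x, axis=1))]

# usage with any black-box classifier exposing batch label predictions:
predict = lambda Z: (Z[:, 0] + Z[:, 1] > 0).astype(int)
cf = genetic_counterfactual(predict, np.array([-1.0, -1.0, 0.5]),
                            target=1, frozen=[2])
print(cf)   # feature 2 stays at 0.5; features 0, 1 move past the boundary
```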

3. Fairness Criteria and Constraints in Counterfactual Explanations

A rigorous spectrum of fairness properties has emerged:

  • Individual Fairness: Near-identical individuals receive near-identical recourse or explanations. This is instantiated via stability/robustness constraints (low sensitivity of counterfactuals to small input perturbations) and direct penalties that enforce similarity of counterfactuals across similar instances (Artelt et al., 2021, Ezzeddine et al., 28 Jan 2026, Wang et al., 2023).
  • Group Fairness: The distribution or complexity of recourse does not systematically differ across protected groups, specified mathematically by bounding the mean or distributional difference in CF cost or number of changes between groups (Artelt et al., 2022, Cornacchia et al., 2023); a sketch of such an audit appears after this list.
  • Hybrid Fairness: Simultaneous enforcement of both individual- and group-level criteria, as formalized in RL-based counterfactual generation via joint constraints on proximity among similar individuals and recourse effectiveness/choice parity across groups (Ezzeddine et al., 28 Jan 2026).
  • Procedural Fairness: Invariance of the explanation process itself (feature-attribution profiles) across groups, extending beyond parity in outcomes (Popoola et al., 11 Mar 2026).
  • Counterfactual Fair Opportunity: Individual-level metric measuring whether the improvement from negative to positive outcome via CF requires changing a proxy for the sensitive attribute, revealing proxy discrimination not visible to standard group metrics (Cornacchia et al., 2023).
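
The group-fairness criterion can be audited with a simple comparison of mean counterfactual costs across groups. The helper below is a minimal sketch; the function name and the max-gap formulation (rather than a pairwise statistical test) are illustrative assumptions.

```python
# Audit recourse disparity: gap in mean CF cost between protected groups.
import numpy as np

def recourse_disparity(cf_costs, groups):
    """Largest gap in mean counterfactual cost across protected groups."""
    costs, g = np.asarray(cf_costs, dtype=float), np.asarray(groups)
    means = {grp: float(costs[g == grp].mean()) for grp in np.unique(g)}
    return max(means.values()) - min(means.values()), means

# usage: costs are distances d(x, x') of each individual's nearest CF
gap, per_group = recourse_disparity(
    [0.8, 1.1, 0.9, 2.4, 2.0, 2.2], ["A", "A", "A", "B", "B", "B"])
print(per_group, "disparity:", round(gap, 2))
```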

4. Key Auditing Metrics and Trade-Offs

Auditing model fairness and recourse via counterfactuals now incorporates a suite of metrics:

| Metric | What it Measures | Key References |
|--------|------------------|----------------|
| Inconsistency/Hausdorff | Stability of CFs under input/model perturbation | (Wang et al., 2023, Artelt et al., 2021) |
| Sparsity | Fraction of untouched features in CFs | (Wang et al., 2023, Balasubramanian et al., 2020) |
| CFlips/nDCCF | Rate/discounted rank of CFs requiring a change in a sensitive-attribute proxy | (Cornacchia et al., 2023) |
| Burden | Group-average cost of nearest CFs | (Sharma et al., 2019, Fragkathoulas et al., 2024) |
| Equal Effectiveness | Fraction of a group for whom at least one CF is effective | (Ezzeddine et al., 28 Jan 2026) |
| Equal Choice of Recourse | Number of distinct actionable CFs per group | (Ezzeddine et al., 28 Jan 2026) |
| Procedural Fairness (GCIG) | $\ell_2$-distance between group-conditional IG attribution vectors | (Popoola et al., 11 Mar 2026) |
| Coverage/cost/kAUC/dAUC | Group-level trade-off curves for recourse coverage vs. cost vs. number of CFs | (Fragkathoulas et al., 2024) |
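
Two of the simpler metrics in the table translate directly into code. The sketch below computes sparsity and burden under a plain $\ell_2$ cost; the exact formulations in the cited papers may differ in normalization and distance choice.

```python
# Sparsity (fraction of untouched features) and group burden (mean CF cost).
import numpy as np

def sparsity(x, x_cf, tol=1e-8):
    """Fraction of features the counterfactual leaves unchanged."""
    return float(np.mean(np.abs(np.asarray(x) - np.asarray(x_cf)) < tol))

def burden(X, X_cf, groups, group):
    """Group-average l2 cost of each member's nearest counterfactual."""
    mask = np.asarray(groups) == group
    diffs = np.asarray(X)[mask] - np.asarray(X_cf)[mask]
    return float(np.linalg.norm(diffs, axis=1).mean())

x, x_cf = np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.5, 3.0])
print(sparsity(x, x_cf))   # 2 of 3 features untouched -> 0.667
```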

Empirical studies consistently show that incorporating plausibility constraints, domain-inspired feature bounds, or group-fairness penalties increases stability (lower inconsistency), reduces disparity, and sometimes achieves fairness gains without large sacrifices in proximity or utility (Wang et al., 2023, Artelt et al., 2022, Fragkathoulas et al., 2024, Ezzeddine et al., 28 Jan 2026).
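
The procedural-fairness row of the table can likewise be made concrete: the sketch below approximates Integrated Gradients with a Riemann sum and reports the $\ell_2$ gap between group-mean attribution vectors. This mirrors only the metric's definition; GCIG's training-time regularizer (Popoola et al., 11 Mar 2026) is not reproduced here, and the toy logistic model is an assumption.

```python
# l2 gap between group-conditional Integrated Gradients attributions.
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Riemann approximation of IG along the straight-line path."""
    alphas = np.linspace(0.0, 1.0, steps)
    grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

def procedural_fairness_gap(grad_fn, X, groups, baseline):
    """l2 distance between the mean IG attribution vectors of two groups."""
    attrs = np.stack([integrated_gradients(grad_fn, x, baseline) for x in X])
    g = np.asarray(groups)
    a, b = np.unique(g)[:2]
    return float(np.linalg.norm(attrs[g == a].mean(axis=0)
                                - attrs[g == b].mean(axis=0)))

# usage with a toy logistic model p(x) = sigmoid(w.x); its input gradient
# is p(1-p)*w, so attributions vary with where each group sits in feature space
w = np.array([1.0, -2.0, 0.5])
sig = lambda t: 1.0 / (1.0 + np.exp(-t))
grad_fn = lambda z: sig(w @ z) * (1.0 - sig(w @ z)) * w
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3)) + np.r_[np.zeros((50, 3)), np.ones((50, 3))]
groups = ["A"] * 50 + ["B"] * 50
print(procedural_fairness_gap(grad_fn, X, groups, baseline=np.zeros(3)))
```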

5. Causality, Actionability, and Extensions

Modern frameworks integrate actionability (immutability constraints), causality (structural or path-specific constraints), and realistic feasibility into counterfactual search:

  • SAT/CNF Augmentation: CEMSP encodes actionability (freezing immutable indices), monotonicity, and causal dependencies directly as additional CNF clauses, permitting composable, scalable enforcement during Boolean search (Wang et al., 2023).
  • Transport-based and Path-Dependent Fairness: Optimal transport plans, under minimal regularity assumptions, can act as surrogates for Pearl-style causal counterfactuals, enabling fairness regularization even in the absence of full SCM identification (Lara et al., 2021). Path-dependent LCF and counterfactual fairness address fairness along specific, user-defined causal pathways (Zuo et al., 2024).
  • Backtracking Counterfactuals: Altering initial conditions on exogenous noise while fixing protected attributes sidesteps the conceptual and technical challenges of demographic interventions, keeping audits and recourse recommendations meaningful when social categories are not amenable to modular manipulation (Bynum et al., 2024). A toy SCM contrast with interventional counterfactuals is sketched below.
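
To make the interventional/backtracking contrast concrete, the toy chain SCM below computes both counterfactuals for $A \to X \to Y$; the structural equations and coefficients are invented for illustration and do not come from the cited papers.

```python
# Interventional vs. backtracking counterfactuals in a toy linear SCM.
def scm(a, u_x, u_y):
    """Chain A -> X -> Y with exogenous noises u_x, u_y (toy coefficients)."""
    x = 0.5 * a + u_x
    y = 2.0 * x + u_y
    return x, y

# factual world: protected attribute a = 1
a, u_x, u_y = 1.0, 0.3, -0.1
x, y = scm(a, u_x, u_y)

# interventional counterfactual: do(A = 0), exogenous noises held fixed
x_int, y_int = scm(0.0, u_x, u_y)

# backtracking counterfactual: A stays fixed; instead, backtrack to an
# alternative exogenous u_x' under which X matches the do(A = 0) world
u_x_back = x_int - 0.5 * a           # solve 0.5*a + u_x' = x_int
x_back, y_back = scm(a, u_x_back, u_y)

print(f"factual y = {y:.2f}; interventional y = {y_int:.2f}; "
      f"backtracking y = {y_back:.2f}")
```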

6. Applications and Empirical Insights

Counterfactual and fairness-based explanations are fundamental to:

  • Instance-level Recourse and Transparency: Empowering individuals with concrete actions for changing outcomes, compliant with GDPR and fair lending laws (Balasubramanian et al., 2020, Wang et al., 2023).
  • Fairness Auditing and Compliance: Providing both individual and group-level diagnosis of disparate impact and discriminatory recourse requirements, including in black-box recommendation systems, tabular/classification tasks, and neural network pipelines (Artelt et al., 2022, Cornacchia et al., 2023, Kim et al., 11 Feb 2026, Ge et al., 2022).
  • Policy and Organizational Trust: Procedural fairness regularization assures that explanations remain stable and intelligible across groups, addressing emerging requirements in organizational psychology and regulatory contexts (Popoola et al., 11 Mar 2026).
  • Practical Mitigation: Empirical analyses show that group-level (burden, coverage) and path-based recourse disparities can be mitigated via adversarial retraining, cost-weighted loss, group-fair or hybrid explanations, often with only minor utility cost (Fragkathoulas et al., 2024, Ezzeddine et al., 28 Jan 2026, Zuo et al., 2024).

7. Limitations and Open Challenges

Current research faces several persistent challenges:

  • Scalability and Computation: Group counterfactual enumeration, MIP optimization, and large-scale RL-based generation can be computationally demanding, especially under rich feasibility and domain constraints (Fragkathoulas et al., 2024, Ezzeddine et al., 28 Jan 2026).
  • Specification of Normative Choices: Designation of mutable/immutable variables, definition of similarity/distance, and selection of opportunity sets are often domain-dependent and normatively charged (Bynum et al., 2024, Lara et al., 2021).
  • Causal Model Identification: Faithful causal and path-specific fairness constraints presuppose knowledge of SCMs, which are rarely fully specified in practice. Transport-based surrogates and backtracking paradigms are promising, but general theory on their quantitative equivalence or superiority is nascent (Lara et al., 2021, Bynum et al., 2024, Zuo et al., 2024).
  • Interplay of Outcome and Procedural Fairness: Empirical evidence demonstrates that equalized odds/output parity can be achieved while explanation disparity remains high, indicating the need for joint procedural and outcome-oriented objectives (Popoola et al., 11 Mar 2026).
  • Auditability Under Unawareness: Under fairness under unawareness, standard group metrics may miss proxy discrimination; counterfactual flipping metrics (CFlips, nDCCF) are necessary to expose such bias (Cornacchia et al., 2023).

Theoretical and empirical advances in counterfactual and fairness-based explanations have fundamentally reshaped the landscape of interpretable and trustworthy machine learning. By unifying minimal change, domain-actionability, individual and group parity, and robust, plausible explanation, these frameworks enable precise, auditable diagnosis and mitigation of algorithmic discrimination (Wang et al., 2023, Artelt et al., 2022, Fragkathoulas et al., 2024, Popoola et al., 11 Mar 2026, Bynum et al., 2024).
