
Probability of Sufficiency in Concept Interventions

Updated 9 December 2025
  • Probability of sufficiency is defined within structural causal models to measure the likelihood that altering a concept alone triggers a change in model output.
  • PS is computed via counterfactual simulation or Monte Carlo methods, applied to both local (instance-level) and global (population-level) explanations.
  • This metric filters spurious associations and supports actionable recourse by providing causal, high-fidelity explanations in concept-based XAI.

Probability of sufficiency (PS) of concept interventions is a formal, post-hoc causal metric central to contemporary concept-based interpretability and explainability frameworks in machine learning. Defined within the structural causal model (SCM) paradigm, PS quantifies, for a given concept intervention, the likelihood that fixing a concept variable to a different value alone would suffice to force a change in the model’s output. This metric is foundational for assigning causally-valid, human-interpretable attributions to high-level concepts in black-box models, bridging the gap between algorithmic opacity and actionable, personalized explanations.

1. Formal Definition and Theoretical Foundations

The probability of sufficiency, introduced by Pearl, is generalized in causal concept-based XAI to assess the causal power of concept-level variables, not just low-level inputs. Given a black-box prediction function $h$ (e.g., a deep classifier) and a compact, human-meaningful set of concepts $z$, the SCM embeds $h$ in a probabilistically coherent causal graph: $u \rightarrow z \xrightarrow{\alpha} x \xrightarrow{h} \hat{y}$. Here, $u$ are exogenous sources, $z$ the semantic concepts, $\alpha$ a generative or editing map, and $h$ the classifier.

A concept intervention is formally a do-operation, e.g., $\mathrm{do}(C = c')$, forcibly setting one or more concepts $C$ to a new value $c'$ and severing their incoming causal dependencies. The probability of sufficiency for an intervention changing the output from $y$ to $y'$ is defined as

$$PS_{C\to c',\,y'} = P\left( \hat{y}_{\mathrm{do}(C = c')} = y' \,\middle|\, z=z,\, w=w \right)$$

where $(z,w)$ encode the concepts and nuisance latents for an observed $x$ and $y$ (Bjøru et al., 2 Dec 2025). This answers: “Given the factual instance, what is the probability that flipping $C$ to $c'$ alone would suffice to change the model’s decision to $y'$?”
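This counterfactual query can be made concrete with a toy, fully deterministic SCM. The snippet below is a minimal illustrative sketch, not code from the cited work: the structural equations, the editing map `alpha`, and the classifier `h` are hand-written stand-ins, and the concept names simply mirror the CelebA example discussed later.

```python
# Toy sketch of the SCM  u -> z --alpha--> x --h--> y_hat  and a concept
# intervention do(C = c').  All functions are illustrative stand-ins:
# `h` plays the black-box classifier and `alpha` the concept-to-input map.
import numpy as np

rng = np.random.default_rng(0)

def scm_concepts(u, do=None):
    """Structural equations for two binary concepts; `do` overrides values."""
    z = {"GrayHair": int(u[0] > 0.7), "Wrinkles": int(u[1] > 0.5)}
    if do:
        z.update(do)          # do(C = c'): sever C's parents, fix its value
    return z

def alpha(z, w):
    """Toy editing map: renders concepts plus a nuisance latent w into an input x."""
    return np.array([z["GrayHair"], z["Wrinkles"], w], dtype=float)

def h(x):
    """Toy black-box classifier: predicts 'Old' when enough aging cues appear."""
    return "Old" if x[0] + x[1] >= 1 else "Young"

# Factual world: sample exogenous sources, read off z, w, and the prediction.
u, w = rng.random(2), float(rng.random())
z = scm_concepts(u)
y_fact = h(alpha(z, w))

# Counterfactual world: keep the same context (u and w), but intervene with
# do(GrayHair = 1) before re-rendering and re-classifying.
z_cf = scm_concepts(u, do={"GrayHair": 1})
y_cf = h(alpha(z_cf, w))
print(f"factual prediction: {y_fact}; under do(GrayHair=1): {y_cf}")
```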

2. Computation and Algorithms for Probability of Sufficiency

When $h$ and $\alpha$ are deterministic, and the SCM is Markovian, PS can be computed for both local (instance-wise) and global (population) explanations using counterfactual simulation or Monte Carlo over exogenous sources $u$. For a given intervention,

$$PS_{C\to c',\,y'} = \sum_{u} P(u)\, I\!\left[h\!\left(\alpha\!\left(z_{\mathrm{do}(C = c')},\, w\right)\right)=y'\right]$$

where $z_{\mathrm{do}(C = c')}$ is the output of the SCM with $C$ fixed to $c'$, and $I[\cdot]$ is the indicator function (Bjøru et al., 2 Dec 2025).
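A minimal Monte Carlo sketch of this sum, under the assumption that a sampler for $P(u)$ and the maps above are available as plain Python callables (the names reuse the hypothetical toy SCM from Section 1):

```python
# Monte Carlo sketch of the PS sum above: draw exogenous sources u ~ P(u),
# push each through the intervened SCM, and average the indicator that the
# black box outputs y'.  `sample_u`, `scm_concepts`, `alpha`, and `h` are
# assumed callables in the spirit of the toy SCM sketched earlier; w is held fixed.
def estimate_ps(sample_u, scm_concepts, alpha, h, do, y_prime, w, n=10_000):
    hits = 0
    for _ in range(n):
        u = sample_u()                      # u ~ P(u)
        z_do = scm_concepts(u, do=do)       # concepts under do(C = c')
        if h(alpha(z_do, w)) == y_prime:    # indicator I[h(alpha(...)) = y']
            hits += 1
    return hits / n

# Example call, reusing the toy SCM above:
# ps = estimate_ps(lambda: rng.random(2), scm_concepts, alpha, h,
#                  do={"GrayHair": 1}, y_prime="Old", w=0.3)
```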

Local explanations fix the instance of interest and estimate PS for all candidate concept interventions using the above mechanism. Global explanations compute PS under population-level interventions:

  • $P(\hat{y}_{\mathrm{do}(C = c')} = y')$ averages over the marginal distribution of all $(z, w)$.
  • Conditional (subgroup) PS computes the average local PS over all instances with $C = c$, $\hat{y} = y$.

Semi-synthetic or learned generative models (e.g., StarGAN for facial images (Bjøru et al., 2 Dec 2025)) provide the requisite editing maps $\alpha$ and plausible in-distribution samples.
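The population-level variants reduce to averaging the same counterfactual check over observed instances. The sketch below is again illustrative: `dataset`, `global_ps`, and the record layout are assumptions, not an API from the cited papers.

```python
# Sketch of global and subgroup PS as averages of per-instance counterfactual
# checks over a dataset of observed (z, w) pairs.  `dataset` is a hypothetical
# list of records {"z": {...concepts...}, "w": ...}; `alpha` and `h` are the
# editing map and black box as before.
def global_ps(dataset, alpha, h, do, y_prime, subgroup=None):
    total = hits = 0
    for rec in dataset:
        z, w = dict(rec["z"]), rec["w"]
        if subgroup is not None and not subgroup(z, h(alpha(z, w))):
            continue                               # keep only C = c, y_hat = y
        z_do = {**z, **do}                         # apply do(C = c') to z
        total += 1
        hits += int(h(alpha(z_do, w)) == y_prime)  # counterfactual check
    return hits / total if total else float("nan")

# Marginal (global) PS:
#   global_ps(dataset, alpha, h, do={"GrayHair": 1}, y_prime="Old")
# Subgroup PS, e.g. restricted to gray-hair-free instances predicted "Young":
#   global_ps(dataset, alpha, h, {"GrayHair": 1}, "Old",
#             subgroup=lambda z, y: z["GrayHair"] == 0 and y == "Young")
```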

3. Concept Interventions, Difference-Making, and Causal Attribution

Probability of sufficiency operationalizes the notion of “difference-making”: a concept-level variable $C$ is a sufficient cause of a prediction change iff its PS is large for some $c \to c'$. This instantiates the interventionist theory of explanation (Sani et al., 2020): only variables for which $PS$ is non-negligible are admitted as causal attributors; variables associated with the output merely via confounding or through proxy correlations (as detected by partial ancestral graphs, FCI, or similar causal discovery tools) will typically display low PS and are filtered out (Sani et al., 2020, Bjøru et al., 2 Dec 2025).

This approach sharply contrasts with associational feature attribution metrics (e.g., LIME, SHAP), which do not distinguish direct cause from statistical association. Notably, sufficiency and necessity probabilities, as framed in the LEWIS system (Galhotra et al., 2021), allow for instance-specific and context-aware evaluations of causal responsibility, producing actionable, counterfactually-grounded explanations and recourse.

4. Evaluation Protocols and Empirical Examples

Empirical studies use PS both as an explanation metric and as a validation criterion. In classification tasks (e.g., CelebA faces), causal PS identifies which single-attribute flips (e.g., $\mathrm{do}(\mathrm{GrayHair}=1)$) most strongly increase the probability of flipping the predicted class (e.g., Young $\to$ Old), sometimes quantified as, e.g., $PS(\mathrm{Old}\mid \mathrm{do}(\mathrm{GrayHair}=1))\approx 0.32$ (Bjøru et al., 2 Dec 2025). PS can be computed not just for singleton interventions but for multi-concept interventions as well.

A summary of explanation types and PS-based metrics:

| Explanation regime | Conditioning | PS formula |
| --- | --- | --- |
| Local | $(z, w)$ fixed | $P(\hat{y}_{\mathrm{do}(C=c')}=y'\mid z, w)$ |
| Global (marginal) | none | $P(\hat{y}_{\mathrm{do}(C=c')}=y')$ |
| Subgroup | $C=c$, $\hat{y}=y$ | $P(\hat{y}_{\mathrm{do}(C=c')}=y'\mid C=c, \hat{y}=y)$ |

PS is used to rank candidate interventions, yielding an interpretable hierarchy of “most causally potent” concepts for a target class or subgroup.
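Such a ranking can be produced directly from the estimators sketched earlier; the helper below is a hypothetical convenience built on the assumed `global_ps` function, not part of any cited implementation.

```python
# Sketch of ranking candidate single-concept interventions by estimated PS,
# reusing the hypothetical `global_ps` helper from Section 2.
def rank_interventions(dataset, alpha, h, candidates, y_prime):
    scored = [(name, global_ps(dataset, alpha, h, do, y_prime))
              for name, do in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# candidates = [("GrayHair -> 1", {"GrayHair": 1}),
#               ("Wrinkles -> 1", {"Wrinkles": 1})]
# for name, ps in rank_interventions(dataset, alpha, h, candidates, "Old"):
#     print(f"{name}: PS ~ {ps:.2f}")
```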

Experiments on tabular and image data confirm the high interpretive value of PS explanations: top-PS concept flips correspond to manipulations that produce large, targeted shifts in model output, aligning with domain intuition and outperforming gradient-based concept attribution or association-rule mining in faithfulness and causal verification (Bjøru et al., 2 Dec 2025, Xu et al., 2020, Moreira et al., 2024).

5. Assumptions and Limitations

The causal faithfulness and interpretability of PS-based explanations hinge on several key assumptions:

  • Completeness of concept vocabulary: All high-level causes for $h$ are included in $z$. Omitted causes are relegated to nuisance latents $w$, assumed independent of $z$ to avoid spurious causal attributions (Bjøru et al., 2 Dec 2025).
  • SCM correctness: The causal graph $\mathcal{G}$ and concept structural equations reflect the actual data-generating process. Incomplete knowledge can be addressed by partially specified SCMs, reporting PS intervals that upper- and lower-bound the true causal effect (Bjøru et al., 2 Dec 2025).
  • Valid editing/intervention mechanism: The concept-to-input decoder $\alpha$ must yield in-distribution, semantically coherent counterfactuals. In practice, this may require high-quality generative models (e.g., StarGAN) (Bjøru et al., 2 Dec 2025).
  • No access to black-box internals: PS explanations are entirely post-hoc and query-based; fidelity is guaranteed because each concept intervention is carried forward through the unchanged black-box $h$.

A significant limitation is the need for a comprehensive concept basis and an accurate SCM. Failure to satisfy these leads to indeterminacy or ambiguous PS intervals.

6. Broader Connections and Practical Implications

Probability of sufficiency for concept interventions underpins principled, faithfulness-guaranteed causal concept-based XAI. It directly informs actionable recourse, personalized model (counter)factual audit, and debugging:

  • Recourse and actionable feedback: By quantifying the sufficiency of specific concept changes to flip a model’s verdict, PS enables the identification of minimal, targeted interventions for desired outcomes (Galhotra et al., 2021); a simplified selection sketch follows this list.
  • Contrast with correlation-based explanations: PS explanations explicitly filter out spurious associations, only admitting interventions with genuine causal power. Empirical findings demonstrate that PS explanations deliver stable, actionable, and human-meaningful attributions, unlike LIME/SHAP (Bjøru et al., 2 Dec 2025, Galhotra et al., 2021).
  • Alignment and interpretability: When aligned with human-interpretable vocabularies and structural conditions (no concept mixing, monotonicity), PS explanations satisfy rigorous criteria for transparency and stakeholder communication (Marconato et al., 2023).
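As a concrete, heavily simplified illustration of the recourse point above, the sketch below picks the cheapest sufficient single-concept intervention for one instance; the cost table and helper names are assumptions, not drawn from LEWIS or the other cited papers.

```python
# Sketch of PS-driven recourse for a single instance: among candidate concept
# interventions with user-supplied costs, return the cheapest one that flips
# the prediction to the desired class.  With deterministic alpha/h and fully
# observed (z, w), the local PS of each candidate is simply 0 or 1, so the
# check reduces to a single counterfactual evaluation.  All names are illustrative.
def cheapest_recourse(z, w, alpha, h, candidates, y_desired, costs):
    feasible = []
    for name, do in candidates:
        if h(alpha({**z, **do}, w)) == y_desired:   # local PS = 1 for this do()
            feasible.append((costs[name], name))
    return min(feasible) if feasible else None      # (cost, intervention) or None

# Example:
#   costs = {"GrayHair -> 1": 1.0, "Wrinkles -> 1": 2.5}
#   cheapest_recourse(z, w, alpha, h, candidates, "Old", costs)
```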

7. Future Directions

Recent research calls for extensions to multi-class and continuous concepts, partially specified or learned SCMs (using, e.g., NO-TEARS structure penalties), automation of concept definition via LLMs, and integration of latent-variable concept representations. Computational efficiency of PS estimation and relaxation of the expert-defined DAG requirement are ongoing themes (Moreira et al., 2024, Bjøru et al., 2 Dec 2025).

In summary, the probability of sufficiency of concept interventions is the foundational quantitative metric for causal, concept-based model explanations, supporting actionable, high-fidelity, human-interpretable XAI within the SCM framework (Bjøru et al., 2 Dec 2025, Galhotra et al., 2021, Moreira et al., 2024, Sani et al., 2020, Marconato et al., 2023, Xu et al., 2020).
