Probability of Necessity and Sufficiency (PNS)
- Probability of Necessity and Sufficiency (PNS) is a counterfactual measure of how likely a treatment is both required for an outcome and sufficient to produce it.
- It is generally not point-identified, but admits sharp Tian–Pearl bounds that combine experimental and observational data to rigorously bound causal effects at the individual or subgroup level.
- Recent advances leverage machine learning to refine PNS estimations, reduce error margins, and support targeted decision-making in causal inference.
The probability of necessity and sufficiency (PNS) is a foundational concept in counterfactual causal inference that quantifies the probability that a treatment (or exposure) is both necessary for an outcome to occur and sufficient to guarantee it in an individual or subpopulation. PNS represents the proportion of units for which, if the treatment had not been given, the outcome would have been absent, and if the treatment were given, the outcome would have been present. PNS appears centrally in causal theory, decision-support, feature attribution, and recent machine-learning pipelines for individual-level or subgroup-level causal reasoning.
1. Formal Definition and Mathematical Bounds
Let $X$ be a binary treatment and $Y$ a binary outcome. Using potential-outcomes notation:
- $Y_1$: the potential value of $Y$ under $X = 1$
- $Y_0$: the potential value of $Y$ under $X = 0$
The population-level probabilities of causation are:
- Probability of Necessity (PN): $\mathrm{PN} = P(Y_0 = 0 \mid X = 1, Y = 1)$
- Probability of Sufficiency (PS): $\mathrm{PS} = P(Y_1 = 1 \mid X = 0, Y = 0)$
- Probability of Necessity and Sufficiency (PNS): $\mathrm{PNS} = P(Y_1 = 1, Y_0 = 0)$
PNS is thus the unconditional proportion of units for which the treatment would "single-handedly" ensure the positive outcome ($Y_1 = 1$), and absence of the treatment ($X = 0$) would preclude it ($Y_0 = 0$).
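To make these definitions concrete, the following sketch enumerates a small hypothetical population of potential-outcome triples $(X, Y_1, Y_0)$ and computes PN, PS, and PNS directly. (In practice $Y_1$ and $Y_0$ are never jointly observed for the same unit; the counts here are invented purely for illustration.)

```python
# Each unit is a triple (X, Y1, Y0) of treatment and both potential outcomes.
# Hypothetical fully enumerated population of 10 units.
units = [
    (1, 1, 0), (1, 1, 0), (1, 1, 1), (1, 0, 0),
    (0, 1, 0), (0, 1, 0), (0, 0, 0), (0, 1, 1),
    (1, 1, 0), (0, 0, 0),
]
n = len(units)

# PNS: unconditional proportion with Y1 = 1 and Y0 = 0.
pns = sum(1 for x, y1, y0 in units if y1 == 1 and y0 == 0) / n

# PN conditions on treated units with a positive outcome (X = 1, Y = Y1 = 1).
treated_pos = [u for u in units if u[0] == 1 and u[1] == 1]
pn = sum(1 for _, _, y0 in treated_pos if y0 == 0) / len(treated_pos)

# PS conditions on untreated units with a negative outcome (X = 0, Y = Y0 = 0).
untreated_neg = [u for u in units if u[0] == 0 and u[2] == 0]
ps = sum(1 for _, y1, _ in untreated_neg if y1 == 1) / len(untreated_neg)

print(pns, pn, ps)
```

In this toy population 5 of 10 units satisfy $Y_1 = 1, Y_0 = 0$, so PNS is 0.5 even though PN and PS, being conditional, take different values.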
Sharp Tian–Pearl Bounds
PNS is generally not point-identified but admits sharp bounds derived using a combination of experimental and observational data. The standard Tian–Pearl bounds are:
- Lower bound: $\mathrm{PNS} \ge \max\{0,\; P(Y_1 = 1) - P(Y_0 = 1)\}$
- Upper bound: $\mathrm{PNS} \le \min\{P(Y_1 = 1),\; P(Y_0 = 0)\}$
When additional observational data are available, more complex forms with up to four terms (combinations of experimental marginals and joint observational probabilities) apply, yielding generally tighter bounds (Wang et al., 13 Feb 2025, Li et al., 2022).
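A minimal implementation of these bounds is sketched below, taking the experimental marginals $P(Y_1 = 1)$ and $P(Y_0 = 1)$ and, optionally, an observational joint distribution $P(X, Y)$ to activate the four-term tightened forms:

```python
def tian_pearl_bounds(p_y_do1, p_y_do0, p_xy=None):
    """Sharp Tian-Pearl PNS bounds.

    p_y_do1: P(Y=1 | do(X=1)), i.e. P(Y1 = 1) from experimental data.
    p_y_do0: P(Y=1 | do(X=0)), i.e. P(Y0 = 1) from experimental data.
    p_xy: optional observational joint, p_xy[(x, y)] = P(X=x, Y=y).
    """
    # Two-term bounds from experimental data alone.
    lower = max(0.0, p_y_do1 - p_y_do0)
    upper = min(p_y_do1, 1.0 - p_y_do0)
    if p_xy is not None:
        p_y = p_xy[(1, 1)] + p_xy[(0, 1)]  # observational P(Y = 1)
        # Additional terms combining experimental and observational data.
        lower = max(lower, p_y - p_y_do0, p_y_do1 - p_y)
        upper = min(upper,
                    p_xy[(1, 1)] + p_xy[(0, 0)],
                    p_y_do1 - p_y_do0 + p_xy[(1, 0)] + p_xy[(0, 1)])
    return lower, upper
```

Without observational data the function returns the two-term max–min interval above; supplying the joint $P(X, Y)$ adds the extra candidate terms, which can only tighten the interval.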
2. Identification Assumptions and Interpretation
Evaluation and interpretation of PNS hinges on several core assumptions:
- Consistency: $Y = Y_x$ whenever $X = x$ is actually observed.
- No cross-world confounding: no unobserved confounders affect both $X$ and $Y$ beyond what is captured by the observed covariates.
- Availability of appropriate data: at minimum, experimental (or otherwise deconfounded) estimates of $P(Y_1 = 1)$ and $P(Y_0 = 1)$ are required.
When assumptions such as monotonicity and exogeneity are satisfied (e.g., no "defiers" and no hidden confounding), PNS is point-identified as a difference of conditional probabilities: $\mathrm{PNS} = P(Y = 1 \mid X = 1) - P(Y = 1 \mid X = 0)$. This representation is heavily used in robust model estimation and causal representation learning in machine learning settings.
Conceptually, PNS captures the probability that a treatment is both required and alone sufficient for an outcome—a property essential for personalized interventions and causal explanations.
3. Estimation Methodologies: Classical and Machine Learning Approaches
Estimating PNS for a given population or finely-grained subpopulation requires precise estimation of experimental and/or observational probabilities. Challenges arise when data for each subpopulation are limited, as numerous probabilities (marginals and joints) must be reliably estimated for sharp bounds.
Classical Plug-in Estimation
For a fully enumerated population or subgroup:
- Compute the relevant probabilities empirically (via randomized controlled trials, or suitable observational adjustments)
- Plug into the Tian–Pearl formulas to obtain lower and upper bounds (Li et al., 2022).
- For reliable confidence intervals (e.g., a small target half-width at a given confidence level), sample sizes on the order of several thousand per arm are often required.
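The plug-in recipe above can be sketched as follows, taking raw RCT arm counts and returning the bound estimates together with a crude normal-approximation half-width (the conservative sum of the two marginal half-widths; the counts in the usage note are hypothetical):

```python
import math

def pns_bounds_with_ci(n1, y1, n0, y0, z=1.96):
    """Plug-in Tian-Pearl bounds from RCT arm counts.

    n1, y1: treated-arm size and number of positive outcomes.
    n0, y0: control-arm size and number of positive outcomes.
    z: normal quantile for the confidence level (1.96 ~ 95%).
    """
    p1 = y1 / n1  # estimate of P(Y=1 | do(X=1))
    p0 = y0 / n0  # estimate of P(Y=1 | do(X=0))
    lower = max(0.0, p1 - p0)
    upper = min(p1, 1.0 - p0)
    # Conservative half-width: sum of the two marginal CI half-widths.
    hw = z * math.sqrt(p1 * (1 - p1) / n1) + z * math.sqrt(p0 * (1 - p0) / n0)
    return lower, upper, hw
```

For instance, with 2,000 units per arm and outcome rates of 0.7 and 0.3, the estimated interval is roughly $[0.4, 0.7]$ with a half-width of about 0.04, consistent with the several-thousand-per-arm guidance above.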
Machine Learning-Based Estimation
To address the "data sparsity" bottleneck for many subpopulations, recent work models PNS as a function of subpopulation covariates via regression (Wang et al., 13 Feb 2025, Wang et al., 22 May 2025).
- Architecture: a multilayer perceptron (MLP, typically 2–3 hidden layers, Mish activation preferred) maps a subpopulation covariate vector to a predicted PNS, with a sigmoid output to enforce the $[0, 1]$ constraint.
- Loss: Mean-squared error against ground-truth PNS (when available), or a bound-aware loss (constraining predictions within theoretical lower and upper bounds).
- Training: Adam optimizer with batch size 128.
- Performance: MLPs with Mish activation achieve mean absolute errors of roughly 0.02–0.06 for 30,000 subpopulations given data from 2,000 labeled populations.
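The cited papers describe a bound-aware loss but its exact form is not reproduced here; the numpy sketch below shows one plausible variant (an assumption, not the papers' definition) that penalizes a predicted PNS only when it violates its theoretical Tian–Pearl interval:

```python
import numpy as np

def bound_aware_loss(pred, lower, upper):
    """Zero penalty when each predicted PNS lies inside its theoretical
    interval [lower, upper]; quadratic penalty on the violation outside it.

    pred, lower, upper: arrays of the same shape, one entry per subpopulation.
    """
    below = np.clip(lower - pred, 0.0, None)  # how far below the lower bound
    above = np.clip(pred - upper, 0.0, None)  # how far above the upper bound
    return np.mean(below ** 2 + above ** 2)
```

Such a loss lets unlabeled subpopulations contribute a training signal whenever their bounds are computable, even when ground-truth PNS is unavailable.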
This approach enables population-level causal knowledge transfer: high-confidence PNS predictions for rare subgroups are obtained by leveraging patterns identified in data-rich subgroups.
4. Practical Applications and Simulation Evidence
PNS estimation has significant impact in several practical domains:
- Decision Science: Supports individual-level or subgroup-level causal effect attribution, aiding in treatment assignment, legal responsibility, and targeted interventions.
- Machine Learning and Model Explainability: Used in feature selection, model interpretation, and causally-grounded explanation generation—by quantifying not just the effect, but indispensable and decisive features.
- Simulation Validation: Synthetic settings with simulated structural causal models (SCMs) have shown that ML-based PNS estimators trained on a small subset of "informative" populations generalize well, with mean absolute errors of roughly 0.02–0.06 across tens of thousands of subgroups (Wang et al., 13 Feb 2025, Wang et al., 22 May 2025).
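A minimal version of such a simulation check can be sketched as follows: a toy SCM (with invented response probabilities and a latent heterogeneity variable $U$) generates both potential outcomes per unit, so the true PNS is computable and can be verified to fall inside the Tian–Pearl interval formed from the experimental marginals:

```python
import random

random.seed(0)

def sample_unit():
    """Toy SCM: latent U induces two response types with different
    outcome probabilities under treatment and control (values invented)."""
    u = random.random() < 0.5
    y1 = random.random() < (0.9 if u else 0.5)  # potential outcome under X=1
    y0 = random.random() < (0.4 if u else 0.1)  # potential outcome under X=0
    return y1, y0

units = [sample_unit() for _ in range(100_000)]
n = len(units)

# True PNS, available only because the simulation exposes both outcomes.
true_pns = sum(1 for y1, y0 in units if y1 and not y0) / n

# Experimental marginals and the two-term Tian-Pearl interval.
p_do1 = sum(y1 for y1, _ in units) / n  # P(Y=1 | do(X=1))
p_do0 = sum(y0 for _, y0 in units) / n  # P(Y=1 | do(X=0))
lower = max(0.0, p_do1 - p_do0)
upper = min(p_do1, 1.0 - p_do0)

print(lower, true_pns, upper)
```

Here the analytic true PNS is $0.5(0.9 \cdot 0.6) + 0.5(0.5 \cdot 0.9) = 0.495$, which lies inside the interval $[\,\approx 0.45, \approx 0.7\,]$ formed by the bounds.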
5. Theoretical Guarantees and Sample Size Analysis
Sharp characterization of the estimation error, interval width, and required sample size for PNS computation is available:
- Worst-case error: Given $n$ experimental and observational samples, the two-sided CI for each bound has half-width approximately $z_{1-\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$th quantile of the standard normal distribution. As a conservative rule of thumb, sample sizes in the low thousands per arm yield CIs only a few percentage points wide (Li et al., 2022, Cheng et al., 19 Feb 2026).
- Delta method: For more general (possibly non-linear or piecewise-linear) bound functionals $g(\theta)$, the asymptotic variance of the estimated bound is given by the gradient-based form $\nabla g(\hat{\theta})^{\top}\, \Sigma\, \nabla g(\hat{\theta})$, with explicit sample-size formulas for a target error (Cheng et al., 19 Feb 2026).
- Simulation evidence: Empirical errors in estimated bounds decrease rapidly with sample size, meeting the theoretical error thresholds in practice using these formulae.
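Inverting the half-width formula gives a rough sample-size calculator; the sketch below uses the conservative worst-case variance at $p = 0.5$:

```python
import math

def required_n(half_width, p=0.5, z=1.96):
    """Sample size so that a normal-approximation CI for one probability
    has the given half-width; p=0.5 is the worst-case variance."""
    return math.ceil(z ** 2 * p * (1 - p) / half_width ** 2)
```

For example, `required_n(0.025)` returns 1537 and `required_n(0.05)` returns 385, which matches the low-thousands-per-arm rule of thumb and the per-subpopulation sample sizes in the summary table below.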
6. Limitations and Extensions
While PNS is a central and expressive causal quantity, several limitations and directions warrant emphasis:
- Non-identifiability: Absent monotonicity or perfect exogeneity, PNS is not point-identified, only bounded; bounds can be wide when underlying probabilities are extreme or weakly separated.
- Structural sensitivity: Mechanisms for PNS estimation require correct specification of the SCM, or that the training SCM family well captures subpopulation behavior.
- Extensibility: PNS theory and estimation extend to non-binary treatments/outcomes (multi-valued generalizations), mediation analysis (path-specific PNS), and multivariate or continuous representations (Li et al., 2022, Kawakami et al., 2024, Kawakami et al., 8 May 2025).
- Model-based estimation: ML-based generalization is only as good as the coverage and representativeness of the labeled subpopulations, and may be sensitive to covariate shift or insufficient sample support.
Summary Table: Key Properties and Requirements
| Property | Classical PNS | ML-based PNS Estimation | Reference |
|---|---|---|---|
| Main assumptions | Consistency, no cross-world confounding | Consistency, valid SCM/training | (Wang et al., 13 Feb 2025) |
| Sample size per subpop | 1,300–6,000+ | 400–1,300 (labeled subpops) | (Li et al., 2022, Wang et al., 22 May 2025) |
| Theoretical bounds | Tian–Pearl (max-min) | Bounded MLP outputs | (Wang et al., 13 Feb 2025) |
| Typical MAE (simulation) | N/A | 0.02–0.06 | (Wang et al., 13 Feb 2025, Wang et al., 22 May 2025) |
| Typical uses | Targeted attribution | Risk prediction/triage | -- |
PNS provides a rigorous, sharply bounded, and practically estimable metric for quantifying the individual- or subgroup-level sufficiency and necessity of binary treatments in producing binary outcomes, serving as a foundation for causal decision-making and interpretable machine learning (Wang et al., 13 Feb 2025, Wang et al., 22 May 2025, Li et al., 2022, Cheng et al., 19 Feb 2026).