Probability of Necessity and Sufficiency (PNS)
- Probability of Necessity and Sufficiency (PNS) is a counterfactual measure of how likely a treatment is both required for an outcome and sufficient to produce it.
- It is generally not point-identified, but admits sharp Tian–Pearl bounds that combine experimental and observational data to rigorously bound causal effects at the individual or subgroup level.
- Recent advances leverage machine learning to refine PNS estimations, reduce error margins, and support targeted decision-making in causal inference.
The probability of necessity and sufficiency (PNS) is a foundational concept in counterfactual causal inference that quantifies the probability that a treatment (or exposure) is both necessary for an outcome to occur and sufficient to guarantee it in an individual or subpopulation. PNS represents the proportion of units for which, if the treatment had not been given, the outcome would have been absent, and if the treatment were given, the outcome would have been present. PNS appears centrally in causal theory, decision-support, feature attribution, and recent machine-learning pipelines for individual-level or subgroup-level causal reasoning.
1. Formal Definition and Mathematical Bounds
Let $X$ be a binary treatment and $Y$ a binary outcome. Using potential-outcomes notation:
- $Y_1$: the potential value of $Y$ under $X = 1$
- $Y_0$: the potential value of $Y$ under $X = 0$
The population-level probabilities of causation are:
- Probability of Necessity (PN): $\mathrm{PN} = P(Y_0 = 0 \mid X = 1, Y = 1)$
- Probability of Sufficiency (PS): $\mathrm{PS} = P(Y_1 = 1 \mid X = 0, Y = 0)$
- Probability of Necessity and Sufficiency (PNS): $\mathrm{PNS} = P(Y_1 = 1, Y_0 = 0)$
PNS is thus the unconditional proportion of units for which the treatment would "single-handedly" ensure the positive outcome ($Y_1 = 1$), and absence of the treatment ($X = 0$) would preclude it ($Y_0 = 0$).
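To make these definitions concrete, the following sketch enumerates a small hypothetical population of potential-outcome triples $(X, Y_1, Y_0)$ and computes PN, PS, and PNS directly. (In practice $Y_1$ and $Y_0$ are never jointly observed for the same unit; the counts here are invented purely for illustration.)

```python
# Each unit is a triple (X, Y1, Y0) of treatment and both potential outcomes.
# Hypothetical fully enumerated population of 10 units.
units = [
    (1, 1, 0), (1, 1, 0), (1, 1, 1), (1, 0, 0),
    (0, 1, 0), (0, 1, 0), (0, 0, 0), (0, 1, 1),
    (1, 1, 0), (0, 0, 0),
]
n = len(units)

# PNS: unconditional proportion with Y1 = 1 and Y0 = 0.
pns = sum(1 for x, y1, y0 in units if y1 == 1 and y0 == 0) / n

# PN conditions on treated units with a positive outcome (X = 1, Y = Y1 = 1).
treated_pos = [u for u in units if u[0] == 1 and u[1] == 1]
pn = sum(1 for _, _, y0 in treated_pos if y0 == 0) / len(treated_pos)

# PS conditions on untreated units with a negative outcome (X = 0, Y = Y0 = 0).
untreated_neg = [u for u in units if u[0] == 0 and u[2] == 0]
ps = sum(1 for _, y1, _ in untreated_neg if y1 == 1) / len(untreated_neg)

print(pns, pn, ps)
```

In this toy population 5 of 10 units satisfy $Y_1 = 1, Y_0 = 0$, so PNS is 0.5 even though PN and PS, being conditional, take different values.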
Sharp Tian–Pearl Bounds
PNS is generally not point-identified but admits sharp bounds derived using a combination of experimental and observational data. The standard Tian–Pearl bounds are:
- Lower bound: $\mathrm{PNS} \ge \max\{0,\; P(Y_1 = 1) - P(Y_0 = 1)\}$
- Upper bound: $\mathrm{PNS} \le \min\{P(Y_1 = 1),\; P(Y_0 = 0)\}$
When additional observational data are available, more complex forms with up to four terms (combinations of experimental marginals and joint observational probabilities) apply, yielding generally tighter bounds (Wang et al., 13 Feb 2025, Li et al., 2022).
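A minimal implementation of these bounds is sketched below, taking the experimental marginals $P(Y_1 = 1)$ and $P(Y_0 = 1)$ and, optionally, an observational joint distribution $P(X, Y)$ to activate the four-term tightened forms:

```python
def tian_pearl_bounds(p_y_do1, p_y_do0, p_xy=None):
    """Sharp Tian-Pearl PNS bounds.

    p_y_do1: P(Y=1 | do(X=1)), i.e. P(Y1 = 1) from experimental data.
    p_y_do0: P(Y=1 | do(X=0)), i.e. P(Y0 = 1) from experimental data.
    p_xy: optional observational joint, p_xy[(x, y)] = P(X=x, Y=y).
    """
    # Two-term bounds from experimental data alone.
    lower = max(0.0, p_y_do1 - p_y_do0)
    upper = min(p_y_do1, 1.0 - p_y_do0)
    if p_xy is not None:
        p_y = p_xy[(1, 1)] + p_xy[(0, 1)]  # observational P(Y = 1)
        # Additional terms combining experimental and observational data.
        lower = max(lower, p_y - p_y_do0, p_y_do1 - p_y)
        upper = min(upper,
                    p_xy[(1, 1)] + p_xy[(0, 0)],
                    p_y_do1 - p_y_do0 + p_xy[(1, 0)] + p_xy[(0, 1)])
    return lower, upper
```

Without observational data the function returns the two-term max–min interval above; supplying the joint $P(X, Y)$ adds the extra candidate terms, which can only tighten the interval.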
2. Identification Assumptions and Interpretation
Evaluation and interpretation of PNS hinges on several core assumptions:
- Consistency: $Y = Y_x$ whenever $X = x$ is actually observed.
- No cross-world confounding: no unobserved confounders affect both $X$ and $Y$ beyond what is captured by the observed covariates.
- Availability of appropriate data: at minimum, experimental (or otherwise deconfounded) estimates of $P(Y_1 = 1)$ and $P(Y_0 = 1)$ are required.
When assumptions such as monotonicity and exogeneity are satisfied (e.g., no "defiers" and no hidden confounding), PNS is point-identified as a difference of conditional probabilities: $\mathrm{PNS} = P(Y = 1 \mid X = 1) - P(Y = 1 \mid X = 0)$. This representation is heavily used in robust model estimation and causal representation learning in machine learning settings.
Conceptually, PNS captures the probability that a treatment is both required and alone sufficient for an outcome—a property essential for personalized interventions and causal explanations.
3. Estimation Methodologies: Classical and Machine Learning Approaches
Estimating PNS for a given population or finely-grained subpopulation requires precise estimation of experimental and/or observational probabilities. Challenges arise when data for each subpopulation are limited, as numerous probabilities (marginals and joints) must be reliably estimated for sharp bounds.
Classical Plug-in Estimation
For a fully enumerated population or subgroup:
- Compute the relevant probabilities empirically (via randomized controlled trials, or suitable observational adjustments)
- Plug into the Tian–Pearl formulas to obtain lower and upper bounds (Li et al., 2022).
- For reliable confidence intervals (e.g., a small target half-width at a given confidence level), sample sizes on the order of several thousand per arm are often required.
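The plug-in recipe above can be sketched as follows, taking raw RCT arm counts and returning the bound estimates together with a crude normal-approximation half-width (the conservative sum of the two marginal half-widths; the counts in the usage note are hypothetical):

```python
import math

def pns_bounds_with_ci(n1, y1, n0, y0, z=1.96):
    """Plug-in Tian-Pearl bounds from RCT arm counts.

    n1, y1: treated-arm size and number of positive outcomes.
    n0, y0: control-arm size and number of positive outcomes.
    z: normal quantile for the confidence level (1.96 ~ 95%).
    """
    p1 = y1 / n1  # estimate of P(Y=1 | do(X=1))
    p0 = y0 / n0  # estimate of P(Y=1 | do(X=0))
    lower = max(0.0, p1 - p0)
    upper = min(p1, 1.0 - p0)
    # Conservative half-width: sum of the two marginal CI half-widths.
    hw = z * math.sqrt(p1 * (1 - p1) / n1) + z * math.sqrt(p0 * (1 - p0) / n0)
    return lower, upper, hw
```

For instance, with 2,000 units per arm and outcome rates of 0.7 and 0.3, the estimated interval is roughly $[0.4, 0.7]$ with a half-width of about 0.04, consistent with the several-thousand-per-arm guidance above.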
Machine Learning-Based Estimation
To address the "data sparsity" bottleneck for many subpopulations, recent work models PNS as a function of subpopulation covariates via regression (Wang et al., 13 Feb 2025, Wang et al., 22 May 2025).
- Architecture: a multilayer perceptron (MLP, typically 2–3 hidden layers, Mish activation preferred) maps a subpopulation covariate vector to a predicted PNS, with a sigmoid output to enforce the $[0, 1]$ constraint.
- Loss: Mean-squared error against ground-truth PNS (when available), or a bound-aware loss (constraining predictions within theoretical lower and upper bounds).
- Training: Adam optimizer with batch size 128.
- Performance: MLPs with Mish activation achieve mean absolute errors of roughly 0.02–0.06 for 30,000 subpopulations given data from 2,000 labeled populations.
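The cited papers describe a bound-aware loss but its exact form is not reproduced here; the numpy sketch below shows one plausible variant (an assumption, not the papers' definition) that penalizes a predicted PNS only when it violates its theoretical Tian–Pearl interval:

```python
import numpy as np

def bound_aware_loss(pred, lower, upper):
    """Zero penalty when each predicted PNS lies inside its theoretical
    interval [lower, upper]; quadratic penalty on the violation outside it.

    pred, lower, upper: arrays of the same shape, one entry per subpopulation.
    """
    below = np.clip(lower - pred, 0.0, None)  # how far below the lower bound
    above = np.clip(pred - upper, 0.0, None)  # how far above the upper bound
    return np.mean(below ** 2 + above ** 2)
```

Such a loss lets unlabeled subpopulations contribute a training signal whenever their bounds are computable, even when ground-truth PNS is unavailable.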
This approach enables population-level causal knowledge transfer: high-confidence PNS predictions for rare subgroups are obtained by leveraging patterns identified in data-rich subgroups.
4. Practical Applications and Simulation Evidence
PNS estimation has significant impact in several practical domains:
- Decision Science: Supports individual-level or subgroup-level causal effect attribution, aiding in treatment assignment, legal responsibility, and targeted interventions.
- Machine Learning and Model Explainability: Used in feature selection, model interpretation, and causally-grounded explanation generation—by quantifying not just the effect, but indispensable and decisive features.
- Simulation Validation: Synthetic settings with simulated structural causal models (SCMs) have shown that ML-based PNS estimators trained on a small subset of "informative" populations generalize well, with mean absolute errors of roughly 0.02–0.06 across tens of thousands of subgroups (Wang et al., 13 Feb 2025, Wang et al., 22 May 2025).
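A minimal version of such a simulation check can be sketched as follows: a toy SCM (with invented response probabilities and a latent heterogeneity variable $U$) generates both potential outcomes per unit, so the true PNS is computable and can be verified to fall inside the Tian–Pearl interval formed from the experimental marginals:

```python
import random

random.seed(0)

def sample_unit():
    """Toy SCM: latent U induces two response types with different
    outcome probabilities under treatment and control (values invented)."""
    u = random.random() < 0.5
    y1 = random.random() < (0.9 if u else 0.5)  # potential outcome under X=1
    y0 = random.random() < (0.4 if u else 0.1)  # potential outcome under X=0
    return y1, y0

units = [sample_unit() for _ in range(100_000)]
n = len(units)

# True PNS, available only because the simulation exposes both outcomes.
true_pns = sum(1 for y1, y0 in units if y1 and not y0) / n

# Experimental marginals and the two-term Tian-Pearl interval.
p_do1 = sum(y1 for y1, _ in units) / n  # P(Y=1 | do(X=1))
p_do0 = sum(y0 for _, y0 in units) / n  # P(Y=1 | do(X=0))
lower = max(0.0, p_do1 - p_do0)
upper = min(p_do1, 1.0 - p_do0)

print(lower, true_pns, upper)
```

Here the analytic true PNS is $0.5(0.9 \cdot 0.6) + 0.5(0.5 \cdot 0.9) = 0.495$, which lies inside the interval $[\,\approx 0.45, \approx 0.7\,]$ formed by the bounds.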
5. Theoretical Guarantees and Sample Size Analysis
Sharp characterization of the estimation error, interval width, and required sample size for PNS computation is available:
- Worst-case error: Given $n$ experimental and observational samples, the two-sided CI for each bound has half-width approximately $z_{1-\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$th quantile of the standard normal distribution. As a conservative rule of thumb, sample sizes in the low thousands per arm yield CIs only a few percentage points wide (Li et al., 2022, Cheng et al., 19 Feb 2026).
- Delta method: For more general (possibly non-linear or piecewise-linear) bound functionals $g(\theta)$, the asymptotic variance of the estimated bound is given by the gradient-based form $\nabla g(\hat{\theta})^{\top}\, \Sigma\, \nabla g(\hat{\theta})$, with explicit sample-size formulas for a target error (Cheng et al., 19 Feb 2026).
- Simulation evidence: Empirical errors in estimated bounds decrease rapidly with sample size, meeting the theoretical error thresholds in practice using these formulae.
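Inverting the half-width formula gives a rough sample-size calculator; the sketch below uses the conservative worst-case variance at $p = 0.5$:

```python
import math

def required_n(half_width, p=0.5, z=1.96):
    """Sample size so that a normal-approximation CI for one probability
    has the given half-width; p=0.5 is the worst-case variance."""
    return math.ceil(z ** 2 * p * (1 - p) / half_width ** 2)
```

For example, `required_n(0.025)` returns 1537 and `required_n(0.05)` returns 385, which matches the low-thousands-per-arm rule of thumb and the per-subpopulation sample sizes in the summary table below.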
6. Limitations and Extensions
While PNS is a central and expressive causal quantity, several limitations and directions warrant emphasis:
- Non-identifiability: Absent monotonicity or perfect exogeneity, PNS is not point-identified, only bounded; bounds can be wide when underlying probabilities are extreme or weakly separated.
- Structural sensitivity: Mechanisms for PNS estimation require correct specification of the SCM, or that the training SCM family well captures subpopulation behavior.
- Extensibility: PNS theory and estimation extend to non-binary treatments/outcomes (multi-valued generalizations), mediation analysis (path-specific PNS), and multivariate or continuous representations (Li et al., 2022, Kawakami et al., 2024, Kawakami et al., 8 May 2025).
- Model-based estimation: ML-based generalization is only as good as the coverage and representativeness of the labeled subpopulations, and may be sensitive to covariate shift or insufficient sample support.
Summary Table: Key Properties and Requirements
| Property | Classical PNS | ML-based PNS Estimation | Reference |
|---|---|---|---|
| Main assumptions | Consistency, no cross-world confounding | Consistency, valid SCM/training | (Wang et al., 13 Feb 2025) |
| Sample size per subpop | 1,300–6,000+ | 400–1,300 (labeled subpops) | (Li et al., 2022, Wang et al., 22 May 2025) |
| Theoretical bounds | Tian–Pearl (max-min) | Bounded MLP outputs | (Wang et al., 13 Feb 2025) |
| Typical MAE (simulation) | N/A | 0.02–0.06 | (Wang et al., 13 Feb 2025, Wang et al., 22 May 2025) |
| Typical uses | Targeted attribution | Risk prediction/triage | -- |
PNS provides a rigorous, sharply bounded, and practically estimable metric for quantifying the individual- or subgroup-level sufficiency and necessity of binary treatments in producing binary outcomes, serving as a foundation for causal decision-making and interpretable machine learning (Wang et al., 13 Feb 2025, Wang et al., 22 May 2025, Li et al., 2022, Cheng et al., 19 Feb 2026).