Probabilistic Modelling is Sufficient for Causal Inference
Abstract: Causal inference is a key research area in machine learning, yet confusion reigns over the tools needed to tackle it. There are prevalent claims in the machine learning literature that you need a bespoke causal framework or notation to answer causal questions. In this paper, we want to make it clear that you *can* answer any causal inference question within the realm of probabilistic modelling and inference, without causal-specific tools or notation. Through concrete examples, we demonstrate how causal questions can be tackled by writing down the probability of everything. Lastly, we reinterpret causal tools as emerging from standard probabilistic modelling and inference, elucidating their necessity and utility.
Explain it Like I'm 14
What this paper is about
This paper takes a clear position: if you model the world carefully using probabilities (a “probabilistic model”), you already have everything you need to do causal inference — that is, to figure out what causes what, what would happen if we changed something, and even what would have happened under a different choice in the past.
In short: good probability models are enough to answer causal questions.
The main questions the paper tackles
The authors focus on simple, practical versions of big questions:
- Can we ask “what if we change X?” using only probability tools?
- Can we answer “what would have happened if…?” (a counterfactual) without inventing a separate, special causal math system?
- Can familiar tools like Bayesian networks (diagrams with arrows that show how things influence each other) and standard probability rules handle interventions and counterfactuals?
Their claim: yes.
How they approach the problem
They show how to treat causal questions using familiar probability ideas and models:
- Probabilistic models: Think of these as a detailed recipe for how the world produces data, with chances attached to different outcomes. They capture uncertainty and relationships.
- Bayesian networks: These are like flowcharts with arrows, where each arrow shows influence (for example, “exercise → health”). They come with probabilities that tell us how likely something is, given its causes.
- Interventions (“what if we change X?”): To model an intervention, you “cut” the usual incoming influences into X and set X to a chosen value. In a diagram, it’s like unplugging the wires coming into X and turning a dial to a new setting. Then you recompute the probabilities downstream to see what changes.
- Counterfactuals (“what would have happened if…?”): This uses a three-step pattern that fits cleanly in probability models: 1) Abduction: Use what you actually observed to figure out the likely hidden details of this specific case (like the person’s unique health factors). 2) Action: Change the thing you’re curious about (e.g., “set dose to 2x”) in the model. 3) Prediction: Recalculate what would follow under this changed setting for this specific person.
- One-model view: Instead of building entirely different models for each scenario, you can keep a single probabilistic model and represent interventions inside it (for example, using “gates” or switches that say “in this scenario, X is set by us, not by its usual causes”).
The big idea is that these steps and representations are all standard probability modeling, just used in a careful way to represent “doing” (changing the system) in addition to “seeing” (observing it).
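The "seeing" versus "doing" distinction can be made concrete in a few lines of ordinary probability code. The toy model below is our own illustration (not from the paper): a hidden `fitness` variable confounds `exercise` and `health`, so conditioning on exercise inflates the apparent benefit, while the intervention cuts the incoming arrow into `exercise` and recovers the true causal effect (0.5 here by construction).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

def sample(do_exercise=None):
    """Sample the toy model; if do_exercise is given, perform graph surgery:
    cut the arrow into `exercise` and dial it to the chosen value."""
    fitness = rng.normal(size=N)                                      # hidden confounder
    if do_exercise is None:
        exercise = (fitness + rng.normal(size=N) > 0).astype(float)   # usual mechanism
    else:
        exercise = np.full(N, float(do_exercise))                     # "unplug the wires"
    health = fitness + 0.5 * exercise + rng.normal(size=N)
    return exercise, health

# "Seeing": E[health | exercise = 1] mixes in the confounder, so it overstates the effect.
ex, h = sample()
seeing = h[ex == 1].mean()

# "Doing": E[health | do(exercise = 1)] isolates the causal effect of 0.5.
_, h_do = sample(do_exercise=1)
doing = h_do.mean()

print(f"seeing: {seeing:.2f}, doing: {doing:.2f}")  # seeing is noticeably larger
```

Everything here is standard Monte Carlo over a generative model; the only causal ingredient is deciding which arrows the intervention removes.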
What they find and why it matters
Main takeaways:
- You can express interventions and counterfactuals within the language of probability. You don’t need a separate toolbox to do causal reasoning.
- Bayesian networks and related probabilistic tools can represent both everyday associations (“people who exercise tend to be healthier”) and true causal questions (“if we increase exercise, how does health change?”).
- Standard inference methods (the algorithms we already use to compute probabilities in these models) can answer causal queries when interventions are properly represented.
- This unifies causal inference with the broader, well-understood world of probabilistic modeling. That means researchers and practitioners can use familiar methods to tackle causal problems.
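The abduction–action–prediction recipe described earlier can likewise be run with nothing but ordinary algebra on a structural equation. This is our own one-equation toy (health = noise + 0.5·dose), not an example from the paper: we observe one patient, invert the equation to recover their individual noise term, change the dose, and replay.

```python
# Abduction–action–prediction on a one-equation toy model (illustrative only):
#   health = fitness_noise + 0.5 * dose
# where fitness_noise is this specific patient's exogenous noise term.

def counterfactual_health(observed_dose, observed_health, new_dose):
    # 1) Abduction: invert the structural equation to recover this patient's noise.
    fitness_noise = observed_health - 0.5 * observed_dose
    # 2) Action: set dose to the hypothetical value, ignoring its usual causes.
    dose = new_dose
    # 3) Prediction: replay the same equation with the same noise.
    return fitness_noise + 0.5 * dose

# A patient observed at dose 1.0 with health 2.0; what if the dose had been 2.0?
print(counterfactual_health(1.0, 2.0, 2.0))  # → 2.5
```

In richer models abduction is a posterior inference over latent noise rather than an exact inversion, but the three steps are the same.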
Why this is important:
- It lowers the barrier to entry. If you know probability and basic graphical models, you can do causal inference.
- It encourages clearer thinking: causal questions become precise “what if we change X?” queries inside a single, consistent model.
- It helps connect causal reasoning to modern machine learning tools (like deep generative models) that are already probabilistic at their core.
What this could change going forward
If the community accepts this view, more people can do causal reasoning using tools they already understand. That could lead to:
- Better decisions in medicine, policy, and science (e.g., “What if we change the vaccination strategy?” or “What would have happened if this patient had received a different dose?”).
- Smoother integration of causal questions into machine learning systems, from fairness analyses to safe decision-making.
- Faster progress, because researchers won’t need to switch languages or frameworks to ask causal questions — they can extend the probabilistic models they already use.
In essence, the paper argues for a simple but powerful shift: stick with probability, model interventions carefully, and you can answer the causal questions that matter.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
The paper advances a position that probabilistic modelling is sufficient for causal inference. Several concrete gaps and unresolved questions remain for future work:
- Precisely characterize the model classes for which probabilistic modelling is “sufficient” for causal inference (e.g., Bayesian networks with gates, probabilistic programming with counterfactual operators, measure-theoretic causal kernels), and formally delineate where this sufficiency fails.
- Provide theorem-level equivalence results (with conditions) between probabilistic modelling and SCM/do-calculus for interventions and counterfactuals, including soft/stochastic interventions, imperfect compliance, transportability, and selection bias.
- Develop algorithmic criteria to decide identifiability of causal queries within purely probabilistic models (given a model class and observed data), and map them to known graphical criteria (back-door/front-door, instrumental variables) without invoking SCM syntax.
- Clarify how latent confounders are represented and handled in a strictly probabilistic framework, including testable implications, falsification procedures, and bounds when identifiability fails.
- Specify how cycles/feedback loops are accommodated (e.g., through gated models or causal kernels), including existence and uniqueness of interventional/counterfactual distributions and generalizations of d-separation for cyclic graphs.
- Make the abduction–action–prediction pipeline fully operational within a probabilistic framework: give practical algorithms for abduction (estimating exogenous noise or individualized latent states), quantify uncertainty, and analyze approximation error.
- Establish sensitivity analysis tools tailored to probabilistic causal models (e.g., priors, likelihood misspecification, parametric vs. nonparametric choices) and derive robustness guarantees for estimated effects.
- Provide guidance for specifying interventions in probabilistic models beyond hard do-operations: encode mechanism changes, soft interventions, policy interventions, and unknown/partial interventions, with clear semantics and inference procedures.
- Address sequential decision-making and continuous-time policies (policy interventions), including identifiability and off-policy evaluation in joint models of treatments and outcomes.
- Demonstrate scalability and reliability of probabilistic counterfactual inference in high-dimensional structured data (images, text, graphs), including standardized calibration metrics for abduction and counterfactual validity.
- Develop practical structure-learning methods that integrate observational and interventional data within the probabilistic framework, including the case where intervention targets are unknown or partially observed.
- Propose diagnostic tests and model criticism workflows for probabilistic causal models (testable implications, constraint-based checks, posterior predictive tests for interventions and counterfactuals).
- Specify computability and complexity bounds for causal inference via probabilistic message passing and variational methods (e.g., gated graphical models): when is exact or approximate inference tractable?
- Integrate data fusion (RCTs plus observational studies) rigorously in the probabilistic framework, with conditions for transportability/generalizability and algorithms for combining datasets with different selection mechanisms.
- Clarify how “causes of effects” (individual-level attribution) can be framed and computed within probability, including bounds, identifiability conditions, and legal/decision-theoretic interpretations.
- Provide a principled approach to variable construction and representation learning for causal inference in probabilistic models (linking learned representations/concepts to intervention semantics and avoiding post-treatment leakage).
- Establish benchmarking protocols and empirical comparisons against SCM-based and potential outcomes methods on standard datasets, with criteria for validity, identifiability, and computational efficiency.
- Offer practitioner guidelines for model selection, assumption articulation, and reporting standards when performing causal inference via probabilistic modelling, including common pitfalls and best practices.
- Deliver open-source implementations in probabilistic programming languages that support do- and counterfactual operators, with reproducible examples spanning association, intervention, and counterfactual levels.
- Articulate the limits of probabilistic sufficiency in domains involving strategic agents, equilibrium selection, or normative constraints (e.g., fairness), and propose extensions or hybrid frameworks to address these cases.
Glossary
- Abduction: In causal inference, inferring latent causes or exogenous variables consistent with observed data. Example: "accurate abduction"
- Bayesian networks: Directed acyclic graphs that encode probabilistic dependencies among variables via conditional distributions. Example: "Modelling Interventions with Bayesian Networks"
- Causal calculus: A set of formal rules (e.g., Pearl’s do-calculus) for manipulating probabilities under interventions. Example: "A Causal Calculus for Statistical Research"
- Causal influence diagrams: Graphical models that combine causal structure with decision and utility nodes for reasoning about interventions. Example: "Causal Influence Diagrams"
- Causal kernels: Transition probability kernels that encode causal mechanisms within a measure-theoretic framework. Example: "causal kernels"
- Causal mediation analysis: Decomposing an effect into direct and indirect (mediated) components through specified mediator variables. Example: "causal mediation analysis"
- Causal space: A probability space augmented with causal kernels to formalize interventions and causal relationships. Example: "causal space"
- Counterfactual fairness: A fairness criterion defined by comparing outcomes under counterfactual changes to protected attributes. Example: "Counterfactual Fairness"
- Counterfactual generative models: Generative models that support queries about hypothetical outcomes under alternative interventions. Example: "A Language for Counterfactual Generative Models"
- Counterfactual queries: Questions about what would have happened under different interventions or conditions. Example: "Counterfactual Queries"
- Counterfactuals: Hypothetical outcomes that would occur under alternative actions or interventions. Example: "Counterfactuals"
- d-separation: A graphical criterion to determine conditional independence in directed acyclic graphs. Example: "d-separation"
- do-operator: Notation (do(X=x)) representing an intervention that sets a variable to a specific value. Example: "Pearl's do"
- Exogenous noise variables: External, unobserved stochastic inputs in structural causal models that account for variability. Example: "exogenous noise variables"
- Faithfulness: The assumption that the only conditional independencies in the distribution are those implied by d-separation in the causal graph (the converse direction of the Markov condition). Example: "faithfulness"
- Gaussian processes: Nonparametric probabilistic models over functions used for regression and time-continuous modeling. Example: "Gaussian processes"
- Identifiability: The property that a causal quantity or structure can be uniquely determined from the observed data and assumptions. Example: "Identifiability of Causal Graphs"
- Identifiable Functional Model Classes (IFMOCs): Classes of functional causal models for which the causal graph is identifiable from the joint distribution. Example: "Identifiable Functional Model Classes (IFMOCs)"
- Interventional distributions: Probability distributions resulting from externally intervening on variables. Example: "joint interventional distributions"
- Interventions: External actions that set variables to specific values, breaking their usual causal dependencies. Example: "Modelling Interventions with Bayesian Networks"
- Latent variables: Unobserved variables that influence observed data within probabilistic or causal models. Example: "latent variables"
- Markov condition: The condition that each node in a causal DAG is independent of its non-descendants given its parents, equivalently that the joint distribution factorizes according to the DAG. Example: "Markov condition"
- Markov equivalence: The set of DAGs that imply the same conditional independence relations. Example: "Markov equivalence"
- Measure-theoretic axiomatisation: Formalizing causality using measure-theoretic probability structures and kernels. Example: "A Measure-Theoretic Axiomatisation of Causality"
- Normalising flows: Invertible transformations used to build flexible probability distributions for generative modeling and inference. Example: "normalising flows"
- Pearl's ladder of causation: The hierarchy of causal reasoning levels: association, intervention, and counterfactuals. Example: "Pearl's ladder of causation"
- Point processes: Stochastic processes modeling discrete events over continuous time. Example: "point processes"
- Potential outcomes: A framework that represents outcomes under different treatment assignments for causal effect estimation. Example: "Potential Outcomes"
- Semi-Markovian: Refers to acyclic causal models that allow latent confounders shared between observed variables, relaxing the Markovian assumption of mutually independent exogenous noise. Example: "recursive semi-Markovian causal models"
- Structural causal models (SCMs): Causal models specified by structural equations and exogenous noise variables. Example: "Structural Causal Models"
- Structural equation modeling: A statistical approach modeling relationships among variables via systems of equations, often with latent variables. Example: "Bayesian structural equation modeling"
- Variational inference: A family of optimization-based methods to approximate complex posterior distributions. Example: "Variational inference"
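Several of these glossary entries (d-separation, Markov condition) can be made concrete with a collider. In the graph A → C ← B, A and B are d-separated, hence independent, but conditioning on the collider C opens the path and induces dependence ("explaining away"). A minimal sampling check, our own illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000
a = rng.normal(size=N)
b = rng.normal(size=N)
c = a + b + 0.1 * rng.normal(size=N)   # collider: A -> C <- B

# Marginally, A and B are d-separated: correlation near zero.
marginal = np.corrcoef(a, b)[0, 1]

# Conditioning on the collider (here: selecting samples with c > 1) opens
# the path and induces a negative correlation between A and B.
sel = c > 1
conditional = np.corrcoef(a[sel], b[sel])[0, 1]

print(f"corr(A,B) = {marginal:.3f}, corr(A,B | C>1) = {conditional:.3f}")
```

The same mechanism is why conditioning on post-treatment variables can bias causal estimates.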
Practical Applications
Immediate Applications
The following applications can be deployed now by reframing causal questions within existing probabilistic modeling stacks and workflows.
- Software Engineering (Software): Fault localization and experiment planning by treating dataflow graphs as causal graphs and performing probabilistic inference over interventions.
- Potential tools/products/workflows: Flow-based programming frameworks with a “causal analysis” plugin; a gated Bayesian network layer for intervention toggles; belief propagation for interventional queries.
- Assumptions/dependencies: Accurate component-level logging; a faithful mapping from dataflow to probabilistic models; identifiable causal effects for the questions asked.
- Analytics/ML Platforms (Software): Add a do-operator API and gating semantics in probabilistic programming languages (e.g., Pyro, Stan, Turing) to compute interventional distributions via standard inference.
- Potential tools/products/workflows: “Causal-PPL” library modules; single-model interventions encoded as context-specific conditionals; off-the-shelf inference (MCMC/VI) for interventional queries.
- Assumptions/dependencies: Correct model specification; appropriate identifiability conditions; measured confounders or sensitivity analyses when not.
- Healthcare Operations and Clinical Decision Support (Healthcare): Patient-level counterfactual prediction using deep structural causal models for imaging or EHR data (e.g., abduction–action–prediction workflow with normalizing flows).
- Potential tools/products/workflows: Counterfactual imaging estimators for radiology QA; treatment effect simulation dashboards for care pathways; causal mediation analysis within generative models.
- Assumptions/dependencies: High-quality clinical data; domain-informed causal structure; robust abduction of latent noise; constraints for fairness and compliance.
- Public Health Policy Evaluation (Policy): Retrospective evaluation of vaccine allocation or NPIs by combining compartmental simulations with probabilistic causal models to estimate counterfactual policy impacts.
- Potential tools/products/workflows: Simulation-assisted causal modeling pipelines; policy scenario explorers for epidemic response; modular causal kernels for immunity waning and age risk profiles.
- Assumptions/dependencies: Reliable surveillance data; calibrated epidemic parameters; uncertainty quantification for policy comparison; validity of structural assumptions.
- Algorithmic Fairness Audits (Software/Policy): Counterfactual fairness checks and debiasing via probabilistic models that intervene on protected attributes while holding latent factors fixed.
- Potential tools/products/workflows: VAE-based fair autoencoders; counterfactual audit suites; bias measurement frameworks with robust experimental setup.
- Assumptions/dependencies: Clear fairness definitions; tested robustness of counterfactual generation; careful treatment of unobserved confounding.
- A/B Testing and Observational Analytics (Education/Software/Marketing): Recasting observational logs into causal queries within Bayesian networks to estimate effects of product or curriculum interventions without randomized trials.
- Potential tools/products/workflows: “Causal A/B” tooling for product analytics; policy-aware treatment assignment models (point processes + GPs) for sequences of interventions.
- Assumptions/dependencies: Logging of relevant covariates; overlap and positivity; diagnostic checks for selection bias; policy modeling accuracy.
- Financial Stress Testing and Policy Simulation (Finance): Simulate the impact of regulatory changes or risk management policies via interventions in probabilistic macro–micro models.
- Potential tools/products/workflows: BN-based scenario engines; stress-test workbenches powered by probabilistic inference; counterfactual risk attribution.
- Assumptions/dependencies: Structural validity of economic relationships; stability under interventions; careful handling of latent shocks.
- Robotics and Safety (Robotics): Causal influence diagram–driven decision support implemented as probabilistic inference to evaluate intervention outcomes and mitigate unsafe policies.
- Potential tools/products/workflows: PPL-based planning with causal kernels; intervention-aware simulation; safety case documentation with probabilistic counterfactuals.
- Assumptions/dependencies: Adequate environment models; calibrated uncertainty; rigorous validation against real-world behavior.
- Legal and Expert Testimony (Policy/Law): Apply decision-theoretic probabilistic causality to estimate probabilities of necessity/sufficiency by fusing observational and experimental evidence.
- Potential tools/products/workflows: Litigation support calculators for “effects of causes” vs “causes of effects” framed probabilistically; evidentiary synthesis workflows.
- Assumptions/dependencies: Availability of relevant data sources; court-acceptable assumptions and bounds; transparent uncertainty reporting.
- Demand Response and Grid Operations (Energy): Model intervention effects of pricing or control policies using probabilistic models with gating for policy changes and standard inference for outcomes.
- Potential tools/products/workflows: Causal-enabled energy simulators; policy evaluation dashboards; sequential treatment models for operational policies.
- Assumptions/dependencies: Quality of load and pricing data; causal structure capturing feedback; policy compliance and external factors.
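The "gated" one-model representation that recurs in these applications (gated Bayesian network layers, gating semantics in PPLs) can be sketched without committing to any particular framework. The model below is our own illustrative assumption, not an API from the paper: a boolean gate decides, inside a single probabilistic model, whether X follows its usual conditional given its parent Z or is pinned by the experimenter.

```python
import numpy as np

rng = np.random.default_rng(2)

def model(n, gate_x=False, x_value=None):
    """One probabilistic model covering both regimes.
    gate_x=False: X follows its usual mechanism given its parent Z ("seeing").
    gate_x=True:  the gate routes X to the experimenter's dial ("doing")."""
    z = rng.normal(size=n)                       # parent and confounder
    if gate_x:                                   # context-specific conditional
        x = np.full(n, float(x_value))
    else:
        x = z + rng.normal(size=n)
    y = 2.0 * x - z + rng.normal(size=n)
    return y

n = 200_000
observational = model(n).mean()                              # E[Y] ≈ 0 under "seeing"
interventional = model(n, gate_x=True, x_value=1.0).mean()   # E[Y | do(X=1)] ≈ 2
print(observational, interventional)
```

Because both regimes live in one model, standard inference machinery answers observational and interventional queries alike; only the gate's setting changes.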
Long-Term Applications
The following applications require further research, scaling, standardization, or development before broad deployment.
- Enterprise Causal Platform Standardization (Software): A unified, measure-theoretic foundation for causal kernels and do-semantics embedded across PPLs and data platforms.
- Potential tools/products/workflows: Cross-language causal APIs; standardized intervention DSLs; governance for causal model versioning and audits.
- Assumptions/dependencies: Community consensus on semantics; backward-compatible tooling; institutional commitment to causal governance.
- High-Dimensional Counterfactuals at Scale (Healthcare/Vision/Multimodal): Reliable counterfactual generation for complex modalities (e.g., diffusion-based SCMs for imaging, audio, text).
- Potential tools/products/workflows: Counterfactual generators integrated into clinical and scientific pipelines; automated mediation effect estimators for structured data.
- Assumptions/dependencies: Compute and data scale; evaluation metrics for counterfactual fidelity and axiomatic soundness; defense against shortcut learning.
- Real-Time Counterfactual Decisioning (Software/Marketing/Recommenders): Streaming causal inference for production systems to estimate and act on intervention effects under concept drift.
- Potential tools/products/workflows: Online abduction–action–prediction services; causal bandits combining exploration with interventions; policy impact monitors.
- Assumptions/dependencies: Stable, fast inference; drift detection and recalibration; safe exploration policies; privacy-preserving computation.
- Automated Causal Discovery with Interventions (Academia/Industry): Integrate identifiable functional model classes (IFMOCs) and unknown intervention handling into PPL-based structure learning.
- Potential tools/products/workflows: Hybrid discovery engines that propose structures and validate via interventional or gated inference; active experimentation planners.
- Assumptions/dependencies: Sufficient interventional/observational diversity; testable implications and diagnostics; domain expert oversight.
- Policy Optimization Under Stochastic Treatment Rules (Policy/Healthcare): Optimize treatment policies defined as stochastic processes using joint probabilistic models of treatments and outcomes in continuous time.
- Potential tools/products/workflows: Sequential policy optimizers for healthcare (e.g., insulin titration); causal simulators for adaptive public policies.
- Assumptions/dependencies: Accurate modeling of policy dynamics; robustness to feedback loops; ethical and regulatory constraints.
- Causal Education and Workforce Enablement (Academia/Industry): Mainstream causal inference in probabilistic modeling curricula and certification programs for data scientists and engineers.
- Potential tools/products/workflows: Courseware centered on Bayesian networks, gating, and counterfactuals in PPLs; hands-on labs with intervention APIs.
- Assumptions/dependencies: Institutional adoption; consistent pedagogy; alignment with industry needs.
- Causal Audit and Compliance Frameworks for AI (Policy/Software): Standardized, probabilistically grounded audits for fairness, safety, and externalities, integrated with model monitoring.
- Potential tools/products/workflows: Omega-like counterfactual DSLs for audits; compliance dashboards; third-party certification services.
- Assumptions/dependencies: Regulatory clarity; interoperable reporting standards; scalable and reproducible audit pipelines.
- Safety-Critical Simulation Governance (Robotics/Autonomy/AGI Safety): Use probabilistic causal models to govern simulation-to-reality transfer and evaluate intervention risks before deployment.
- Potential tools/products/workflows: CID-driven governance frameworks; pre-deployment counterfactual stress tests; safety case templates.
- Assumptions/dependencies: Trustworthy simulators; coverage of edge cases; cultural and organizational buy-in.
- Energy Systems Policy Co-Design (Energy/Policy): Joint design of tariffs, control strategies, and infrastructure via probabilistic causal modeling of interventions and feedback in cyber-physical grids.
- Potential tools/products/workflows: Integrated planning simulators; intervention schedulers; causal sensitivity analysis for resilience.
- Assumptions/dependencies: Rich telemetry; multi-scale causal structures; stakeholder coordination and incentive alignment.