
Do-PFN: In-Context Learning for Causal Effect Estimation (2506.06039v2)

Published 6 Jun 2025 in cs.LG

Abstract: Estimation of causal effects is critical to a range of scientific disciplines. Existing methods for this task either require interventional data, knowledge about the ground truth causal graph, or rely on assumptions such as unconfoundedness, restricting their applicability in real-world settings. In the domain of tabular machine learning, Prior-data fitted networks (PFNs) have achieved state-of-the-art predictive performance, having been pre-trained on synthetic data to solve tabular prediction problems via in-context learning. To assess whether this can be transferred to the harder problem of causal effect estimation, we pre-train PFNs on synthetic data drawn from a wide variety of causal structures, including interventions, to predict interventional outcomes given observational data. Through extensive experiments on synthetic case studies, we show that our approach allows for the accurate estimation of causal effects without knowledge of the underlying causal graph. We also perform ablation studies that elucidate Do-PFN's scalability and robustness across datasets with a variety of causal characteristics.

Summary

  • The paper introduces Do-PFN, a network that uses synthetic structural causal models to perform in-context learning for predicting interventional outcomes.
  • It adapts transformer architectures to differentiate treatment from covariates, yielding superior estimates of conditional interventional distributions.
  • Experiments demonstrate that Do-PFN outperforms baselines in CATE and CID estimation across both synthetic and hybrid real-world datasets.

Do-PFN: In-Context Learning for Causal Effect Estimation

The paper "Do-PFN: In-Context Learning for Causal Effect Estimation" introduces Do-PFN, a Prior-data Fitted Network that estimates causal effects from observational data via in-context learning. The method is trained on synthetic data derived from structural causal models (SCMs), enabling Do-PFN to predict interventional outcomes without requiring explicit knowledge of the underlying causal graph.

Methodology: Causal Inference with PFNs

Modeling Assumptions

Do-PFN assumes a prior distribution over SCMs, from which it can simulate both observational and interventional data; each synthetic dataset is generated by drawing noise variables and model parameters from predefined distributions. This design targets settings where the traditional assumptions of causal effect estimation (e.g., unconfoundedness) cannot be guaranteed.
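A minimal sketch of this prior-sampling step, assuming a simple linear-Gaussian SCM family with one confounded treatment (the paper's actual prior covers a much wider variety of causal structures):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_linear_scm(n_covariates=2):
    """Draw a random linear-Gaussian SCM: X -> T -> Y with X -> Y (confounding)."""
    return {
        "w_xt": rng.normal(size=n_covariates),   # covariate -> treatment weights
        "w_xy": rng.normal(size=n_covariates),   # covariate -> outcome weights
        "w_ty": rng.normal(),                    # treatment -> outcome weight
    }

def simulate(scm, n, do_t=None):
    """Generate n samples; if do_t is set, intervene on T (cutting the X -> T edge)."""
    x = rng.normal(size=(n, len(scm["w_xt"])))
    if do_t is None:
        t = x @ scm["w_xt"] + rng.normal(scale=0.1, size=n)  # observational T
    else:
        t = np.full(n, float(do_t))                          # interventional T
    y = x @ scm["w_xy"] + scm["w_ty"] * t + rng.normal(scale=0.1, size=n)
    return x, t, y

scm = sample_linear_scm()
x_obs, t_obs, y_obs = simulate(scm, 512)            # context: observational data
x_int, t_int, y_int = simulate(scm, 512, do_t=1.0)  # targets: outcomes under do(T=1)
```

Pre-training repeats this loop over many sampled SCMs, so the network sees observational contexts paired with interventional targets from a broad range of causal structures.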

Architecture and Training Details

Do-PFN adapts the transformer architecture used in TabPFN, with modifications to the input representation that let the model distinguish the treatment variable from ordinary covariates. Training minimizes the negative log-likelihood of the predicted interventional outcomes, with the observational data provided as context.

Figure 1: Do-PFN overview: Do-PFN conducts in-context learning (ICL) for causal effect estimation, predicting conditional interventional distributions (CIDs) based purely on observational data.
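A hypothetical sketch of such an input representation and of the training objective (the function names and exact row layout are assumptions; the paper's actual model builds on TabPFN's transformer, which is not reproduced here):

```python
import numpy as np

def encode_rows(x, t, y, is_context):
    """Encode each dataset row for the transformer. The treatment gets its own
    column so the model can tell it apart from ordinary covariates, and the
    outcome is masked out (set to 0) on query rows, whose interventional
    outcome must be predicted from the observational context rows."""
    y_visible = np.where(is_context, y, 0.0)
    return np.column_stack([x, t, y_visible, is_context.astype(float)])

def gaussian_nll(mu, log_var, y):
    """Per-row negative log-likelihood of y under N(mu, exp(log_var)); training
    minimizes the mean of this quantity over the query rows."""
    return 0.5 * (np.log(2 * np.pi) + log_var + (y - mu) ** 2 / np.exp(log_var))
```

During pre-training, context rows carry observational samples, query rows carry the intervention value in the treatment column, and the loss is averaged over query rows only.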

Experiments

Predicting Conditional Interventional Distributions (CIDs)

Do-PFN was tested against various baselines, including Random Forests and a variant called Dont-PFN that predicts observational rather than interventional outcomes. Results show that Do-PFN outperforms these methods across diverse synthetic case studies, especially in scenarios where the causal effect is not identifiable from observational data alone. Its ability to implicitly account for the underlying causal graph structure bolsters its interventional predictions across multiple scenarios.

Figure 2: Case studies: Graph structures in our six causal case studies necessitating automatic adjustments based on causal criteria.
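The in-context prediction workflow can be sketched as a single forward call that consumes the observational dataset as context, plus query rows whose treatment is clamped to the intervention value. The `knn_model` stand-in below is NOT Do-PFN; it is only a placeholder predictor that makes the interface runnable:

```python
import numpy as np

def predict_cid_mean(model_fn, x_ctx, t_ctx, y_ctx, x_query, do_value):
    """One forward pass, no gradient updates: the observational context
    (x_ctx, t_ctx, y_ctx) is consumed in-context, and each query row has its
    treatment column clamped to the intervention value do_value."""
    t_query = np.full(len(x_query), float(do_value))
    return model_fn(x_ctx, t_ctx, y_ctx, x_query, t_query)

def knn_model(x_ctx, t_ctx, y_ctx, x_query, t_query, k=5):
    """Placeholder predictor: nearest-neighbour average over the context."""
    dists = np.linalg.norm(x_ctx[None, :, :] - x_query[:, None, :], axis=-1)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_ctx[nearest].mean(axis=1)

rng = np.random.default_rng(0)
x_ctx = rng.normal(size=(100, 3))
t_ctx = rng.normal(size=100)
y_ctx = x_ctx.sum(axis=1)
x_query = rng.normal(size=(8, 3))
cid_do1 = predict_cid_mean(knn_model, x_ctx, t_ctx, y_ctx, x_query, do_value=1.0)
```

The key design point is that no dataset-specific fitting happens at inference time: everything the model "learns" about a new dataset is absorbed through the context in one pass.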

Estimating Conditional Average Treatment Effects (CATEs)

In CATE estimation tasks, Do-PFN outperformed meta-learners and double machine learning methods. This can be attributed to its bias being balanced across interventions: unlike in CID prediction, a bias shared by the two interventional predictions cancels out when they are differenced to form a CATE estimate, leading to more precise results.

Figure 3: Results on synthetic data: Do-PFN's performance in estimating CIDs, CATEs, and ATEs across tasks.
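The bias-cancellation argument can be illustrated with a toy example, assuming (hypothetically) the same additive bias `b` on both interventional predictions:

```python
import numpy as np

rng = np.random.default_rng(1)

# True CID means for 100 units under do(T=1) and do(T=0).
mu_y1 = rng.normal(loc=2.0, scale=1.0, size=100)
mu_y0 = rng.normal(loc=0.5, scale=1.0, size=100)

# Hypothetical model predictions carrying the same additive bias b
# for both interventions.
b = 0.3
pred_y1 = mu_y1 + b
pred_y0 = mu_y0 + b

# Each individual CID prediction is off by b ...
assert np.allclose(pred_y1 - mu_y1, b)

# ... but the bias cancels in the CATE, the difference of the two predictions.
cate_true = mu_y1 - mu_y0
cate_pred = pred_y1 - pred_y0
assert np.allclose(cate_pred, cate_true)
```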

Hybrid Synthetic-Real-World Data Evaluations

Do-PFN was evaluated on hybrid datasets such as Amazon Sales, where it performed comparably to gold-standard DoWhy models that have access to the ground-truth causal graph. For both CID prediction and CATE estimation, Do-PFN remained competitive, highlighting its robustness in practical applications.

Figure 4: Real-world case studies: Agreed-upon causal graphs for the Amazon Sales and Law School Admissions datasets.

Ablation Studies

The paper includes extensive ablation studies revealing Do-PFN's robustness across varying dataset sizes and graph complexities. They show that additional data can mitigate the influence of noise, and that performance remains consistent across graph structures of varying complexity.

Figure 5: In-distribution analysis: Ablation across base rates of ATE, dataset sizes, and graph complexities.

Discussion

Real-World Benchmarking

Generalization to real-world data necessitates careful design of the SCM prior, since discrepancies between the modeled prior and real-world causal relationships can degrade performance. Further validation of the prior against real-world benchmarks is crucial to ensure robustness.

Trust and Interpretability

Because Do-PFN's causal assumptions are implicit in its learned mechanisms rather than stated explicitly, interpretability studies are needed to build trust in automated causal inference.

Extensions to Further Causal Tasks

Do-PFN's framework suggests natural extensions to broader classes of interventions and observational data types; its current iteration forms a basis for exploring more advanced causal inference tasks.

Conclusion

Do-PFN emerges as a promising approach that leverages pre-trained networks for causal effect estimation from observational data. The paper concludes optimistically about Do-PFN's integration into standard machine learning practice, given its ability to handle complex causal inference settings without requiring a causal graph as input.
