Large Causal Models for Temporal Causal Discovery

This presentation introduces Large Causal Models, a breakthrough foundation-model approach to temporal causal discovery. The talk explores how these models overcome the limitations of traditional dataset-specific methods by learning to discover causal relationships across diverse time-series data through multi-dataset pretraining. We examine the core methodology, experimental validation across synthetic and real-world datasets, and the implications for scalable causal inference in temporal settings.
Script
Every causal discovery method ever built faces the same costly limitation: train it on one dataset, and it fails on the next. The researchers behind this work asked a provocative question: what if a single model could learn causal structure across hundreds of datasets at once?
Traditional temporal causal discovery treats every dataset as a unique puzzle requiring its own custom solution. This dataset-specific paradigm creates a bottleneck: each new time series demands fresh modeling effort, making the approach impractical for the scale and diversity of real-world causal inference problems.
The authors propose a radical departure from this one-dataset-at-a-time approach.
Large Causal Models borrow the foundation model strategy from language and vision: train once on diverse data, then apply everywhere. A transformer architecture learns to predict lagged causal graphs from time series in a single forward pass, eliminating the need for dataset-specific optimization.
The training strategy combines two data sources. Synthetic generators produce time series from causal graphs with controllable structure, while real-world datasets from domains like climate and energy systems ensure the model learns patterns that actually exist in nature, not just in simulations.
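To make the synthetic side of this concrete, here is a minimal sketch of how such a generator could work. This is not the authors' actual generator: the function names, the linear vector-autoregressive (VAR) dynamics, and all parameter values are illustrative assumptions. The key idea it demonstrates is that each training example is a pair: a lagged causal graph with controllable structure, plus a time series whose dynamics follow that graph.

```python
import numpy as np

def sample_lagged_graph(n_vars, max_lag, edge_prob=0.15, seed=0):
    """Sample a random lagged adjacency tensor A of shape (max_lag, n_vars, n_vars);
    A[lag, i, j] != 0 means variable i at time t-(lag+1) causes variable j at time t."""
    rng = np.random.default_rng(seed)
    mask = rng.random((max_lag, n_vars, n_vars)) < edge_prob
    weights = rng.uniform(0.2, 0.8, size=mask.shape) * rng.choice([-1.0, 1.0], size=mask.shape)
    A = np.where(mask, weights, 0.0)
    # Rescale so the total incoming |weight| per variable stays below 1,
    # which keeps the simulated process from blowing up.
    incoming = np.abs(A).sum(axis=(0, 1)).max()
    if incoming > 0.9:
        A *= 0.9 / incoming
    return A

def simulate_series(A, n_steps=500, noise_scale=0.1, seed=1):
    """Generate a time series driven by the lagged graph A (a linear VAR process)."""
    rng = np.random.default_rng(seed)
    max_lag, n_vars, _ = A.shape
    x = rng.normal(scale=noise_scale, size=(n_steps, n_vars))  # innovation noise
    for t in range(max_lag, n_steps):
        for lag in range(max_lag):
            x[t] += x[t - lag - 1] @ A[lag]  # contribution of each lagged cause
    return x

A = sample_lagged_graph(n_vars=5, max_lag=3)
series = simulate_series(A)  # one training pair is (series, A)
```

Real generators would also vary the functional form and noise distribution, which is part of why the realistic datasets in the next component matter.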
The architecture itself is elegant. A transformer backbone ingests the time series, leveraging correlation statistics to guide learning. A feedforward head then outputs a lagged adjacency tensor, a compact representation that encodes which variables cause which others and at what time delays.
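The shapes involved can be sketched in a few lines. The functions below are hypothetical stand-ins, not the paper's architecture: `lagged_correlations` illustrates the kind of correlation statistic the narration mentions, and `edge_probabilities` is a toy placeholder for the feedforward head, showing only that the output is a lagged adjacency tensor of per-edge probabilities.

```python
import numpy as np

def lagged_correlations(x, max_lag):
    """Cross-correlation between every (cause, effect) pair at each lag -- the kind
    of summary statistic said to guide learning. Returns shape (max_lag, n, n)."""
    n_steps, n_vars = x.shape
    z = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)  # z-score each variable
    feats = np.zeros((max_lag, n_vars, n_vars))
    for lag in range(1, max_lag + 1):
        past, present = z[:-lag], z[lag:]
        feats[lag - 1] = past.T @ present / (n_steps - lag)
    return feats

def edge_probabilities(feats, w=4.0, b=-1.0):
    """Toy stand-in for the feedforward head: squash features into per-edge
    probabilities, producing a lagged adjacency tensor of the same shape."""
    return 1.0 / (1.0 + np.exp(-(w * np.abs(feats) + b)))

x = np.random.default_rng(0).normal(size=(200, 4))
probs = edge_probabilities(lagged_correlations(x, max_lag=2))  # shape (2, 4, 4)
```

Entry `probs[lag, i, j]` reads as "the model's confidence that variable i causes variable j at a delay of lag+1 steps" -- the compact encoding the narration describes.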
This diagram captures the full pipeline. On the left, synthetic and realistic generators produce training pairs: time series and their ground-truth causal graphs. These pairs feed into supervised learning, teaching the model to map from observed data to underlying causal structure. The output is a lagged adjacency tensor that represents the discovered causal dependencies across time.
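The supervised learning step in the pipeline can be sketched as follows. This is a deliberately tiny stand-in: a single weight-and-bias pair replaces the transformer's parameters, but the objective is the standard one for this setup, binary cross-entropy between predicted edge probabilities and the ground-truth lagged graph.

```python
import numpy as np

def bce_step(features, target_adj, w, b, lr=0.5):
    """One supervised update: map features to edge probabilities via a sigmoid,
    then descend the binary cross-entropy against the ground-truth graph."""
    logits = w * features + b
    probs = 1.0 / (1.0 + np.exp(-logits))
    target = (target_adj != 0).astype(float)   # edge present / absent
    grad = probs - target                      # dBCE/dlogit for a sigmoid output
    new_w = w - lr * np.mean(grad * features)
    new_b = b - lr * np.mean(grad)
    loss = -np.mean(target * np.log(probs + 1e-12)
                    + (1 - target) * np.log(1 - probs + 1e-12))
    return new_w, new_b, loss

# Toy training pair: features correlate with the true edges, so the loss should fall.
features = np.array([[[0.9, 0.1], [0.05, 0.8]]])
target_adj = np.array([[[1.0, 0.0], [0.0, 1.0]]])
w, b = 0.0, 0.0
losses = []
for _ in range(300):
    w, b, loss = bce_step(features, target_adj, w, b)
    losses.append(loss)
```

Pretraining repeats this loop over hundreds of (series, graph) pairs, which is what lets the trained model skip per-dataset optimization entirely.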
The results validate the foundation model bet. Large Causal Models match or exceed the accuracy of traditional constraint-based and functional approaches on synthetic benchmarks, but the real story is generalization: they perform robustly on real-world datasets they were never explicitly trained on, all while delivering predictions in a single pass.
This heatmap visualization shows the model in action. The top row shows the ground-truth causal structure; the bottom row shows what the Large Causal Model discovered. The alignment reveals that the model has learned to identify not just which variables are causally connected, but the specific time lags at which those connections operate.
Large Causal Models reframe temporal causal discovery as a pretraining problem, turning a field of bespoke solutions into one where a single model generalizes across datasets. Visit EmergentMind.com to learn more and create your own research videos.