Algorithmic Causality: A Compression Approach
- Algorithmic Causality is a research paradigm that employs algorithmic information theory and compression-based methods to infer causal relationships from individual, non-i.i.d. observations.
- It replaces statistical independence with algorithmic mutual information to generalize the Causal Markov condition and discern causal structure from compressible data dependencies.
- This approach enables causal inference in unique data regimes, facilitating model selection and discovery where traditional probabilistic methods fail.
Algorithmic causality is a research paradigm that reinterprets the discovery and quantification of causal relationships through the lens of algorithmic information theory, particularly by leveraging concepts such as Kolmogorov complexity, algorithmic mutual information, and compression-based regularities. Instead of relying solely on statistical independence and probabilistic graphical models, algorithmic causality advances causal inference and discovery by focusing on the descriptional (computational) relationships among observed data, mechanisms, and models. This approach is particularly relevant in regimes where statistical identifiability fails, repeated sampling is impossible, or traditional assumptions—such as prior knowledge of intervention targets—do not hold.
1. Algorithmic Lifting of the Causal Markov Condition
Algorithmic causality generalizes the classical Markov condition (which asserts that each variable is probabilistically independent of its nondescendants given its parents in a causal graph) by replacing statistical dependence with algorithmic mutual information. The central formulation is the algorithmic analog of the causal factorization,

$$K(x_1, \dots, x_n) \;\stackrel{+}{=}\; \sum_{j} K\big(x_j \mid pa_j^{*}\big),$$

where $K(\cdot)$ denotes Kolmogorov complexity and $pa_j^{*}$ is the shortest (algorithmically optimal) description of the direct causes (parents) $pa_j$ of $x_j$. The local algorithmic Markov condition asserts that, for each node $x_j$ with nondescendants $nd_j$,

$$I\big(x_j : nd_j \mid pa_j^{*}\big) \;\stackrel{+}{=}\; 0, \qquad \text{i.e.,} \qquad K\big(x_j \mid pa_j^{*}\big) \;\stackrel{+}{=}\; K\big(x_j \mid pa_j^{*}, nd_j\big),$$

where $\stackrel{+}{=}$ denotes equality up to a constant additive term: once the optimal description of $x_j$'s parents is known, the nondescendants $nd_j$ yield no further algorithmic compression of $x_j$ beyond a constant additive term. These conditions redefine "causal mechanism independence" as the absence of algorithmic redundancy between the mechanism describing a variable and its "context" (nondescendants) once the direct causes are specified (0804.3678).
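As a concrete, necessarily heuristic illustration, the sketch below approximates the local condition with a real compressor: zlib-compressed length stands in for $K$, and the helper names (`approx_K`, `approx_K_given`, `local_markov_violation`) are illustrative assumptions of this sketch, not constructs from the paper. Since Kolmogorov complexity is incomputable, the check is only an upper-bound surrogate.

```python
import os
import zlib

def approx_K(x: bytes) -> int:
    """Upper-bound proxy for Kolmogorov complexity: zlib-compressed length in bytes."""
    return len(zlib.compress(x, 9))

def approx_K_given(x: bytes, context: bytes) -> int:
    """Crude proxy for K(x | context): extra bytes needed to describe x
    once the context has already been described."""
    return max(approx_K(context + x) - approx_K(context), 0)

def local_markov_violation(x: bytes, pa: bytes, nd: bytes) -> int:
    """Proxy for I(x : nd | pa*): bytes of x saved by the nondescendants
    beyond what the parents already provide; near zero if the condition holds."""
    return approx_K_given(x, pa) - approx_K_given(x, pa + nd)

# Toy example: x is a copy of its parent, nd is unrelated random data.
pa = os.urandom(600)
x = pa                 # x fully determined by its parent
nd = os.urandom(600)   # nondescendant, unrelated to x

print(approx_K_given(x, b""))             # large: x alone looks incompressible
print(approx_K_given(x, pa))              # small: the parent explains x
print(local_markov_violation(x, pa, nd))  # near zero (compressor overhead aside)
```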
2. Causal Discovery from Single Observations
Algorithmic causality enables causal inference in regimes where only single observations are available per variable, and the traditional assumption of i.i.d. sampling does not apply. Each observation is represented as a finite binary string, and candidate causal graphs are derived from the algorithmic mutual information structure among these strings. Causality is inferred if and only if certain shared descriptions (compressible commonalities) among the observations cannot be explained without postulating a direct causal connection or common ancestor. This framework is not grounded in frequency-based tests but in the structural sharing and independence of shortest descriptions—allowing for the generation of causal graphs in non-statistical contexts. This capability is crucial for applications such as the causal comparison of texts, genomic sequences, or individual observed instances (0804.3678).
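A minimal sketch of this idea uses the standard normalized compression distance (NCD) as a computable proxy for shared algorithmic information between individual strings. The specific strings and function names below are illustrative assumptions, and a low NCD only suggests, rather than establishes, a causal link or common ancestor.

```python
import itertools
import zlib

def clen(x: bytes) -> int:
    """Compressed length as a computable stand-in for description length."""
    return len(zlib.compress(x, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small when x and y share
    compressible (algorithmic) structure."""
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Three individual observations (single strings, not samples from a distribution):
# a and b share a common "ancestor" passage, c is unrelated content.
root = b"the quick brown fox jumps over the lazy dog " * 20
a = root + b"variant-A tail"
b = b"variant-B head " + root
c = bytes((i * 37) % 251 for i in range(len(root)))

obs = {"a": a, "b": b, "c": c}
for u, v in itertools.combinations(obs, 2):
    # A low NCD is taken as evidence for a causal link or a common ancestor.
    print(f"NCD({u},{v}) = {ncd(obs[u], obs[v]):.2f}")
```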
3. Inference Rules, Conditional Densities, and Markov Equivalence
Building on the algorithmic Markov framework, causal inference proceeds by minimizing the algorithmic complexity of the conditional probability mechanisms (Markov kernels) appearing in a candidate graph's factorization $p(x_1, \dots, x_n) = \prod_j p(x_j \mid pa_j)$. The total description length is quantified as

$$\sum_{j=1}^{n} K\big(p(x_j \mid pa_j)\big),$$

the sum of the Kolmogorov complexities of the Markov kernels.
Among statistically Markov equivalent graphs (those that encode identical sets of conditional independencies), the preferred causal direction is given by the model with the smallest total algorithmic complexity of its conditional mechanisms. This recasts Occam's razor in the setting of causal discovery: the simplest set of mechanisms, under algorithmic information, indicates the true directionality. Importantly, the complexity of the conditionals themselves (not just of marginal distributions) becomes informative; a model such as $X \to Y$ is preferred when its "mechanistic" conditionals $p(X)$ and $p(Y \mid X)$ admit succinct descriptions (e.g., unimodal, low-parameter) even if the effect's unconditional distribution $p(Y)$ is complex, such as a multimodal mixture (0804.3678).
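The following hedged sketch mimics this selection rule with a computable, MDL-style surrogate: each candidate direction is scored by the code length of its mechanisms (a histogram for the cause's marginal, a linear-Gaussian model for the conditional) plus the data they explain, and the direction with fewer total bits is preferred. The coding scheme, function names, and constants are assumptions of this illustration, not the procedure of (0804.3678).

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def hist_bits(v, width=0.1):
    """Code length for a marginal mechanism: histogram parameter cost plus
    per-sample log-loss at a fixed bin width."""
    n = len(v)
    edges = np.arange(v.min(), v.max() + width, width)
    counts, _ = np.histogram(v, bins=edges)
    probs = counts / n
    idx = np.clip(np.digitize(v, edges) - 1, 0, len(counts) - 1)
    param_bits = 0.5 * len(counts) * math.log2(n)
    data_bits = -np.log2(probs[idx]).sum()
    return param_bits + data_bits

def lin_gauss_bits(effect, cause):
    """Code length for a linear-Gaussian mechanism effect = a*cause + b + noise.
    The data term is the Gaussian log-loss of the residuals; the fixed
    quantization constant is identical for both candidate directions and
    is therefore omitted."""
    n = len(effect)
    a, b = np.polyfit(cause, effect, 1)
    resid = effect - (a * cause + b)
    sigma = resid.std() + 1e-12
    param_bits = 0.5 * 3 * math.log2(n)
    data_bits = 0.5 * math.log2(2 * math.pi * math.e * sigma**2) * n
    return param_bits + data_bits

# Toy data: the cause has a complex (bimodal) marginal, but the mechanism
# producing the effect is simple (linear with small Gaussian noise).
n = 2000
x = np.concatenate([rng.normal(-2, 0.5, n // 2), rng.normal(2, 0.5, n // 2)])
y = 0.5 * x + rng.normal(0, 0.3, n)

bits_xy = hist_bits(x) + lin_gauss_bits(y, x)   # factorization p(X) * p(Y|X)
bits_yx = hist_bits(y) + lin_gauss_bits(x, y)   # factorization p(Y) * p(X|Y)
print(f"X->Y: {bits_xy:.0f} bits   Y->X: {bits_yx:.0f} bits")
# The direction with fewer total bits (here expected to be X -> Y) is preferred.
```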
4. Approximating Kolmogorov Complexity in Practice
Kolmogorov complexity is incomputable, so algorithmic causality requires practical and computable alternatives. Two main strategies are considered:
- Resource-bounded complexity measures or the use of actual data compressors (e.g., via the Minimum Description Length principle) as computable upper bounds for Kolmogorov complexity.
- Heuristic procedures such as "blurring" or subsampling, which permit empirical testing of conditional complexity differences while remaining within the computable regime.
These proxies enable operationalization of the theory for applied causal inference and extend algorithmic causality's reach to real-world datasets where exact algorithmic measures are unavailable. For example, measures like MDL have established efficacy in model selection and can be used to implement algorithmically motivated causal inference heuristics (0804.3678).
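For instance, a crude but sound upper bound on $K(x)$ can be obtained by taking the best result over several off-the-shelf compressors; the helper below (its name and the choice of compressors are illustrative assumptions) never underestimates complexity, up to the constant cost of specifying the decompressor.

```python
import bz2
import lzma
import os
import zlib

def K_upper_bound(x: bytes) -> int:
    """Computable upper bound on Kolmogorov complexity: the smallest compressed
    size (in bytes) achieved by several real compressors. Any real compressor
    can only over-estimate K(x), never under-estimate it."""
    return min(
        len(zlib.compress(x, 9)),
        len(bz2.compress(x, 9)),
        len(lzma.compress(x, preset=9)),
    )

structured = b"0123456789" * 1000   # highly regular, so the bound is small
random_like = os.urandom(10_000)    # incompressible, so the bound stays near 10,000
print(K_upper_bound(structured), K_upper_bound(random_like))
```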
5. Practical and Theoretical Implications
The theoretical reframing offered by algorithmic causality has several implications:
- Causal inference in non-i.i.d. and non-stationary settings: Algorithmic dependence does not presuppose repeated sampling and is therefore suited for unique or evolving systems.
- Distinction among Markov equivalent graphs: Algorithmic complexity breaks symmetry where statistical tests cannot—by preferring the simplest mechanisms among otherwise equivalent models.
- Sensitivity to mechanistic simplicity: Causal mechanisms are favored when they are compressible, i.e., require minimal programmatic description.
- Heuristic justification: The theory underpins previous heuristics such as the plausibility of simple Markov kernels, providing an information-theoretic foundation for preferring explanations that do not necessitate "coincidental" information sharing between mechanisms (formalizing a version of Occam's razor in causal modeling).
- Applicability to single-object causality: The framework generalizes to single observations, enabling causal reasoning in domains like text, images, or other complex, once-only data where statistical methods are fundamentally inapplicable.
Cases such as texts, unique genomes, or highly structured images benefit from this perspective, as causal judgments are grounded in compressibility and shared information rather than repeated sampling (0804.3678).
6. Emergent Causal Structures in Machine Learning
Algorithmic causality has implications for the emergence of causal structure in machine learning systems (such as LLMs) trained to compress data from multiple environments. Minimizing overall description length across heterogeneous data can induce the emergence of reusable, modular mechanisms: the model stores conditional rules (mechanisms) that are reused where possible, and this modular reuse is interpreted as the emergence of causality. When explicit intervention data are unavailable, the compression principle guides the learning of causal (or "quasi-causal") dependencies: the representations that minimize total coding length are those that factor out recurring regularities. Hence, compressive learning in large-scale models may foster internal representations with algorithmically causal structure, even in the absence of explicitly specified interventions or structural causal models (Wendong et al., 6 Feb 2025).
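A toy calculation can make the reuse argument concrete: compressing several "environments" jointly lets a shared mechanism description be paid for once, whereas compressing them separately duplicates it. The byte strings and environment construction below are purely illustrative assumptions, not the setup of the cited work.

```python
import zlib

def clen(x: bytes) -> int:
    """Compressed length as a computable proxy for description length."""
    return len(zlib.compress(x, 9))

# Toy "environments": each contains the same mechanism description (a repeated
# rule) plus environment-specific content.
shared_mechanism = b"if pressure rises then the valve opens; " * 30
environments = [
    shared_mechanism
    + f"env-{i}-readings:".encode()
    + bytes((i * j) % 200 for j in range(300))
    for i in range(5)
]

separate = sum(clen(e) for e in environments)   # each environment coded on its own
joint = clen(b"".join(environments))            # one code; the shared mechanism is reused
print(f"separate: {separate} bytes, joint: {joint} bytes")  # joint is expected to be smaller
```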
7. Synthesis and Outlook
Algorithmic causality anchors causal inference in the principles of algorithmic information theory, extending traditional statistical approaches to new domains. By substituting algorithmic independence (as measured by Kolmogorov complexity or its proxies) for probabilistic independence, and by preferring model structures that minimize both program and data description lengths, the approach unifies causal discovery and model simplicity in a principled fashion. This framework is robust to non-standard data regimes, supports causal discovery for individual objects, sharpens the selection among statistically equivalent models, and provides a foundational justification for established causal heuristics. As machine learning systems increasingly operate in complex, multi-environment, or intervention-scarce settings, algorithmic causality offers a theoretical and practical path for uncovering intrinsic causal structure via compression and descriptional simplicity (0804.3678, Wendong et al., 6 Feb 2025).