Additive Noise Models (ANMs)

Updated 9 April 2026

Additive Noise Models (ANMs) are causal discovery frameworks that represent each variable as a function of its parents plus an independent noise term, which makes the underlying DAG identifiable under generic conditions.
They employ methods like regression with independence tests, score matching, and variance-based sorting to extract causal order from both bivariate and multivariate settings.
ANMs extend to challenging scenarios such as latent confounding, mechanism shifts, and missing data, with strong theoretical guarantees and empirical performance benchmarks.

Additive Noise Models (ANMs) provide a foundational framework for causal discovery in both bivariate and multivariate settings, grounding the inference of directed acyclic graph (DAG) structure in testable statistical asymmetries arising from additive noise representations of structural equations. The core assumption is that each observed variable is generated as a function of its parents plus a noise term that is statistically independent of its parents and mutually independent across nodes. Originating in work by Hoyer, Shimizu, Hyvärinen, Peters, and others, ANMs enable the identification of causal direction from observational data under generic nonlinearities and non-Gaussianity, with theoretical guarantees and scalable algorithms underpinning recent advances in causal inference, including structure learning in the presence of latent confounders, mixtures of mechanisms, arbitrary noise, and partial observability.

1. Structural Definition and Identifiability Results

A general ANM posits, for each node $X_j$ in a $p$-dimensional random vector $X = (X_1, ..., X_p)$ and associated DAG $G$,

$X_j = f_j( X_{\operatorname{Pa}(j)} ) + N_j,$

where $\operatorname{Pa}(j)$ denotes the parents of node $j$ in $G$, $f_j$ is a differentiable, non-constant function, and $N_j$ is a noise variable independent of $X_{\operatorname{Pa}(j)}$ and of all other $N_k$ ($k \neq j$) (Peters et al., 2013, Dallakyan et al., 2024, Montagna et al., 2023). The key identifiability result is that for generic choices of $f_j$ and noise distributions, the true DAG $G$ is fully determined by the joint distribution $P(X)$. Typical sufficient conditions are: (i) nonlinear $f_j$, (ii) non-Gaussian $N_j$, or (iii) each parent-child triple $(f_j, P(X_i), P(N_j))$ with $i \in \operatorname{Pa}(j)$ avoids a pathological ODE (Peters et al., 2013). In the bivariate case, identifiability holds except for linear-Gaussian or degenerately symmetric settings. For multivariate models, a recursive sufficient condition requires that each conditional parent-child pair satisfies the bivariate identifiability criterion when conditioning on appropriate sets of nondescendants (Peters et al., 2013, Hiremath et al., 2024).
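The structural equation above can be made concrete with a small sampler. The following is a minimal sketch, not taken from any of the cited papers: the function name `sample_anm`, the three-node graph, the mechanisms, and the noise distributions are all illustrative assumptions. It draws each node as a function of its already-sampled parents plus fresh, independent noise.

```python
import numpy as np

def sample_anm(n, parents, mechanisms, noise_samplers, rng):
    """Draw n samples from an ANM: X_j = f_j(X_Pa(j)) + N_j.

    `parents` maps node -> list of parent nodes and must list the nodes
    in a topological order (parents before children). All names here are
    hypothetical helpers for illustration.
    """
    data = {}
    for node, pa in parents.items():
        noise = noise_samplers[node](rng, n)      # independent noise N_j
        inputs = [data[k] for k in pa]            # parents already sampled
        data[node] = mechanisms[node](*inputs) + noise
    return data

# Assumed toy graph: X1 -> X2, X1 -> X3, X2 -> X3
parents = {"X1": [], "X2": ["X1"], "X3": ["X1", "X2"]}
mechanisms = {
    "X1": lambda: 0.0,                             # root: pure noise
    "X2": lambda x1: np.sin(2.0 * x1),
    "X3": lambda x1, x2: np.tanh(x1) + x2**2,
}
noise_samplers = {
    "X1": lambda rng, n: rng.normal(size=n),
    "X2": lambda rng, n: rng.uniform(-0.5, 0.5, size=n),
    "X3": lambda rng, n: rng.laplace(scale=0.3, size=n),
}

rng = np.random.default_rng(0)
data = sample_anm(5000, parents, mechanisms, noise_samplers, rng)

# Subtracting the true mechanism recovers the noise term of X2.
residual_x2 = data["X2"] - np.sin(2.0 * data["X1"])
```

In this toy setup the residual of X2 given the true mechanism is just its noise term, so it carries no information about X1; that independence is the asymmetry the discovery algorithms below try to detect from data.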

Special Cases and Extensions

Linear Non-Gaussian Models (LiNGAM): Linear $f_j$ with non-Gaussian $N_j$ (Dallakyan et al., 2024).
Majorization Approach: Treating the vector of conditional variances as a weak majorant identifies the topological order in linear SEMs, generalizing prior variance-based results (Dallakyan et al., 2024).
Mixtures of Mechanisms: When data arise from a mixture of ANMs indexed by a latent discrete variable, identifiability is retained under generic conditions via independence between the input (cause) variable and the mechanism parameter (Hu et al., 2018).

2. Causal Discovery Algorithms

Regression with Subsequent Independence Test (RESIT)

RESIT is a two-phase algorithm. In phase 1, each node $X_j$ is regressed on its candidate parents, and the residuals are tested for independence from the regressors (e.g., via HSIC). The node showing the least dependence is identified as a sink and removed; the step is then repeated on the remaining nodes with the candidate parent sets updated (Peters et al., 2013). Phase 2 prunes extraneous parents. RESIT achieves statistical consistency under exact independence testing and nonparametric regression oracles, though it may be sensitive to noise scaling and high-dimensional dependence testing (Peters et al., 2013, Kap, 2021).
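The regress-then-test idea is easiest to see in the bivariate case. The sketch below is a simplified illustration under stated assumptions, not the RESIT implementation from Peters et al.: it uses cubic polynomial regression in place of a nonparametric regressor, a biased HSIC statistic with Gaussian kernels and a median-heuristic bandwidth, and it compares raw HSIC values rather than p-values. The helper names (`hsic`, `residual_dependence`, `anm_direction`) are hypothetical.

```python
import numpy as np

def standardize(v):
    return (v - v.mean()) / v.std()

def gaussian_gram(v):
    """Gaussian kernel Gram matrix; bandwidth from a median heuristic."""
    d2 = (v[:, None] - v[None, :]) ** 2
    sigma2 = np.median(d2[d2 > 0])
    return np.exp(-d2 / (2.0 * sigma2))

def hsic(a, b):
    """Biased HSIC estimate trace(K H L H) / n^2; larger means more dependent."""
    n = len(a)
    K = gaussian_gram(standardize(a))
    L = gaussian_gram(standardize(b))
    H = np.eye(n) - 1.0 / n                    # centering matrix I - (1/n) 11^T
    return float(np.trace(K @ H @ L @ H)) / n**2

def residual_dependence(cause, effect, degree=3):
    """Regress effect on cause, then measure residual-vs-cause dependence."""
    c, e = standardize(cause), standardize(effect)
    coeffs = np.polyfit(c, e, degree)          # assumed polynomial regressor
    residuals = e - np.polyval(coeffs, c)
    return hsic(c, residuals)

def anm_direction(x, y, degree=3):
    """Prefer the direction whose residuals look more independent of the input."""
    forward = residual_dependence(x, y, degree)
    backward = residual_dependence(y, x, degree)
    return ("x->y" if forward < backward else "y->x"), forward, backward

# Toy data with ground truth x -> y (an assumption of this example).
rng = np.random.default_rng(0)
x = rng.normal(size=400)
y = x**3 + rng.normal(size=400)
direction, forward, backward = anm_direction(x, y)
print(direction, forward, backward)
```

A full multivariate procedure would repeat this kind of test to peel off sink nodes one at a time and would calibrate the independence test (for example with a permutation or gamma approximation) instead of comparing raw statistics.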

Score-Matching and Order Search

Score Matching: Causal graphs can be identified by analyzing the score function $s(x) = \nabla_x \log p(x)$ and its Jacobian; leaf nodes are found when the variance of the corresponding diagonal entry is zero, enabling iterative order reconstruction (Rolland et al., 2022, Montagna et al., 2023, Chen et al., 2023).
NoGAM: Regresses empirical score estimates against regression residuals to identify leaves, without assuming Gaussianity, ensuring consistent recovery across arbitrary noise classes (Montagna et al., 2023).
SCORE: Computationally efficient kernel-based Stein estimators for the score and score-Jacobian enable polynomial-complexity algorithms that scale to large $p$, with rigorous guarantees (Rolland et al., 2022).
LoSAM: Leverages local independence and mutual information tests to establish roots and orderings, handling mixed linear/nonlinear mechanisms and minimizing conditioning set sizes for efficiency (Hiremath et al., 2024).
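The leaf criterion used by the score-based methods above can be checked on a toy model where the density is known in closed form. The sketch below assumes a two-node ANM with Gaussian noise, $X_1 \sim N(0,1)$ and $X_2 = X_1^2 + N_2$, and evaluates the diagonal of the score Jacobian (the second derivatives of $\log p$) by finite differences. It deliberately sidesteps the hard part of the real algorithms, which must estimate the score from samples (e.g., with Stein estimators); the model, `SIGMA`, and function names are illustrative assumptions.

```python
import numpy as np

SIGMA = 0.5  # assumed noise standard deviation of the leaf X2

def log_density(x1, x2):
    """Unnormalized log p(x1, x2) for X1 ~ N(0,1), X2 = X1^2 + N(0, SIGMA^2)."""
    return -0.5 * x1**2 - 0.5 * ((x2 - x1**2) / SIGMA) ** 2

def score_jacobian_diagonal(x1, x2, h=1e-2):
    """Central finite differences for d^2 log p / dx_j^2, j = 1, 2."""
    base = log_density(x1, x2)
    d11 = (log_density(x1 + h, x2) - 2 * base + log_density(x1 - h, x2)) / h**2
    d22 = (log_density(x1, x2 + h) - 2 * base + log_density(x1, x2 - h)) / h**2
    return d11, d22

rng = np.random.default_rng(0)
x1 = rng.normal(size=2000)
x2 = x1**2 + SIGMA * rng.normal(size=2000)

d11, d22 = score_jacobian_diagonal(x1, x2)
variances = np.array([d11.var(), d22.var()])

# The entry with (near-)zero variance across samples marks the leaf.
leaf = int(np.argmin(variances))   # 0 -> X1, 1 -> X2
print(variances, leaf)
```

For the leaf, the second derivative is the constant $-1/\sigma^2$, so its variance vanishes up to rounding, while the entry for $X_1$ varies with the sample. Removing the detected leaf and repeating on the remaining variables would yield a topological order.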

Variance- and Information-Based Sorting

$R^2$-SortnRegress: Relies on the observation that the fraction of explained variance ($R^2$) often increases along the true causal order in sampled ANMs; sorting variables by $R^2$ yields approximately correct topological orderings under high $R^2$-sortability, which is robust to data standardization (Reisach et al., 2023).
Majorization Criterion: For linear SEMs, ordering variables so their conditional variance vector weakly majorizes that of other permutations uniquely identifies the causal ordering (Dallakyan et al., 2024).
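A sort-then-regress procedure of the kind described above can be sketched in a few lines. This is an illustrative simplification, not the reference implementation of Reisach et al.: each variable's $R^2$ is computed from an ordinary least-squares regression on all other variables, variables are sorted by increasing $R^2$, and each variable is then regressed on its predecessors with a hard coefficient threshold standing in for a sparse regression. The function names, the threshold value, and the toy collider data are assumptions.

```python
import numpy as np

def r2_scores(X):
    """R^2 of each column when linearly regressed on all other columns."""
    Xc = X - X.mean(axis=0)
    p = Xc.shape[1]
    scores = np.empty(p)
    for j in range(p):
        others = np.delete(Xc, j, axis=1)
        coef, *_ = np.linalg.lstsq(others, Xc[:, j], rcond=None)
        residuals = Xc[:, j] - others @ coef
        scores[j] = 1.0 - residuals.var() / Xc[:, j].var()
    return scores

def sort_and_regress(X, threshold=0.3):
    """Order by increasing R^2, then regress each node on its predecessors."""
    order = np.argsort(r2_scores(X))
    Xc = X - X.mean(axis=0)
    p = Xc.shape[1]
    adjacency = np.zeros((p, p), dtype=int)    # adjacency[i, j] = 1 means i -> j
    for pos in range(1, p):
        child, preds = order[pos], order[:pos]
        coef, *_ = np.linalg.lstsq(Xc[:, preds], Xc[:, child], rcond=None)
        adjacency[preds[np.abs(coef) > threshold], child] = 1
    return order, adjacency

# Toy linear ANM (assumed): X0 -> X2 <- X1 with unit-variance noise.
rng = np.random.default_rng(0)
n = 5000
x0 = rng.normal(size=n)
x1 = rng.normal(size=n)
x2 = x0 + x1 + rng.normal(size=n)
X = np.column_stack([x0, x1, x2])

order, adjacency = sort_and_regress(X)
print(order)
print(adjacency)
```

Because $R^2$ is unchanged by rescaling individual columns, standardizing the data does not alter the ordering, which is the robustness property noted above. The approach only works when $R^2$ actually increases along the causal order, which is an empirical regularity rather than a guarantee.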

Global and Local Search

Brute-force and greedy search strategies (e.g., GDS, LoSAM) optimize independence and/or variance-based scores over DAGs. Recent approaches achieve polynomial time with provable consistency and reduced sample complexity by exploiting local causal substructures and conditioning set minimization (Hiremath et al., 2024, Reisach et al., 2023).

3. Effects of Noise, Latent Structure, and Missing Data

Noise Level Sensitivity

ANM-based inference is robust only when the noise level in the effect is of comparable scale to the cause. For linear models, accurate recovery of the causal direction is achievable only when the noise-to-signal ratio lies within a limited range; outside this range both residual-independence and variance-based methods break down (Kap, 2021, Kap et al., 2021). Nonlinear ANMs yield larger identifiable regimes, but the practical guidance is to normalize variances, tune independence estimators, and combine strategies for robust inference.

Latent Confounding and Hidden Mediation

Confounders with Additive Noise (CAN): When both variables are nonlinear functions of a latent confounder plus mutually independent noise, identifiability is possible up to reparameterizations of the confounder, via moments inversion and independence constraints (Janzing et al., 2012). The ICAN algorithm alternates low-dimensional projection, independence minimization, and nonparametric regression; empirical results support model recovery under mild smoothness and independence conditions (Janzing et al., 2012).
Unobserved Mediators (ANM-UM and CNANM): The additive noise property is not preserved under marginalization over nonlinear mediator chains; standard ANM-based scoring and independence tests fail since conditional independence is lost in both directions (Meier et al., 29 Jun 2025, Cai et al., 2019). Variational autoencoder (VAE) approaches (CNANM), or novel conditional denoising/diffusion statistics (BiDD), restore identifiability where standard ANM methods collapse (Meier et al., 29 Jun 2025, Cai et al., 2019). BiDD achieves robust performance even with multiple nonlinear mediators by leveraging conditional denoising independence (Meier et al., 29 Jun 2025).

Missing Data

In the presence of ignorable missingness, the EM-based MissDAG framework leverages the invertibility of additive noise structure to perform likelihood maximization over the observed data and posterior-imputed missing entries, with joint DAG and function parameter optimization in the M-step (Gao et al., 2022). Classical identifiability results for ANMs carry over, as expected log-likelihoods are preserved, leading to empirically superior structure recovery compared to imputation-then-infer pipelines (Gao et al., 2022).

4. Model Variants: Mixtures, Mechanism Shifts, and Heterogeneity

Mixture of ANMs: Observational data generated by a finite mixture of ANMs indexed by a latent variable are generically identifiable, as the existence of a mixture in both directions imposes highly restrictive ODE constraints on moments and densities (Hu et al., 2018). Gaussian Process Partially Observable Models (GPPOM) employ a latent-variable GP regression with an HSIC independence penalty for each sample's mechanism parameter, enabling unsupervised causal inference and mechanism clustering with strong accuracy (Hu et al., 2018).
Causal Mechanism Shifts (iSCAN): In multi-environment ANMs that differ only by soft (mechanism) interventions, the diagonal elements of the Jacobian of the mixture score function (equivalently, the Hessian of the mixture log-density) isolate shifted nodes by variance testing. iSCAN leverages this property for efficient detection and reconstruction of mechanism shifts without reconstructing full DAGs per environment (Chen et al., 2023).

5. Practicalities, Theoretical Guarantees, and Empirical Performance

Theoretical Guarantees

Identifiability: For nonlinear ANMs with independent noise and generic $f_j$, identifiability of the full DAG from the observational distribution is established (Peters et al., 2013, Montagna et al., 2023, Hiremath et al., 2024).
Robustness to Noise Distribution: Algorithms such as NoGAM and LoSAM are consistent without Gaussianity, correcting failures in Gaussian-specific scoring (Montagna et al., 2023, Hiremath et al., 2024).
Polynomial-Time Recovery: Score-matching-based and local-search methods can run in polynomial time, with sample requirements that also scale polynomially (Hiremath et al., 2024, Rolland et al., 2022, Montagna et al., 2023). Majorization and $R^2$-sorting approaches admit similar computational complexity (Dallakyan et al., 2024, Reisach et al., 2023).

Empirical Benchmarks

Simulations confirm that ANM-based procedures outperform constraint- and score-based methods (PC, GES, FGES) when non-Gaussianity or nonlinearity is present. Mixture and shift-detection methods (GPPOM, iSCAN) show high ARI and $F_1$ scores in synthetic and real-world heterogeneous datasets (Hu et al., 2018, Chen et al., 2023). $R^2$-SortnRegress achieves accuracy competitive with the state of the art on benchmark datasets when $R^2$-sortability is high (roughly 0.8 or above) (Reisach et al., 2023), while LoSAM maintains topological-ordering accuracy under mixed mechanisms at lower computational cost than NHTS or greedy order search (Hiremath et al., 2024).

6. Limitations and Recent Directions

ANMs assume acyclicity, causal sufficiency, and correct specification of additive noise. Failures arise in linear-Gaussian non-identifiable cases, settings with extreme noise-level ratios, and under irreducible hidden mediation that is nonlinear (Kap et al., 2021, Meier et al., 29 Jun 2025, Cai et al., 2019). Recent developments address these restrictions via adaptive statistical testing, majorization-based ordering, latent-variable structure, and denoising/diffusion paradigms for direction-finding with latent mediation (Dallakyan et al., 2024, Meier et al., 29 Jun 2025). Robust extension to non-additive noise, feedback, or other classes of latent structure remains an open area (Chen et al., 2023, Jayanti, 14 Mar 2026).

7. Connections to Broader Causal Discovery and Outlook

ANMs have established themselves as a principal mechanism for observational causal discovery, subsuming and generalizing constraint- and score-based approaches by leveraging structural independence constraints and function-noise asymmetries. The proliferation of algorithms exploiting score-matching, majorization, regression-independence, and local substructure criteria reflects the centrality of ANMs in modern structure learning. Ongoing research aims to further extend the reach of ANMs to settings with confounding, missing data, dynamic environments, and high-dimensionality, while benchmarking beyond synthetic data remains crucial to assess $R^2$-sortability and identifiability in natural systems (Chen et al., 2023, Reisach et al., 2023, Gao et al., 2022).