Additive Noise Models (ANMs)

Updated 9 April 2026
  • Additive Noise Models (ANMs) are causal discovery frameworks that represent each variable as a function of its parents plus an independent noise term, which makes the underlying DAG identifiable under suitable conditions.
  • ANM-based methods use regression with independence tests, score matching, and variance-based sorting to recover the causal order in both bivariate and multivariate settings.
  • ANMs extend to challenging scenarios such as latent confounding, mechanism shifts, and missing data, with strong theoretical guarantees and empirical performance benchmarks.

Additive Noise Models (ANMs) provide a foundational framework for causal discovery in both bivariate and multivariate settings, grounding the inference of directed acyclic graph (DAG) structure in testable statistical asymmetries arising from additive noise representations of structural equations. The core assumption is that each observed variable is generated as a function of its parents plus a noise term that is statistically independent of its parents and mutually independent across nodes. Originating in work by Hoyer, Shimizu, Hyvärinen, Peters, and others, ANMs enable the identification of causal direction from observational data under generic nonlinearities and non-Gaussianity, with theoretical guarantees and scalable algorithms underpinning recent advances in causal inference, including structure learning in the presence of latent confounders, mixtures of mechanisms, arbitrary noise, and partial observability.

1. Structural Definition and Identifiability Results

A general ANM posits, for each node $X_j$ in a $p$-dimensional random vector $X = (X_1, \ldots, X_p)$ with associated DAG $G$,

$$X_j = f_j(X_{\operatorname{Pa}(j)}) + N_j,$$

where $\operatorname{Pa}(j)$ are the parents of $X_j$ in $G$, $f_j$ is a differentiable, non-constant function, and $N_j$ is a noise variable independent of $X_{\operatorname{Pa}(j)}$ and of all other $N_k$ ($k \neq j$) (Peters et al., 2013, Dallakyan et al., 2024, Montagna et al., 2023). The key identifiability result is that for generic choices of $f_j$ and noise distributions, the true DAG $G$ is fully determined by the joint distribution $P_X$. Typical sufficient conditions are: (i) nonlinear $f_j$, (ii) non-Gaussian $N_j$, or (iii) each parent-child triple $(f_j, P_{X_i}, P_{N_j})$ with $i \in \operatorname{Pa}(j)$ avoids a pathological ODE (Peters et al., 2013). In the bivariate case, identifiability holds except for linear-Gaussian or degenerately symmetric settings. For multivariate models, a recursive sufficient condition requires that each conditional parent-child pair satisfies the bivariate identifiability criterion when conditioning on appropriate sets of nondescendants (Peters et al., 2013, Hiremath et al., 2024).
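The linear-Gaussian exception can be checked by hand. The following is a standard textbook calculation rather than something taken from the cited papers, and the symbols $a$, $b$, $\sigma_X$, $\sigma_N$, and $\tilde N$ are introduced here purely for illustration.

```latex
\begin{aligned}
&X \sim \mathcal{N}(0, \sigma_X^2), \qquad Y = aX + N, \qquad N \sim \mathcal{N}(0, \sigma_N^2), \qquad N \perp X. \\
&\text{Define } b = \frac{a\,\sigma_X^2}{a^2\sigma_X^2 + \sigma_N^2}, \qquad \tilde N = X - bY. \\
&\operatorname{Cov}(\tilde N, Y) = \operatorname{Cov}(X, Y) - b\,\operatorname{Var}(Y)
  = a\,\sigma_X^2 - b\,(a^2\sigma_X^2 + \sigma_N^2) = 0.
\end{aligned}
```

Because $(\tilde N, Y)$ is jointly Gaussian, zero covariance implies independence, so $X = bY + \tilde N$ is an equally valid ANM in the reverse direction and the causal direction cannot be read off the joint distribution. With a nonlinear $f$ or non-Gaussian noise, the backward residual is generically dependent on $Y$, which is the asymmetry that ANM methods exploit.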

Special Cases and Extensions

  • Linear Non-Gaussian Models (LiNGAM): Linear $f_j$ with non-Gaussian $N_j$ (Dallakyan et al., 2024).
  • Majorization Approach: Conditional variances vector as a weak majorant identifies the topological order in linear SEMs, generalizing prior variance-based results (Dallakyan et al., 2024).
  • Mixtures of Mechanisms: When data arise from a mixture of ANMs indexed by a latent discrete variable, identifiability is retained under generic conditions via independence between the input (the cause) and the mechanism parameter (Hu et al., 2018).

2. Causal Discovery Algorithms

Regression with Subsequent Independence Test (RESIT)

RESIT is a two-phase algorithm. In phase 1, each remaining node $X_j$ is regressed on its candidate parents, and the residuals are tested for independence from the regressors (e.g., via HSIC). The node with the weakest dependence is taken as a sink and removed, and the procedure recurses on the remaining nodes, updating the candidate parent sets as it goes (Peters et al., 2013). Phase 2 prunes extraneous parents. RESIT is statistically consistent given exact independence tests and nonparametric regression oracles, though in practice it can be sensitive to noise scaling and to the difficulty of high-dimensional dependence testing (Peters et al., 2013, Kap, 2021).
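The regress-then-test idea behind RESIT can be sketched for the bivariate case. This is a minimal illustration and not the implementation from Peters et al.: it assumes a polynomial regressor in place of a nonparametric one, a biased HSIC statistic with a median-heuristic Gaussian kernel, and a direct comparison of the two statistics instead of a calibrated test. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def gaussian_gram(values):
    """Gaussian kernel Gram matrix; bandwidth set by the median heuristic (an assumption)."""
    sq_dists = (values[:, None] - values[None, :]) ** 2
    bandwidth = np.median(sq_dists[sq_dists > 0])
    return np.exp(-sq_dists / bandwidth)


def hsic(a, b):
    """Biased empirical HSIC statistic: trace(K H L H) / n^2."""
    n = len(a)
    centering = np.eye(n) - np.ones((n, n)) / n
    return np.trace(gaussian_gram(a) @ centering @ gaussian_gram(b) @ centering) / n**2


def residual_dependence(cause, effect, degree=5):
    """Regress effect on cause with a polynomial, then measure residual/regressor dependence."""
    coeffs = np.polyfit(cause, effect, degree)
    residuals = effect - np.polyval(coeffs, cause)
    return hsic(cause, residuals)


# Synthetic ANM with ground truth X -> Y: Y = X^3 + N, with N independent of X.
rng = np.random.default_rng(0)
n = 400
x = rng.uniform(-2.0, 2.0, size=n)
y = x**3 + rng.uniform(-1.0, 1.0, size=n)

# Standardize so that neither direction is favored by scale.
x = (x - x.mean()) / x.std()
y = (y - y.mean()) / y.std()

forward = residual_dependence(x, y)   # residuals of Y given X
backward = residual_dependence(y, x)  # residuals of X given Y
inferred = "X -> Y" if forward < backward else "Y -> X"
print(f"HSIC forward={forward:.5f}, backward={backward:.5f}, inferred: {inferred}")
```

The direction with the smaller residual dependence is preferred. A fuller treatment would replace the raw comparison with HSIC p-values and, in the multivariate case, repeat the step to peel off one sink at a time.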

  • Score Matching: Causal graphs can be identified by analyzing the score function $s(x) = \nabla_x \log p(x)$ and its Jacobian; a node is a leaf when the variance of the corresponding diagonal entry of the Jacobian is zero, which enables iterative reconstruction of the causal order (Rolland et al., 2022, Montagna et al., 2023, Chen et al., 2023).
  • NoGAM: Regresses empirical score estimates against regression residuals to identify leaves, without assuming Gaussianity, ensuring consistent recovery across arbitrary noise classes (Montagna et al., 2023).
  • SCORE: Computationally efficient kernel-based Stein estimators for the score and its Jacobian yield low-complexity algorithms that scale to large problem sizes, with rigorous guarantees (Rolland et al., 2022).
  • LoSAM: Leverages local independence and mutual information tests to establish roots and orderings, handling mixed linear/nonlinear mechanisms and minimizing conditioning set sizes for efficiency (Hiremath et al., 2024).
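The leaf criterion used by score-matching methods can be verified numerically on a toy model. The sketch below assumes a two-variable ANM with Gaussian noise, $X_1 \sim \mathcal{N}(0,1)$ and $X_2 = X_1^2 + N_2$, and uses the known analytic log-density with finite differences; actual methods such as SCORE estimate these quantities from samples with Stein estimators, which is not shown here. All names are hypothetical.

```python
import numpy as np


def log_density(x1, x2, sigma=1.0):
    """Log joint density (up to a constant) of X1 ~ N(0,1), X2 = X1^2 + N(0, sigma^2)."""
    return -0.5 * x1**2 - 0.5 * ((x2 - x1**2) / sigma) ** 2


def score_jacobian_diagonal(x1, x2, axis, eps=1e-3):
    """Second derivative of the log-density along one coordinate, by central differences.

    This is the diagonal entry of the Jacobian of the score for that coordinate.
    """
    d1, d2 = (eps, 0.0) if axis == 0 else (0.0, eps)
    return (
        log_density(x1 + d1, x2 + d2)
        - 2.0 * log_density(x1, x2)
        + log_density(x1 - d1, x2 - d2)
    ) / eps**2


# Sample from the toy ANM.
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = x1**2 + rng.normal(size=n)

# Variance over samples of each diagonal entry; the leaf has (near-)zero variance.
variances = [np.var(score_jacobian_diagonal(x1, x2, axis)) for axis in (0, 1)]
leaf = int(np.argmin(variances))
print(f"variances={variances}, leaf index={leaf}")
```

Here the entry for $X_2$ is the constant $-1/\sigma^2$, so its variance vanishes, while the entry for $X_1$ depends on the sample. In the multivariate case the identified leaf is removed and the test is repeated on the remaining variables.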

Variance- and Information-Based Sorting

  • $R^2$-SortnRegress: Relies on the observation that the fraction of explained variance ($R^2$) often increases along the true causal order in sampled ANMs; sorting variables by $R^2$ yields approximately correct topological orderings under high $R^2$-sortability, a criterion that is robust to data standardization (Reisach et al., 2023).
  • Majorization Criterion: For linear SEMs, ordering variables so their conditional variance vector weakly majorizes that of other permutations uniquely identifies the causal ordering (Dallakyan et al., 2024).
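The sorting step of an $R^2$-based ordering can be sketched as follows. This is a simplified illustration under stated assumptions: $R^2$ is computed by ordinary least squares of each variable on all others, the subsequent regression step that selects parents is omitted, and the three-variable example (two independent roots feeding one child) is hypothetical.

```python
import numpy as np


def r2_scores(data):
    """R^2 of each column when linearly regressed on all remaining columns."""
    n, p = data.shape
    scores = np.empty(p)
    for j in range(p):
        target = data[:, j]
        others = np.delete(data, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # intercept + other variables
        coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coeffs
        scores[j] = 1.0 - residuals.var() / target.var()
    return scores


# Toy linear ANM: X1 and X2 are roots, X3 = X1 + X2 + N3.
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(size=n)
data = np.column_stack([x1, x2, x3])

scores = r2_scores(data)
order = np.argsort(scores)  # candidate causal order: low R^2 first

# Rescaling the columns leaves R^2 unchanged, unlike marginal variances.
rescaled_scores = r2_scores(data * np.array([100.0, 0.01, 1.0]))
print(f"R^2 scores={scores}, order={order}")
```

In this example the child has the largest $R^2$ and is placed last, and the scores do not change when the columns are rescaled. The ordering is only a heuristic: it is reliable when $R^2$-sortability is high and can fail otherwise.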

Brute-force and greedy search strategies (e.g., GDS, LoSAM) optimize independence and/or variance-based scores over DAGs. Recent approaches achieve polynomial time with provable consistency and reduced sample complexity by exploiting local causal substructures and conditioning set minimization (Hiremath et al., 2024, Reisach et al., 2023).

3. Effects of Noise, Latent Structure, and Missing Data

Noise Level Sensitivity

ANM-based inference is robust only when the noise level in the effect is of comparable scale to the cause. For linear models, the causal direction is recovered accurately only when the noise-to-signal ratio falls within a bounded range; outside this range both residual-independence and variance-based methods break down (Kap, 2021, Kap et al., 2021). Nonlinear ANMs yield larger identifiable regimes, but the practical guidance is to normalize variances, tune independence estimators, and combine strategies for robust inference.

Latent Confounding and Hidden Mediation

  • Confounders with Additive Noise (CAN): When both variables are nonlinear functions of a latent confounder plus mutually independent noise, identifiability is possible up to reparameterizations of the confounder, via moments inversion and independence constraints (Janzing et al., 2012). The ICAN algorithm alternates low-dimensional projection, independence minimization, and nonparametric regression; empirical results support model recovery under mild smoothness and independence conditions (Janzing et al., 2012).
  • Unobserved Mediators (ANM-UM and CNANM): The additive noise property is not preserved under marginalization over nonlinear mediator chains; standard ANM-based scoring and independence tests fail since conditional independence is lost in both directions (Meier et al., 29 Jun 2025, Cai et al., 2019). Variational autoencoder (VAE) approaches (CNANM), or novel conditional denoising/diffusion statistics (BiDD), restore identifiability where standard ANM methods collapse (Meier et al., 29 Jun 2025, Cai et al., 2019). BiDD achieves robust performance even with multiple nonlinear mediators by leveraging conditional denoising independence (Meier et al., 29 Jun 2025).

Missing Data

In the presence of ignorable missingness, the EM-based MissDAG framework leverages the invertibility of additive noise structure to perform likelihood maximization over the observed data and posterior-imputed missing entries, with joint DAG and function parameter optimization in the M-step (Gao et al., 2022). Classical identifiability results for ANMs carry over, as expected log-likelihoods are preserved, leading to empirically superior structure recovery compared to imputation-then-infer pipelines (Gao et al., 2022).

4. Model Variants: Mixtures, Mechanism Shifts, and Heterogeneity

  • Mixture of ANMs: Observational data generated by a finite mixture of ANMs indexed by a latent variable are generically identifiable, as the existence of a mixture in both directions imposes highly restrictive ODE constraints on moments and densities (Hu et al., 2018). Gaussian Process Partially Observable Models (GPPOM) employ a latent-variable GP regression with an HSIC independence penalty on each sample's mechanism parameter, enabling unsupervised causal inference and mechanism clustering with strong accuracy (Hu et al., 2018).
  • Causal Mechanism Shifts (iSCAN): In multi-environment ANMs differing only by soft (mechanism) interventions, the diagonal elements of the Hessian of the mixture score function isolate shifted nodes by variance testing. iSCAN leverages this property for efficient detection and reconstruction of mechanism shifts without reconstructing full DAGs per environment (Chen et al., 2023).

5. Practicalities, Theoretical Guarantees, and Empirical Performance

Theoretical Guarantees

Empirical Benchmarks

Simulations confirm that ANM-based procedures outperform constraint- and score-based methods (PC, GES, FGES) when non-Gaussianity or nonlinearity is present. Mixture and shift-detection methods (GPPOM, iSCAN) show high ARI and $F_1$ scores in synthetic and real-world heterogeneous datasets (Hu et al., 2018, Chen et al., 2023). $R^2$-SortnRegress achieves accuracy competitive with the state of the art on benchmark datasets when $R^2$-sortability is high ($\gtrsim 0.8$) (Reisach et al., 2023), while LoSAM maintains topological-ordering accuracy even under mixed mechanisms, at reduced computational cost compared to NHTS or greedy order search (Hiremath et al., 2024).

6. Limitations and Recent Directions

ANMs assume acyclicity, causal sufficiency, and correct specification of additive noise. Failures arise in linear-Gaussian non-identifiable cases, settings with extreme noise-level ratios, and under irreducible hidden mediation that is nonlinear (Kap et al., 2021, Meier et al., 29 Jun 2025, Cai et al., 2019). Recent developments address these restrictions via adaptive statistical testing, majorization-based ordering, latent-variable structure, and denoising/diffusion paradigms for direction-finding with latent mediation (Dallakyan et al., 2024, Meier et al., 29 Jun 2025). Robust extension to non-additive noise, feedback, or other classes of latent structure remains an open area (Chen et al., 2023, Jayanti, 14 Mar 2026).

7. Connections to Broader Causal Discovery and Outlook

ANMs have established themselves as a principal mechanism for observational causal discovery, subsuming and generalizing constraint- and score-based approaches by leveraging structural independence constraints and function–noise asymmetries. The proliferation of algorithms exploiting score-matching, majorization, regression-independence, and local substructure criteria reflects the centrality of ANMs in modern structure learning. Ongoing research aims to further extend the reach of ANMs to settings with confounding, missing data, dynamic environments, and high-dimensionality, while benchmarking beyond synthetic data remains crucial to assess $R^2$-sortability and identifiability in natural systems (Chen et al., 2023, Reisach et al., 2023, Gao et al., 2022).

Key sources: (Peters et al., 2013, Janzing et al., 2012, Montagna et al., 2023, Rolland et al., 2022, Hiremath et al., 2024, Dallakyan et al., 2024, Reisach et al., 2023, Kap, 2021, Kap et al., 2021, Hu et al., 2018, Chen et al., 2023, Meier et al., 29 Jun 2025, Cai et al., 2019, Gao et al., 2022, Elahi et al., 2024).
