
GraphNOTEARS: Dynamic DAG Learning

Updated 4 January 2026
  • The paper introduces GraphNOTEARS, a method that extends acyclicity-constrained optimization to dynamic graphs using a Structural Vector Autoregression formulation.
  • It employs an augmented Lagrangian approach with continuous optimization solvers to jointly estimate intra- and inter-slice dependencies with high precision.
  • Empirical results on synthetic and real-world Yelp data demonstrate superior recovery, achieving near-perfect F1-scores and lower structural errors compared to baselines.

GraphNOTEARS is a methodology for learning the directed acyclic graph (DAG) structure underlying feature generation in dynamic graphs, where both the node attributes and the structural connectivity among nodes evolve over time. It extends prior acyclicity-constrained optimization approaches, such as NOTEARS and DYNOTEARS, to settings in which the graph topology itself evolves and lagged inter-temporal influences must be modeled explicitly. The core algorithm integrates a Structural Vector Autoregression (SVAR) formulation with a smooth acyclicity constraint, solved using an augmented Lagrangian objective and continuous optimization methods (Fan et al., 2022).

1. Problem Formulation and Statistical Model

GraphNOTEARS operates on dynamic graph data represented as

G = \{ (X^{(1)}, A^{(1)}), \ldots, (X^{(T)}, A^{(T)}) \},

where each time step $t = 1, \ldots, T$ is described by $X^{(t)} \in \mathbb{R}^{n \times d}$ (node-feature matrix for $n$ nodes and $d$ features) and $A^{(t)} \in \{0,1\}^{n \times n}$ (binary adjacency matrix). The model assumes data generation according to a stationary SVAR of order $p$:

X^{(t)} = X^{(t)} W + \sum_{\ell=1}^{p} \bar{A}^{(t-\ell)} X^{(t-\ell)} P^{(\ell)} + Z^{(t)},

where:

  • $W \in \mathbb{R}^{d \times d}$: the contemporaneous (intra-slice) effect matrix.
  • $\bar{A}^{(t-\ell)} = D^{-1/2}(A^{(t-\ell)} + I)D^{-1/2}$: symmetrically normalized adjacency at lag $\ell$, including self-loops, where $D$ is the degree matrix of $A^{(t-\ell)} + I$.
  • $P^{(\ell)} \in \mathbb{R}^{d \times d}$: the lag-$\ell$ (inter-slice) effect matrices.
  • $Z^{(t)} \in \mathbb{R}^{n \times d}$: row-wise independent, centered noise.

By stacking the $p$ lagged feature matrices as $M = [X^{(t-1)} \mid \ldots \mid X^{(t-p)}] \in \mathbb{R}^{n \times pd}$ and the normalized adjacencies as $\bar{A} = [\bar{A}^{(t-1)} \mid \ldots \mid \bar{A}^{(t-p)}] \in \mathbb{R}^{n \times pn}$, the model condenses to:

X = X W + (\bar{A} \otimes M) P + Z,

with $P = [P^{(1)}; \ldots; P^{(p)}] \in \mathbb{R}^{pd \times d}$ and $(\bar{A} \otimes M) := [\bar{A}^{(t-1)} X^{(t-1)} \mid \ldots \mid \bar{A}^{(t-p)} X^{(t-p)}] \in \mathbb{R}^{n \times pd}$, where $\otimes$ denotes this block-wise propagation product rather than the Kronecker product.

Contemporaneous edges are defined by $W^{(0)} := W$, while inter-slice (lagged) edges are denoted $W^{(\ell)} := P^{(\ell)}$ for $\ell = 1, \ldots, p$.
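To make the generative model concrete, the following sketch simulates a small dynamic graph according to the SVAR above. It is a minimal illustration assuming NumPy; the sizes, edge densities, and lag order are placeholder values, and an acyclic $W$ is obtained simply by restricting it to be strictly upper triangular. Since $X^{(t)}$ appears on both sides of the model equation, each slice is generated by solving $X^{(t)}(I - W) = \text{lagged terms} + Z^{(t)}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p, T = 100, 5, 1, 7  # illustrative sizes and lag order

def normalize(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def random_weights(shape, density=0.3):
    """Sparse weights drawn uniformly from [0.5, 2] or [-2, -0.5]."""
    mask = rng.random(shape) < density
    signs = rng.choice([-1.0, 1.0], size=shape)
    return mask * signs * rng.uniform(0.5, 2.0, size=shape)

W = np.triu(random_weights((d, d)), k=1)        # strictly upper triangular => acyclic
P = [random_weights((d, d)) for _ in range(p)]  # one inter-slice matrix per lag

X, A = [], []
for t in range(T):
    A_t = (rng.random((n, n)) < 0.1).astype(float)  # Bernoulli(0.1) edges
    A_t = np.triu(A_t, 1); A_t = A_t + A_t.T        # symmetric, no self-loops
    A.append(A_t)
    lagged = np.zeros((n, d))
    for ell in range(1, p + 1):
        if t - ell >= 0:
            lagged += normalize(A[t - ell]) @ X[t - ell] @ P[ell - 1]
    Z = rng.standard_normal((n, d))                 # centered noise
    X.append((lagged + Z) @ np.linalg.inv(np.eye(d) - W))  # solve X_t (I - W) = lagged + Z
```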

2. Structural Constraints and Objective Formulation

Only the intra-slice graph $W$ requires explicit acyclicity enforcement, since lagged effects are inherently time-directed and cannot introduce directed cycles. The objective combines a least-squares data fit with sparsity-inducing $\ell_1$ penalties on both weight matrices:

S(W, P) = \frac{1}{2n} \lVert X - X W - (\bar{A} \otimes M) P \rVert_F^2 + \lambda_W \lVert W \rVert_1 + \lambda_P \lVert P \rVert_1

Acyclicity of $W$ is imposed via the smooth NOTEARS constraint:

h(W) := \operatorname{tr}\left(e^{W \circ W}\right) - d = 0,

where $\circ$ denotes the Hadamard product and $e^{\cdot}$ is the matrix exponential. This constraint eliminates all directed cycles of positive length from $W$.
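As a minimal illustration (assuming NumPy and SciPy), the constraint and its gradient, $\nabla h(W) = (e^{W \circ W})^{\top} \circ 2W$ as derived by Zheng et al. (2018), can be evaluated with the dense matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

def acyclicity(W):
    """NOTEARS constraint h(W) = tr(exp(W ∘ W)) - d and its gradient."""
    E = expm(W * W)              # matrix exponential of the Hadamard square
    h = np.trace(E) - W.shape[0]
    grad = E.T * 2.0 * W         # grad h(W) = exp(W ∘ W)^T ∘ 2W
    return h, grad
```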

Combining these, the augmented Lagrangian becomes:

L(W, P; \alpha, \rho) = S(W, P) + \alpha\, h(W) + \frac{\rho}{2}\left[h(W)\right]^2

The unconstrained minimization of $L$ allows the application of standard continuous optimization methods.
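The sketch below assembles $S(W, P)$ and the augmented Lagrangian as a single callable over a flattened parameter vector, reusing the `acyclicity` helper above. It is a readability-oriented sketch rather than the reference implementation: `X` is one target slice and `AM` the corresponding stacked matrix $(\bar{A} \otimes M)$ of shape $n \times pd$, and the non-smooth $\ell_1$ terms are kept as-is, whereas NOTEARS-style implementations typically split weights into positive and negative parts to keep the objective smooth.

```python
def augmented_lagrangian(theta, X, AM, alpha, rho, lam_w=0.01, lam_p=0.01):
    """L(W, P; alpha, rho) for theta = [vec(W); vec(P)]."""
    n, d = X.shape
    pd_ = AM.shape[1]                       # pd = (number of lags) * d
    W = theta[:d * d].reshape(d, d)
    P = theta[d * d:].reshape(pd_, d)
    resid = X - X @ W - AM @ P              # least-squares residual
    S = 0.5 / n * np.sum(resid ** 2) + lam_w * np.abs(W).sum() + lam_p * np.abs(P).sum()
    h, _ = acyclicity(W)
    return S + alpha * h + 0.5 * rho * h ** 2
```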

3. Algorithmic Procedures and Optimization

The optimization alternates between two main steps:

  1. Minimization of $L(W, P; \alpha, \rho)$: jointly over $W$ and $P$ using L-BFGS-B or Adam, with the penalty parameters $\alpha, \rho$ held fixed.
  2. Dual updates: following Zheng et al. (2018), the dual variable $\alpha$ and penalty weight $\rho$ are updated after each minimization: if $h(W) > \gamma\, h(W)_{\text{prev}}$, then $\rho \leftarrow \tau \rho$ (with $\tau > 1$); then $\alpha \leftarrow \alpha + \rho\, h(W)$.

Hyperparameters: sparsity penalties $\lambda_W, \lambda_P$ (e.g., 0.01), initial penalty weight $\rho > 0$ (e.g., 1.0), initial dual variable $\alpha = 0$, penalty update factor $\tau$ (e.g., 10), progress tolerance $\gamma$ (e.g., 0.25), and hard thresholds $\tau_W, \tau_P$ for adjacency support recovery (e.g., 0.3).

Stopping Criteria: Optimization halts when the gradient norm is small (e.g., $10^{-8}$), when $|h(W)| < \varepsilon_h$ (e.g., $10^{-8}$), or when a maximum iteration count is reached. At the resulting stationary point, entries with magnitude below $\tau_W$ or $\tau_P$ are thresholded to zero, yielding binary estimates of the intra- and inter-slice adjacencies $\hat{W}, \hat{P}$.
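A hedged sketch of the full outer loop under the same assumptions as the snippets above, using SciPy's L-BFGS-B with numerical gradients for brevity (a production implementation would supply analytic gradients and a smooth $\ell_1$ reformulation):

```python
import numpy as np
from scipy.optimize import minimize

def fit_graphnotears(X, AM, rho=1.0, alpha=0.0, tau=10.0, gamma=0.25,
                     eps_h=1e-8, max_outer=100, thr_w=0.3, thr_p=0.3):
    """Augmented Lagrangian loop: minimize L over (W, P), then update (alpha, rho)."""
    d, pd_ = X.shape[1], AM.shape[1]
    theta = np.zeros(d * d + pd_ * d)
    h_prev = np.inf
    for _ in range(max_outer):
        res = minimize(augmented_lagrangian, theta,
                       args=(X, AM, alpha, rho), method="L-BFGS-B")
        theta = res.x
        h, _ = acyclicity(theta[:d * d].reshape(d, d))
        if h > gamma * h_prev:   # acyclicity violation not shrinking fast enough
            rho *= tau
        alpha += rho * h         # dual ascent step
        h_prev = h
        if h < eps_h:
            break
    W = theta[:d * d].reshape(d, d)
    P = theta[d * d:].reshape(pd_, d)
    W_hat = np.where(np.abs(W) >= thr_w, W, 0.0)   # hard-threshold small entries
    P_hat = np.where(np.abs(P) >= thr_p, P, 0.0)
    return W_hat, P_hat
```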

4. Theoretical Guarantees and Properties

GraphNOTEARS inherits identifiability from the SVAR model under standard conditions, such as non-Gaussian noise or suitably structured Gaussian noise. Finite-sample consistency and high-probability recovery arguments carry over from NOTEARS and DYNOTEARS. The augmented Lagrangian approach with the smooth penalty converges to stationary points, and in practice L-BFGS-B efficiently reduces both the objective and the acyclicity term.

Hyperparameter selection influences sparsity and acyclicity: larger $\lambda_W$ or $\lambda_P$ favors sparser structures (potentially discarding weak edges), while an insufficiently large $\rho$ may yield cyclic solutions. Conversely, an excessively large $\rho$ can hinder minimization of the data-fit term $S(W, P)$.

5. Empirical Evaluation and Results

Experiments encompass both synthetic and real-world datasets.

Synthetic Data

  • Graph Generation: Intra-slice graphs $W$ are drawn from Erdős–Rényi (ER) or Barabási–Albert (BA) models; inter-slice graphs $P$ from ER or stochastic block models (SBM). Each $A^{(t)}$ is generated with i.i.d. Bernoulli(0.1) entries.
  • Edge Weights: Each edge weight is sampled uniformly from $[0.5, 2] \cup [-2, -0.5]$.
  • Noise: $Z$ is i.i.d. Gaussian(0, 1) or Exponential(1).
  • Dimensions: $n \in \{100, 200, 500\}$, $d \in \{5, 10, 20, 30\}$, $T = 7$ time steps, lag $p \in \{1, 2\}$.

Baselines

  • NOTEARS+LASSO: first estimates $W$ via classic NOTEARS, then estimates $P$ by LASSO regression.
  • DYNOTEARS: jointly estimates $W$ and $P$, but without accounting for the time-varying graph structure $\bar{A}$.

All methods threshold final weights at $\tau = 0.3$; results report the mean $\pm$ 95% confidence interval over 5 random seeds.

Metrics

  • $F_1$-score: harmonic mean of precision and recall of the recovered edges in $\hat{W}$ and $\hat{P}$.
  • Structural Hamming Distance (SHD): total number of edge additions, deletions, and reversals relative to the ground truth; a computation sketch for both metrics follows below.
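A minimal sketch of both metrics on binarized adjacency matrices (NumPy arrays assumed; a reversed edge is counted once in the SHD, though conventions vary slightly across papers):

```python
import numpy as np

def shd_and_f1(B_true, B_est):
    """Structural Hamming Distance and F1-score between binary adjacency matrices."""
    B_true = (np.asarray(B_true) != 0).astype(int)
    B_est = (np.asarray(B_est) != 0).astype(int)
    # reversals: estimated edge exists only in the direction opposite to the true edge
    rev = ((B_est == 1) & (B_true == 0) & (B_true.T == 1) & (B_est.T == 0)).sum()
    extra = ((B_est == 1) & (B_true == 0) & (B_true.T == 0)).sum()    # spurious additions
    missing = ((B_true == 1) & (B_est == 0) & (B_est.T == 0)).sum()   # missed edges
    shd = extra + missing + rev
    tp = ((B_est == 1) & (B_true == 1)).sum()
    precision = tp / max(B_est.sum(), 1)
    recall = tp / max(B_true.sum(), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return shd, f1
```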

Quantitative Results

In a representative setting ($n = 500$, $d = 5$, $p = 2$, Gaussian noise, ER/ER), GraphNOTEARS achieves near-perfect recovery of $W$ ($F_1 \approx 1.0$, $\text{SHD} \approx 0$) and substantially more accurate recovery of $P^{(1)}$ and $P^{(2)}$ than the baselines. Across all tested graph models, noise types, and hyperparameters, GraphNOTEARS outperforms NOTEARS+LASSO and DYNOTEARS; performance degrades gracefully as $d$ increases or $n$ decreases while remaining consistently superior.

Real-world Yelp Data

Two dynamic graph constructions from Yelp were considered:

  • User Graph: Nodes are users; edges denote Yelp friend relationships. Node features $(C, S, F)$ represent the average restaurant category, average star rating, and total visits per user; lag $p = 1$.
  • Business Graph: Nodes are restaurants; edges connect restaurants that share a category and are spatially close; features are aggregated from customer interactions.

In both graphs, the established ground-truth structural causal model for restaurants is $C \to S$, $C \to F$, $S \to F$. GraphNOTEARS recovers all three intra-slice causal edges exactly, whereas DYNOTEARS misses one or two. Both methods find strong lagged (inter-slice) homophily, but only GraphNOTEARS uncovers the intra-slice relationships consistent with domain knowledge.

6. Limitations and Open Directions

The primary computational bottleneck is the $O(d^3)$ per-iteration cost of evaluating the acyclicity constraint $\operatorname{tr}(e^{W \circ W})$, which limits current implementations to roughly $d \lesssim 1000$. Approximate or sparse matrix-exponential schemes may alleviate this. The approach also presumes linear SEMs and stationarity; extensions to nonlinear relationships (e.g., via GNN-based functions) or time-varying $W$ and $P$ are highlighted as avenues for future research. Finally, the hard thresholding of $\hat{W}$ and $\hat{P}$ introduces a trade-off between false positives and false negatives, and principled threshold selection remains an open challenge.

7. Summary and Context

GraphNOTEARS generalizes acyclicity-constrained structure learning to dynamic graphs with evolving node features and topologies, simultaneously recovering contemporaneous and lagged dependencies by augmenting an SVAR structural framework with a smooth, differentiable acyclicity penalty. Empirical results on both synthetic and real-world data demonstrate state-of-the-art accuracy in reconstructing ground-truth causal structure relative to established baselines. Its continuous optimization formulation enables efficient application of mature unconstrained solvers, though scalability, nonlinearity, and thresholding remain as ongoing challenges (Fan et al., 2022).
