
GraphNOTEARS: Dynamic DAG Learning

Updated 4 January 2026
  • The paper introduces GraphNOTEARS, a method that extends acyclicity-constrained optimization to dynamic graphs using a Structural Vector Autoregression formulation.
  • It employs an augmented Lagrangian approach with continuous optimization solvers to jointly estimate intra- and inter-slice dependencies with high precision.
  • Empirical results on synthetic and real-world Yelp data demonstrate superior recovery, achieving near-perfect F1-scores and lower structural errors compared to baselines.

GraphNOTEARS is a methodology for learning the directed acyclic graph (DAG) structure underlying feature generation in dynamic graphs, where both the node attributes and the structural connectivity among nodes evolve over time. It extends prior acyclicity-constrained optimization approaches, such as NOTEARS and DYNOTEARS, to settings in which the graph topology itself evolves and lagged inter-temporal influences must be modeled explicitly. The core algorithm integrates a Structural Vector Autoregression (SVAR) formulation with a smooth acyclicity constraint, solved using an augmented Lagrangian objective and continuous optimization methods (Fan et al., 2022).

1. Problem Formulation and Statistical Model

GraphNOTEARS operates on dynamic graph data represented as

G = \{ (X^{(1)}, A^{(1)}), \ldots, (X^{(T)}, A^{(T)}) \},

where each time step $t = 1, \ldots, T$ is described by $X^{(t)} \in \mathbb{R}^{n \times d}$ (node-feature matrix for $n$ nodes and $d$ features) and $A^{(t)} \in \{0,1\}^{n \times n}$ (binary adjacency matrix). The model assumes data generation according to a stationary SVAR of order $p$:

X^{(t)} = X^{(t)} W + \sum_{\ell=1}^{p} \bar{A}^{(t-\ell)} X^{(t-\ell)} P^{(\ell)} + Z^{(t)},

where:

  • $W \in \mathbb{R}^{d \times d}$: the contemporaneous (intra-slice) effect matrix.
  • $\bar{A}^{(t-\ell)} = D^{-1/2}(A^{(t-\ell)} + I)D^{-1/2}$: symmetrically normalized adjacency at lag $\ell$, including self-loops, where $D$ is the degree matrix of $A^{(t-\ell)} + I$.
  • $P^{(\ell)} \in \mathbb{R}^{d \times d}$: the lag-$\ell$ (inter-slice) effect matrices.
  • $Z^{(t)} \in \mathbb{R}^{n \times d}$: row-wise independent, centered noise.

By stacking the $p$ lagged feature matrices as $M = [X^{(t-1)} \mid \ldots \mid X^{(t-p)}] \in \mathbb{R}^{n \times pd}$ and the normalized adjacencies as $\bar{A} = [\bar{A}^{(t-1)} \mid \ldots \mid \bar{A}^{(t-p)}] \in \mathbb{R}^{n \times pn}$, the model condenses to:

X = X W + (\bar{A} \otimes M) P + Z,

with $P = [P^{(1)}; \ldots; P^{(p)}] \in \mathbb{R}^{pd \times d}$ and $(\bar{A} \otimes M) := [\bar{A}^{(t-1)} X^{(t-1)} \mid \ldots \mid \bar{A}^{(t-p)} X^{(t-p)}] \in \mathbb{R}^{n \times pd}$, where $\otimes$ denotes this block-wise propagation product rather than the Kronecker product.

Contemporaneous edges are defined by $W^{(0)} := W$, while inter-slice (lagged) edges are denoted $W^{(\ell)} := P^{(\ell)}$ for $\ell = 1, \ldots, p$.
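To make the generative model concrete, the following sketch simulates a small dynamic graph according to the SVAR above. It is a minimal illustration assuming NumPy; the sizes, edge densities, and lag order are placeholder values, and an acyclic $W$ is obtained simply by restricting it to be strictly upper triangular. Since $X^{(t)}$ appears on both sides of the model equation, each slice is generated by solving $X^{(t)}(I - W) = \text{lagged terms} + Z^{(t)}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p, T = 100, 5, 1, 7  # illustrative sizes and lag order

def normalize(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def random_weights(shape, density=0.3):
    """Sparse weights drawn uniformly from [0.5, 2] or [-2, -0.5]."""
    mask = rng.random(shape) < density
    signs = rng.choice([-1.0, 1.0], size=shape)
    return mask * signs * rng.uniform(0.5, 2.0, size=shape)

W = np.triu(random_weights((d, d)), k=1)        # strictly upper triangular => acyclic
P = [random_weights((d, d)) for _ in range(p)]  # one inter-slice matrix per lag

X, A = [], []
for t in range(T):
    A_t = (rng.random((n, n)) < 0.1).astype(float)  # Bernoulli(0.1) edges
    A_t = np.triu(A_t, 1); A_t = A_t + A_t.T        # symmetric, no self-loops
    A.append(A_t)
    lagged = np.zeros((n, d))
    for ell in range(1, p + 1):
        if t - ell >= 0:
            lagged += normalize(A[t - ell]) @ X[t - ell] @ P[ell - 1]
    Z = rng.standard_normal((n, d))                 # centered noise
    X.append((lagged + Z) @ np.linalg.inv(np.eye(d) - W))  # solve X_t (I - W) = lagged + Z
```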

2. Structural Constraints and Objective Formulation

Only the intra-slice graph $W$ requires explicit acyclicity enforcement, since lagged effects are inherently time-directed and cannot introduce directed cycles. The objective combines a least-squares data fit with sparsity-inducing $\ell_1$ penalties on both weight matrices:

S(W, P) = \frac{1}{2n} \lVert X - X W - (\bar{A} \otimes M) P \rVert_F^2 + \lambda_W \lVert W \rVert_1 + \lambda_P \lVert P \rVert_1

Acyclicity of $W$ is imposed via the smooth NOTEARS constraint:

h(W) := \operatorname{tr}\left(e^{W \circ W}\right) - d = 0,

where $\circ$ denotes the Hadamard product and $e^{\cdot}$ is the matrix exponential. This constraint eliminates all directed cycles of positive length from $W$.
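As a minimal illustration (assuming NumPy and SciPy), the constraint and its gradient, $\nabla h(W) = (e^{W \circ W})^{\top} \circ 2W$ as derived by Zheng et al. (2018), can be evaluated with the dense matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

def acyclicity(W):
    """NOTEARS constraint h(W) = tr(exp(W ∘ W)) - d and its gradient."""
    E = expm(W * W)              # matrix exponential of the Hadamard square
    h = np.trace(E) - W.shape[0]
    grad = E.T * 2.0 * W         # grad h(W) = exp(W ∘ W)^T ∘ 2W
    return h, grad
```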

Combining these, the augmented Lagrangian becomes:

L(W, P; \alpha, \rho) = S(W, P) + \alpha\, h(W) + \frac{\rho}{2}\left[h(W)\right]^2

The unconstrained minimization of $L$ allows the application of standard continuous optimization methods.
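The sketch below assembles $S(W, P)$ and the augmented Lagrangian as a single callable over a flattened parameter vector, reusing the `acyclicity` helper above. It is a readability-oriented sketch rather than the reference implementation: `X` is one target slice and `AM` the corresponding stacked matrix $(\bar{A} \otimes M)$ of shape $n \times pd$, and the non-smooth $\ell_1$ terms are kept as-is, whereas NOTEARS-style implementations typically split weights into positive and negative parts to keep the objective smooth.

```python
def augmented_lagrangian(theta, X, AM, alpha, rho, lam_w=0.01, lam_p=0.01):
    """L(W, P; alpha, rho) for theta = [vec(W); vec(P)]."""
    n, d = X.shape
    pd_ = AM.shape[1]                       # pd = (number of lags) * d
    W = theta[:d * d].reshape(d, d)
    P = theta[d * d:].reshape(pd_, d)
    resid = X - X @ W - AM @ P              # least-squares residual
    S = 0.5 / n * np.sum(resid ** 2) + lam_w * np.abs(W).sum() + lam_p * np.abs(P).sum()
    h, _ = acyclicity(W)
    return S + alpha * h + 0.5 * rho * h ** 2
```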

3. Algorithmic Procedures and Optimization

The optimization alternates between two main steps:

  1. Minimization of $L(W, P; \alpha, \rho)$: jointly over $W$ and $P$ using L-BFGS-B or Adam, with the penalty parameters $\alpha, \rho$ held fixed.
  2. Dual updates: following Zheng et al. (2018), the dual variable $\alpha$ and penalty weight $\rho$ are updated after each minimization: if $h(W) > \gamma\, h(W)_{\text{prev}}$, then $\rho \leftarrow \tau \rho$ (with $\tau > 1$); then $\alpha \leftarrow \alpha + \rho\, h(W)$.

Hyperparameters: sparsity penalties $\lambda_W, \lambda_P$ (e.g., 0.01), initial penalty weight $\rho > 0$ (e.g., 1.0), initial dual variable $\alpha = 0$, penalty update factor $\tau$ (e.g., 10), progress tolerance $\gamma$ (e.g., 0.25), and hard thresholds $\tau_W, \tau_P$ for adjacency support recovery (e.g., 0.3).

Stopping Criteria: Optimization halts when the gradient norm is small (e.g., $10^{-8}$), when $|h(W)| < \varepsilon_h$ (e.g., $10^{-8}$), or when a maximum iteration count is reached. At the resulting stationary point, entries with magnitude below $\tau_W$ or $\tau_P$ are thresholded to zero, yielding binary estimates of the intra- and inter-slice adjacencies $\hat{W}, \hat{P}$.
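A hedged sketch of the full outer loop under the same assumptions as the snippets above, using SciPy's L-BFGS-B with numerical gradients for brevity (a production implementation would supply analytic gradients and a smooth $\ell_1$ reformulation):

```python
import numpy as np
from scipy.optimize import minimize

def fit_graphnotears(X, AM, rho=1.0, alpha=0.0, tau=10.0, gamma=0.25,
                     eps_h=1e-8, max_outer=100, thr_w=0.3, thr_p=0.3):
    """Augmented Lagrangian loop: minimize L over (W, P), then update (alpha, rho)."""
    d, pd_ = X.shape[1], AM.shape[1]
    theta = np.zeros(d * d + pd_ * d)
    h_prev = np.inf
    for _ in range(max_outer):
        res = minimize(augmented_lagrangian, theta,
                       args=(X, AM, alpha, rho), method="L-BFGS-B")
        theta = res.x
        h, _ = acyclicity(theta[:d * d].reshape(d, d))
        if h > gamma * h_prev:   # acyclicity violation not shrinking fast enough
            rho *= tau
        alpha += rho * h         # dual ascent step
        h_prev = h
        if h < eps_h:
            break
    W = theta[:d * d].reshape(d, d)
    P = theta[d * d:].reshape(pd_, d)
    W_hat = np.where(np.abs(W) >= thr_w, W, 0.0)   # hard-threshold small entries
    P_hat = np.where(np.abs(P) >= thr_p, P, 0.0)
    return W_hat, P_hat
```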

4. Theoretical Guarantees and Properties

GraphNOTEARS inherits identifiability from the SVAR model under standard conditions, such as non-Gaussian noise or suitably structured Gaussian noise. Finite-sample consistency and high-probability recovery arguments carry over from NOTEARS and DYNOTEARS. The augmented Lagrangian approach with the smooth penalty converges to stationary points, and in practice L-BFGS-B efficiently reduces both the objective and the acyclicity term.

Hyperparameter selection influences sparsity and acyclicity: larger $\lambda_W$ or $\lambda_P$ favors sparser structures (potentially discarding weak edges), while an insufficiently large $\rho$ may yield cyclic solutions. Conversely, an excessively large $\rho$ can hinder minimization of the data-fit term $S(W, P)$.

5. Empirical Evaluation and Results

Experiments encompass both synthetic and real-world datasets.

Synthetic Data

  • Graph Generation: Intra-slice graphs $W$ are drawn from Erdős–Rényi (ER) or Barabási–Albert (BA) models; inter-slice graphs $P$ from ER or stochastic block models (SBM). Each $A^{(t)}$ is generated with i.i.d. Bernoulli(0.1) entries.
  • Edge Weights: Each edge weight is sampled uniformly from $[0.5, 2] \cup [-2, -0.5]$.
  • Noise: $Z$ is i.i.d. Gaussian(0, 1) or Exponential(1).
  • Dimensions: $n \in \{100, 200, 500\}$, $d \in \{5, 10, 20, 30\}$, $T = 7$ time steps, lag $p \in \{1, 2\}$.

Baselines

  • NOTEARS+LASSO: first estimates $W$ via classic NOTEARS, then estimates $P$ by LASSO regression.
  • DYNOTEARS: jointly estimates $W$ and $P$, but without accounting for the time-varying graph structure $\bar{A}$.

All methods threshold final weights at $\tau = 0.3$; results report the mean $\pm$ 95% confidence interval over 5 random seeds.

Metrics

  • $F_1$-score: harmonic mean of precision and recall of the recovered edges in $\hat{W}$ and $\hat{P}$.
  • Structural Hamming Distance (SHD): total number of edge additions, deletions, and reversals relative to the ground truth; a computation sketch for both metrics follows below.
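A minimal sketch of both metrics on binarized adjacency matrices (NumPy arrays assumed; a reversed edge is counted once in the SHD, though conventions vary slightly across papers):

```python
import numpy as np

def shd_and_f1(B_true, B_est):
    """Structural Hamming Distance and F1-score between binary adjacency matrices."""
    B_true = (np.asarray(B_true) != 0).astype(int)
    B_est = (np.asarray(B_est) != 0).astype(int)
    # reversals: estimated edge exists only in the direction opposite to the true edge
    rev = ((B_est == 1) & (B_true == 0) & (B_true.T == 1) & (B_est.T == 0)).sum()
    extra = ((B_est == 1) & (B_true == 0) & (B_true.T == 0)).sum()    # spurious additions
    missing = ((B_true == 1) & (B_est == 0) & (B_est.T == 0)).sum()   # missed edges
    shd = extra + missing + rev
    tp = ((B_est == 1) & (B_true == 1)).sum()
    precision = tp / max(B_est.sum(), 1)
    recall = tp / max(B_true.sum(), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return shd, f1
```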

Quantitative Results

In a representative setting ($n = 500$, $d = 5$, $p = 2$, Gaussian noise, ER/ER), GraphNOTEARS achieves near-perfect recovery of $W$ ($F_1 \approx 1.0$, $\text{SHD} \approx 0$) and substantially more accurate recovery of $P^{(1)}$ and $P^{(2)}$ than the baselines. Across all tested graph models, noise types, and hyperparameters, GraphNOTEARS outperforms NOTEARS+LASSO and DYNOTEARS; performance degrades gracefully as $d$ increases or $n$ decreases while remaining consistently superior.

Real-world Yelp Data

Two dynamic graph constructions from Yelp were considered:

  • User Graph: Nodes are users; edges denote Yelp friend relationships. Node features $(C, S, F)$ represent the average restaurant category, average star rating, and total visits per user; lag $p = 1$.
  • Business Graph: Nodes are restaurants; edges connect restaurants that share a category and are spatially close; features are aggregated from customer interactions.

In both graphs, the established ground-truth structural causal model for restaurants is $C \to S$, $C \to F$, $S \to F$. GraphNOTEARS recovers all three intra-slice causal edges exactly, whereas DYNOTEARS misses one or two. Both methods find strong lagged (inter-slice) homophily, but only GraphNOTEARS uncovers the intra-slice relationships consistent with domain knowledge.

6. Limitations and Open Directions

The primary computational bottleneck is the $O(d^3)$ per-iteration cost of evaluating the acyclicity constraint $\operatorname{tr}(e^{W \circ W})$, which limits current implementations to roughly $d \lesssim 1000$. Approximate or sparse matrix-exponential schemes may alleviate this. The approach also presumes linear SEMs and stationarity; extensions to nonlinear relationships (e.g., via GNN-based functions) or time-varying $W$ and $P$ are highlighted as avenues for future research. Finally, the hard thresholding of $\hat{W}$ and $\hat{P}$ introduces a trade-off between false positives and false negatives, and principled threshold selection remains an open challenge.

7. Summary and Context

GraphNOTEARS generalizes acyclicity-constrained structure learning to dynamic graphs with evolving node features and topologies, simultaneously recovering contemporaneous and lagged dependencies by augmenting an SVAR structural framework with a smooth, differentiable acyclicity penalty. Empirical results on both synthetic and real-world data demonstrate state-of-the-art accuracy in reconstructing ground-truth causal structure relative to established baselines. Its continuous optimization formulation enables efficient application of mature unconstrained solvers, though scalability, nonlinearity, and thresholding remain as ongoing challenges (Fan et al., 2022).
