GraphNOTEARS: Dynamic DAG Learning
- The paper introduces GraphNOTEARS, a method that extends acyclicity-constrained optimization to dynamic graphs using a Structural Vector Autoregression formulation.
- It employs an augmented Lagrangian approach with continuous optimization solvers to jointly estimate intra-slice and lagged inter-slice dependencies.
- Empirical results on synthetic and real-world Yelp data demonstrate superior recovery, achieving near-perfect F1-scores and lower structural errors compared to baselines.
GraphNOTEARS is a methodology for learning the directed acyclic graph (DAG) structure underlying feature generation in dynamic graphs, where both the node attributes and the structural connectivity among nodes evolve over time. It extends prior acyclicity-constrained optimization approaches, such as NOTEARS and DYNOTEARS, to settings where dynamic graph data and lagged inter-temporal influences are fundamental. The core algorithm integrates a Structural Vector Autoregression (SVAR) formulation with a smooth acyclicity constraint, solved using an augmented Lagrangian objective and continuous optimization methods (Fan et al., 2022).
1. Problem Formulation and Statistical Model
GraphNOTEARS operates on dynamic graph data represented as $\{(X^t, A^t)\}_{t=1}^{T}$,
where each time step $t$ is described by $X^t \in \mathbb{R}^{n \times d}$ (node-feature matrix for $n$ nodes and $d$ features) and $A^t \in \{0,1\}^{n \times n}$ (binary adjacency matrix). The model assumes data generation according to a stationary SVAR of order $p$:

$$X^t = X^t W + \bar{A}^{t-1} X^{t-1} P_1 + \cdots + \bar{A}^{t-p} X^{t-p} P_p + Z^t,$$

where:
- $W \in \mathbb{R}^{d \times d}$: the contemporaneous (intra-slice) effect matrix.
- $\bar{A}^{t-i}$: the symmetrically normalized adjacency at lag $i$, including self-loops.
- $P_i \in \mathbb{R}^{d \times d}$: the lag-$i$ (inter-slice) effect matrices.
- $Z^t$: row-wise independent centered noise.
By stacking the adjacency-propagated lags as $M^t = [\bar{A}^{t-1} X^{t-1} \mid \cdots \mid \bar{A}^{t-p} X^{t-p}]$ and the lag matrices as $P = [P_1^\top \mid \cdots \mid P_p^\top]^\top$, the model condenses to:

$$X^t = X^t W + M^t P + Z^t,$$

with $M^t \in \mathbb{R}^{n \times pd}$ and $P \in \mathbb{R}^{pd \times d}$.
Contemporaneous (intra-slice) edges are encoded by the nonzero entries of $W$, while inter-slice (lagged) edges are encoded by the nonzero entries of $P_i$ for $i = 1, \dots, p$.
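To make the generative model concrete, here is a minimal NumPy simulation of the order-$p$ SVAR above. The helper names, the burn-in initialization, and the noise scale are illustrative assumptions rather than the authors' code; the key step is solving the contemporaneous part in closed form via $X^t = (\text{lagged} + Z^t)(I - W)^{-1}$, which is valid because $W$ is acyclic.

```python
import numpy as np

def normalize_adj(A):
    """Symmetrically normalized adjacency with self-loops:
    A_bar = D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def simulate_svar(W, P_list, A_list, n, d, T, noise_scale=1.0, seed=0):
    """Generate slices X^t = X^t W + sum_i A_bar^{t-i} X^{t-i} P_i + Z^t."""
    rng = np.random.default_rng(seed)
    p = len(P_list)
    X = [rng.standard_normal((n, d)) for _ in range(p)]   # burn-in slices
    inv = np.linalg.inv(np.eye(d) - W)                    # exists: W is a DAG
    for t in range(p, T):
        lagged = sum(normalize_adj(A_list[t - i]) @ X[t - i] @ P_list[i - 1]
                     for i in range(1, p + 1))
        Z = noise_scale * rng.standard_normal((n, d))
        X.append((lagged + Z) @ inv)                      # solve for X^t
    return X
```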
2. Structural Constraints and Objective Formulation
Only the intra-slice graph $W$ requires explicit acyclicity enforcement, since lagged effects are inherently time-directed and cannot introduce directed cycles. The objective combines a least-squares data fit with sparsity-inducing penalties under the acyclicity constraint:

$$\min_{W,\,P}\;\ell(W, P) = \frac{1}{2(T - p)} \sum_{t = p + 1}^{T} \left\| X^t - X^t W - M^t P \right\|_F^2 + \lambda_W \|W\|_1 + \lambda_P \|P\|_1 \quad \text{subject to}\quad h(W) = 0.$$
Acyclicity of $W$ is imposed via the smooth NOTEARS constraint:

$$h(W) = \operatorname{tr}\!\left(e^{\,W \circ W}\right) - d = 0,$$

where $\circ$ denotes the Hadamard product and $e^{M}$ is the matrix exponential. Because $\operatorname{tr}(e^{\,W \circ W}) - d$ aggregates weighted closed walks of all positive lengths, the constraint ensures that all positive-length cycles are eliminated from $W$.
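A direct NumPy/SciPy rendering of this constraint and its gradient (as derived in the NOTEARS paper) takes a few lines; this is a sketch, not a reference implementation:

```python
import numpy as np
from scipy.linalg import expm

def h_acyclic(W):
    """NOTEARS acyclicity measure h(W) = tr(exp(W ∘ W)) - d and its
    gradient ∇h(W) = exp(W ∘ W)^T ∘ 2W; h(W) = 0 iff W has no cycles."""
    E = expm(W * W)          # Hadamard square, then matrix exponential
    h = np.trace(E) - W.shape[0]
    grad = E.T * (2.0 * W)
    return h, grad
```

Each evaluation of `expm` costs $O(d^3)$, which is exactly the scalability bottleneck revisited in Section 6.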
Combining these, the augmented Lagrangian becomes:

$$L_\rho(W, P, \alpha) = \ell(W, P) + \alpha\, h(W) + \frac{\rho}{2}\, h(W)^2.$$

The unconstrained minimization of $L_\rho$ allows application of standard continuous optimization algorithms, as sketched below.
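The following sketch evaluates $L_\rho$ under the stacked notation above, reusing `h_acyclic` from the previous block. It is a simplified rendering, not the authors' implementation: in practice the nonsmooth $\ell_1$ terms are handled by splitting $W$ and $P$ into positive and negative parts so that L-BFGS-B sees a smooth, box-constrained problem.

```python
import numpy as np

def augmented_lagrangian(W, P, X_seq, M_seq, lam_W, lam_P, alpha, rho):
    """L_rho = least-squares fit + L1 sparsity + alpha*h(W) + (rho/2)*h(W)^2.

    X_seq[k] holds a slice X^t; M_seq[k] holds the matching stacked lags
    [A_bar^{t-1} X^{t-1} | ... | A_bar^{t-p} X^{t-p}], so the fit term
    is X^t - X^t W - M^t P.
    """
    fit = sum(np.linalg.norm(X - X @ W - M @ P, "fro") ** 2
              for X, M in zip(X_seq, M_seq)) / (2 * len(X_seq))
    h, _ = h_acyclic(W)  # smooth acyclicity measure from the sketch above
    return (fit + lam_W * np.abs(W).sum() + lam_P * np.abs(P).sum()
            + alpha * h + 0.5 * rho * h ** 2)
```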
3. Algorithmic Procedures and Optimization
The optimization alternates between two main steps:
- Minimization of $L_\rho$: jointly over $(W, P)$ using L-BFGS-B or Adam, with the dual and penalty parameters $(\alpha, \rho)$ held fixed.
- Dual Updates: $\alpha$ and the penalty weight $\rho$ are updated after each inner solve, following Zheng et al. (2018): if $h(W_{k+1}) > \gamma\, h(W_k)$, then $\rho \leftarrow \beta \rho$; in either case, $\alpha \leftarrow \alpha + \rho\, h(W_{k+1})$.
Hyperparameters: $\lambda_W, \lambda_P$ (sparsity penalties, e.g., 0.01), initial $\rho$ (e.g., 1.0), initial $\alpha$, penalty update factor $\beta$ (e.g., 10), progress tolerance $\gamma$ (e.g., 0.25), and hard thresholds on $W$ and $P$ for adjacency support recovery (e.g., 0.3).
Stopping Criteria: Optimization halts when the gradient norm is small, when $h(W)$ falls below a tolerance (e.g., $10^{-8}$), or when a maximum iteration count is reached. Once stationary, entries of $W$ or $P$ below the hard thresholds in magnitude are set to zero, yielding binary support estimates of the intra- and inter-slice adjacencies; the full outer loop is sketched below.
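Putting the pieces together, the outer loop can be rendered schematically as follows. This is a simplified single-loop variant of the double-loop scheme in Zheng et al. (2018); `inner_solve` is a stand-in for any smooth minimizer of $L_\rho$ at fixed $(\alpha, \rho)$, and the default values mirror the hyperparameters listed above.

```python
import numpy as np

def fit_graphnotears(inner_solve, W0, P0, alpha=0.0, rho=1.0,
                     beta=10.0, gamma=0.25, rho_max=1e16,
                     h_tol=1e-8, max_outer=100, omega=0.3):
    """Schematic augmented-Lagrangian outer loop for GraphNOTEARS."""
    W, P, h_prev = W0, P0, np.inf
    for _ in range(max_outer):
        W, P = inner_solve(W, P, alpha, rho)      # e.g., L-BFGS-B on L_rho
        h, _ = h_acyclic(W)
        if h > gamma * h_prev and rho < rho_max:  # too little progress on h
            rho *= beta                           # tighten the penalty
        alpha += rho * h                          # dual ascent step
        h_prev = h
        if h <= h_tol:                            # W is numerically acyclic
            break
    W = W * (np.abs(W) >= omega)                  # hard-threshold supports
    P = P * (np.abs(P) >= omega)
    return W, P
```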
4. Theoretical Guarantees and Properties
GraphNOTEARS inherits identifiability from the SVAR model under standard conditions, including non-Gaussianity or appropriate Gaussian structure. Consistency and high-probability recovery are shared with NOTEARS and DYNOTEARS in high-dimensional finite sample scenarios. The augmented Lagrangian approach with a smooth penalty converges to stationary points, with efficient practical reduction in both the objective and the acyclicity constraint term under L-BFGS-B.
Hyperparameter selection influences sparsity and acyclicity: larger $\lambda_W$ or $\lambda_P$ favors sparser structures (potentially discarding weak edges), while an insufficiently large $\rho$ may yield cyclic solutions. An excessively large $\rho$, in turn, can hinder minimization of the data-fit objective $\ell(W, P)$.
5. Empirical Evaluation and Results
Experiments encompass both synthetic and real-world datasets.
Synthetic Data
- Graph Generation: Intra-slice graphs are drawn from Erdős–Rényi (ER) or Barabási–Albert (BA) models; inter-slice graphs from ER or Stochastic Block Models (SBM). The time-varying node adjacencies $A^t$ are generated with i.i.d. Bernoulli(0.1) entries (a generation sketch follows this list).
- Edge Weights: Each edge weight is sampled uniformly from a fixed interval bounded away from zero.
- Noise: $Z^t$ is drawn i.i.d., either Gaussian(0, 1) or Exponential(1).
- Dimensions: The number of nodes $n$, feature dimension $d$, number of time steps $T$, and lag order $p$ vary across settings.
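The DAG-sampling step can be sketched as below. The construction (a strictly triangular mask under a random node ordering) is the standard way to draw an ER DAG; the weight interval is a placeholder, since the paper's exact range is not reproduced here.

```python
import numpy as np

def random_dag_er(d, p_edge, rng):
    """Erdős–Rényi DAG over d feature nodes: sample a strictly lower-
    triangular edge mask (acyclic by construction), then apply a random
    permutation to hide the node ordering."""
    mask = np.tril(rng.random((d, d)) < p_edge, k=-1).astype(float)
    perm = rng.permutation(np.eye(d))
    return perm @ mask @ perm.T

def sample_weights(mask, rng, low=0.5, high=2.0):
    """Attach edge weights drawn uniformly from ±[low, high]; the interval
    endpoints here are illustrative assumptions."""
    signs = rng.choice([-1.0, 1.0], size=mask.shape)
    return mask * signs * rng.uniform(low, high, size=mask.shape)

rng = np.random.default_rng(0)
W_true = sample_weights(random_dag_er(d=5, p_edge=0.3, rng=rng), rng)
```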
Baselines
| Method | Description |
|---|---|
| NOTEARS+LASSO | (1) Estimate $W$ via classic NOTEARS, then (2) estimate $P$ by LASSO regression |
| DYNOTEARS | Jointly estimate $(W, P)$, without accounting for the time-varying graph structure $A^t$ |
All methods threshold final weights at a common cutoff and report the mean ± 95% CI over 5 random seeds.
Metrics
- $F_1$-score: Harmonic mean of precision and recall over the recovered edges of $W$ and $P$.
- Structural Hamming Distance (SHD): Total number of edge additions, deletions, and reversals relative to the ground truth (a minimal implementation sketch follows).
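Both metrics operate on the thresholded edge supports. A minimal sketch is below; note that toolkits differ in how they handle edge cases such as doubly oriented pairs, so this is illustrative rather than canonical.

```python
import numpy as np

def f1_and_shd(B_true, B_est):
    """F1 over directed-edge indicators, plus a simple SHD that counts
    missing, extra, and reversed edges once each."""
    t, e = B_true != 0, B_est != 0
    tp = np.sum(e & t)                # correctly oriented edges
    prec = tp / max(np.sum(e), 1)
    rec = tp / max(np.sum(t), 1)
    f1 = 2 * prec * rec / max(prec + rec, 1e-12)
    reversed_ = np.sum(e & ~t & t.T)  # predicted against the true direction
    missing = np.sum(t & ~e & ~e.T)   # true edge absent in both directions
    extra = np.sum(e & ~t & ~t.T)     # predicted edge with no true counterpart
    return f1, missing + extra + reversed_
```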
Quantitative Results
In a representative setting (Gaussian noise, ER intra-slice and ER inter-slice graphs), GraphNOTEARS achieves essentially perfect recovery of $W$ ($F_1$ near 1, SHD near 0) and substantially more accurate recovery of $P$ than the baselines. Across all tested graph models, noise types, and hyperparameters, GraphNOTEARS outperforms NOTEARS+LASSO and DYNOTEARS, with performance degrading gracefully as the problem dimension grows or the sample size shrinks, but consistently remaining superior.
Real-world Yelp Data
Two dynamic graph constructions from Yelp were considered:
- User Graph: Nodes are users; edges denote Yelp friend relationships. Node features capture each user's average restaurant category, average star rating, and total number of visits.
- Business Graph: Nodes are restaurants; edges connect restaurants that share a category and are spatially proximate; features are aggregated from customer interactions.
In both settings, a small ground-truth structural causal model over the restaurant-related features is established from domain knowledge. GraphNOTEARS recovers all three intra-slice causal edges exactly, whereas DYNOTEARS misses one or two. Both methods find strong lagged (inter-slice) homophily, but only GraphNOTEARS uncovers intra-slice relationships consistent with domain knowledge.
6. Limitations and Open Directions
The primary computational bottleneck is the per-iteration cost of evaluating the matrix-exponential acyclicity constraint, which scales as $O(d^3)$ in the feature dimension and limits current implementations to moderate $d$. Approximate or sparse matrix-exponential schemes may alleviate this. The current approach presumes linear SEMs and stationarity; extensions to nonlinear relationships (e.g., via GNN-based functions) or time-varying $W$ and $P_i$ are highlighted avenues for future research. The final hard thresholding of $W$ and $P$ introduces a trade-off between false positives and false negatives; principled threshold selection remains an unresolved challenge.
7. Summary and Context
GraphNOTEARS generalizes acyclicity-constrained structure learning to dynamic graphs with evolving node features and topologies, simultaneously recovering contemporaneous and lagged dependencies by augmenting an SVAR structural framework with a smooth, differentiable acyclicity penalty. Empirical results on both synthetic and real-world data demonstrate state-of-the-art accuracy in reconstructing ground-truth causal structure relative to established baselines. Its continuous optimization formulation enables efficient application of mature unconstrained solvers, though scalability, nonlinearity, and thresholding remain as ongoing challenges (Fan et al., 2022).