MS-CASTLE & MN-CASTLE: Multiscale Causal Inference
- The paper introduces MS-CASTLE and MN-CASTLE frameworks that leverage multiresolution wavelet analysis, sparse structural equation modeling, and Bayesian nonstationary graph learning to infer dynamic causal relationships.
- The methodology combines wavelet-based multiscale decomposition with block-diagonal SVARM and acyclicity constraints to recover temporal DAGs while efficiently handling noise and nonstationarity.
- Empirical results on synthetic benchmarks and financial applications (equity-market volatility contagion and U.S. natural gas prices) show these frameworks outperform traditional methods by revealing time-varying, scale-specific causal influences with higher accuracy and robustness.
Multiscale structure learning in time series addresses the challenge of inferring latent dynamic dependencies, including causal structure, in high-dimensional or multivariate time series whose interactions may vary both across time scales and over time. The MS-CASTLE (Multiscale Causal Structure Learning) and MN-CASTLE (Multiscale Non-stationary Causal Structure Learner) frameworks are state-of-the-art approaches for modeling, estimating, and interpreting such complex causal and statistical interactions. These methods integrate wavelet-based multiresolution analysis, sparse structural equation modeling, and, in the case of MN-CASTLE, Bayesian nonstationary graph learning to recover multilayer, dynamic graphical models. Their empirical performance and robustness to noise and nonstationarity make them valuable additions to the toolbox for scientific inference in fields such as finance, neuroscience, and systems biology.
1. Multiscale Causal Graphical Models for Time Series
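Causal structure learning in time series seeks to infer the directed acyclic graph (DAG) or other dependency network that best explains the observed dependencies between components, respecting the sequential and potentially multiresolution character of real data. In the SVARM (structural vector autoregressive model) regime, an $N$-variate process $\mathbf{x}_t$ is represented as

$$\mathbf{x}_t = \mathbf{W}\mathbf{x}_t + \sum_{p=1}^{P} \mathbf{A}_p \mathbf{x}_{t-p} + \boldsymbol{\varepsilon}_t,$$

where $\mathbf{W}$ encodes instantaneous (contemporaneous) effects, $\mathbf{A}_1, \ldots, \mathbf{A}_P$ encode lagged effects up to order $P$, and $\boldsymbol{\varepsilon}_t$ is the noise term (D'Acunto et al., 2022).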
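To make the SVARM concrete, here is a minimal NumPy sketch that simulates data from $\mathbf{x}_t = \mathbf{W}\mathbf{x}_t + \sum_{p} \mathbf{A}_p \mathbf{x}_{t-p} + \boldsymbol{\varepsilon}_t$. It assumes Gaussian noise, zero initial conditions, and the convention that `W[i, k]` is the effect of variable k on variable i. Because the instantaneous effects form a DAG, $\mathbf{I} - \mathbf{W}$ is invertible, so each step can be solved for $\mathbf{x}_t$ directly. The function name `simulate_svarm` is hypothetical and not part of any published MS-CASTLE code.

```python
import numpy as np


def simulate_svarm(W, A_list, T, noise_scale=1.0, seed=0):
    """Draw T samples from x_t = W x_t + sum_p A_p x_{t-p} + e_t.

    Convention: W[i, k] is the instantaneous effect of x_k on x_i, so W must
    be the weighted adjacency of a DAG for I - W to be invertible.
    A_list = [A_1, ..., A_P] holds the lagged effects. Gaussian noise and
    zero initial conditions are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    N, P = W.shape[0], len(A_list)
    solve_inst = np.linalg.inv(np.eye(N) - W)   # x_t = (I - W)^{-1} (lagged + e_t)
    X = np.zeros((T + P, N))                     # first P rows: zero initial conditions
    for t in range(P, T + P):
        lagged = np.zeros(N)
        for p, A in enumerate(A_list, start=1):
            lagged += A @ X[t - p]               # lag-p contribution
        e = noise_scale * rng.standard_normal(N)
        X[t] = solve_inst @ (lagged + e)
    return X[P:]                                 # drop the initial-condition rows


# Example: chain x0 -> x1 -> x2 within a time step, plus lag-1 autoregression.
W = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.0, -0.5, 0.0]])
A1 = 0.3 * np.eye(3)
X = simulate_svarm(W, [A1], T=5000)
```

Subtracting the structural terms from the simulated data recovers the noise exactly, which gives a quick consistency check for a candidate pair of matrices.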
MS-CASTLE extends this to the multiscale regime by using the stationary wavelet transform (SWT) to represent each time series by its detail coefficients at several scales $j = 1, \ldots, J$. The model then couples these decompositions via a block-diagonal version of the SVARM, with separate matrices $\mathbf{W}^{(j)}$ and $\mathbf{A}_p^{(j)}$ for each scale $j$. Instantaneous effects at each scale are constrained to form a DAG, while lagged effects are acyclic by construction because they respect temporal ordering.
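The multiscale representation can be sketched with a simplified stand-in for the SWT: an undecimated ("à trous") Haar decomposition, which keeps every time step at every scale and reconstructs the series exactly as the sum of its details plus the final approximation. This is an illustrative assumption; the wavelet family, normalization, and boundary handling used by MS-CASTLE are not specified here, and PyWavelets' `pywt.swt` would be the usual library route. The function names are hypothetical.

```python
import numpy as np


def haar_atrous(x, J):
    """Undecimated ("a trous") Haar decomposition of a 1-D series.

    Returns (details, approx): details has shape (J, T), with details[j-1]
    the scale-j detail (roughly periods of 2**j to 2**(j+1) samples), and
    approx is the remaining smooth component. The series is treated as
    circular at the boundary, an illustrative simplification.
    """
    approx = np.asarray(x, dtype=float)
    details = []
    for j in range(1, J + 1):
        shift = 2 ** (j - 1)                              # dilate the filter at each level
        smooth = 0.5 * (approx + np.roll(approx, shift))  # Haar low-pass, no downsampling
        details.append(approx - smooth)                   # detail = what the low-pass removes
        approx = smooth
    return np.stack(details), approx


def multiscale_coefficients(X, J):
    """Apply haar_atrous to every column of X (T x N); returns shape (J, T, N)."""
    return np.stack([haar_atrous(X[:, i], J)[0] for i in range(X.shape[1])], axis=-1)


rng = np.random.default_rng(1)
x = rng.standard_normal(256)
details, approx = haar_atrous(x, J=4)
D = multiscale_coefficients(rng.standard_normal((256, 5)), J=4)
```

For a time index `t`, `D[:, t].ravel()` stacks the N coefficients of scale 1, then those of scale 2, and so on, into one length-NJ vector, which is the kind of augmented multiscale input a block-diagonal SVARM operates on.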
MN-CASTLE generalizes this to the multiscale nonstationary case: the generative model samples a multiscale nonstationary DAG (MN-DAG) that defines, at each scale $j$ and time $t$, a weighted adjacency matrix $\mathbf{W}_t^{(j)}$. A single topological order over the nodes is shared globally, but edge strengths can vary smoothly in time, controlled by user priors on nonstationarity, density, and scale complexity (D'Acunto et al., 2022).
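A toy generator loosely mirroring the MN-DAG described above: one node order shared by all scales and times, a random edge support per scale that respects that order, and a smooth Gaussian-process path for each active edge weight. The parameters `density` and `length_scale` are hypothetical stand-ins for the density and nonstationarity priors; the actual MN-CASTLE prior (including how the number of scales is drawn) is not reproduced here.

```python
import numpy as np


def rbf_kernel(T, length_scale):
    """Squared-exponential covariance over integer time points 0..T-1."""
    t = np.arange(T)[:, None]
    return np.exp(-0.5 * ((t - t.T) / length_scale) ** 2)


def sample_mn_dag(N, J, T, density=0.3, length_scale=50.0, seed=0):
    """Sample W of shape (J, T, N, N), with W[j, t, child, parent] the edge weight.

    One topological order is shared by every scale and time step; each scale
    draws its own edge support; each active edge gets a smooth GP path over
    time. `density` and `length_scale` are hypothetical stand-ins for the
    paper's density and nonstationarity priors.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(N)                  # order[k] = node at position k
    rank = np.empty(N, dtype=int)
    rank[order] = np.arange(N)                  # rank[i] = position of node i
    allowed = rank[None, :] < rank[:, None]     # allowed[child, parent]: parent comes first
    # Small jitter on the diagonal keeps the Cholesky factorization stable.
    L = np.linalg.cholesky(rbf_kernel(T, length_scale) + 1e-5 * np.eye(T))
    W = np.zeros((J, T, N, N))
    for j in range(J):
        support = allowed & (rng.random((N, N)) < density)
        for child, parent in zip(*np.nonzero(support)):
            W[j, :, child, parent] = L @ rng.standard_normal(T)  # one GP sample path
    return W, order


W, order = sample_mn_dag(N=5, J=2, T=100, density=0.5, length_scale=20.0)
```

Reordering the rows and columns of any `W[j, t]` by `order` gives a strictly lower-triangular matrix, which is why every scale and time step stays acyclic even though the weights drift.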
2. Methods: Wavelet Transform, Sparse Optimization, and Bayesian Learning
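MS-CASTLE begins by decomposing the observed series via a stationary wavelet transform. For $J$ scales and $N$ variables, this yields an augmented vector

$$\tilde{\mathbf{x}}_t = \big[(\mathbf{d}_t^{(1)})^\top, \ldots, (\mathbf{d}_t^{(J)})^\top\big]^\top \in \mathbb{R}^{NJ},$$

where $\mathbf{d}_t^{(j)} \in \mathbb{R}^N$ collects the scale-$j$ wavelet coefficients of the $N$ series at time $t$.

The optimization objective is the least-squares fit of a block-diagonal SVARM on $\tilde{\mathbf{x}}_t$, with instantaneous matrix $\tilde{\mathbf{W}} = \operatorname{blkdiag}(\mathbf{W}^{(1)}, \ldots, \mathbf{W}^{(J)})$ and lag matrices $\tilde{\mathbf{A}}_p = \operatorname{blkdiag}(\mathbf{A}_p^{(1)}, \ldots, \mathbf{A}_p^{(J)})$, penalized for edge sparsity ($\ell_1$ norm) and for acyclicity via a smooth "dagness" function:

$$h(\mathbf{W}) = \operatorname{tr}\big(e^{\mathbf{W} \circ \mathbf{W}}\big) - N,$$

which is zero if and only if $\mathbf{W}$ is the weighted adjacency matrix of a DAG. The final optimization problem is:

$$\min_{\{\mathbf{W}^{(j)}, \mathbf{A}_p^{(j)}\}} \; \sum_{j=1}^{J} \left[ \frac{1}{2T} \sum_{t} \Big\| \mathbf{d}_t^{(j)} - \mathbf{W}^{(j)} \mathbf{d}_t^{(j)} - \sum_{p=1}^{P} \mathbf{A}_p^{(j)} \mathbf{d}_{t-p}^{(j)} \Big\|_2^2 + \lambda \Big( \|\mathbf{W}^{(j)}\|_1 + \sum_{p=1}^{P} \|\mathbf{A}_p^{(j)}\|_1 \Big) \right] \quad \text{s.t. } h(\mathbf{W}^{(j)}) = 0,\; j = 1, \ldots, J.$$

This problem is solved by ADMM using an augmented Lagrangian, linearizing the nonconvex acyclicity constraint at each iteration and imposing block-wise sparsity via soft-thresholding (D'Acunto et al., 2022).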
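The building blocks named in the paragraph above (block-diagonal structure, a smooth acyclicity function, soft-thresholding) can be sketched in NumPy. The sketch assumes the NOTEARS-style dagness $h(\mathbf{W}) = \operatorname{tr}(e^{\mathbf{W} \circ \mathbf{W}}) - N$, evaluated with a truncated power series, and it does not implement the ADMM solver itself; all function names are hypothetical.

```python
import numpy as np


def dagness(W, n_terms=30):
    """h(W) = tr(exp(W o W)) - N, via a truncated power series of the trace.

    tr(exp(M)) = N + sum_{k>=1} tr(M^k) / k!, so h(W) is the k >= 1 part.
    h(W) = 0 for a DAG (M is nilpotent) and h(W) > 0 if W has a cycle.
    """
    M = W * W                        # elementwise square: nonnegative weights
    term = np.eye(W.shape[0])
    h = 0.0
    for k in range(1, n_terms + 1):
        term = term @ M / k          # term = M^k / k!
        h += np.trace(term)
    return h


def soft_threshold(Z, lam):
    """Proximal operator of lam * ||Z||_1: shrink every entry toward zero by lam."""
    return np.sign(Z) * np.maximum(np.abs(Z) - lam, 0.0)


def block_diag(blocks):
    """blkdiag(W^(1), ..., W^(J)) for J square blocks of equal size N."""
    J, N = len(blocks), blocks[0].shape[0]
    out = np.zeros((J * N, J * N))
    for j, B in enumerate(blocks):
        out[j * N:(j + 1) * N, j * N:(j + 1) * N] = B
    return out


W_dag = np.array([[0.0, 0.0], [0.7, 0.0]])   # single edge: acyclic
W_cyc = np.array([[0.0, 0.5], [0.5, 0.0]])   # 2-cycle
```

Because the exponential of a block-diagonal matrix is itself block-diagonal, $h$ of the stacked matrix equals the sum of the per-scale values, so the acyclicity constraint splits into one independent constraint per scale.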
MN-CASTLE introduces additional modeling for nonstationarity and time-varying edge weights:
- The node ordering is treated as a latent variable with a Plackett-Luce prior and inferred by fitting a stationary SEM with stochastic variational inference (SVI) (step 1).
- Scale- and time-specific edge weights are modeled as Gaussian processes (GPs) over time and updated in a second SVI step, constrained by locally observed partial correlations.
- The fitting is fully Bayesian, using stochastic gradients and reparameterization tricks for efficiency (D'Acunto et al., 2022).
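The Plackett-Luce component of step 1 can be illustrated in isolation. The sketch below samples node orderings with the Gumbel-perturbation trick (add Gumbel noise to per-node log-scores, then sort in decreasing order) and evaluates the Plackett-Luce log-probability of a given ordering. It assumes per-node log-scores as the only parameters and leaves out the SEM likelihood, the gradient estimator, and the SVI loop, so it is not MN-CASTLE's inference code; the function names and scores are hypothetical.

```python
import numpy as np
from itertools import permutations


def sample_plackett_luce(log_scores, rng):
    """Draw one node ordering from a Plackett-Luce distribution.

    Perturbing each log-score with independent Gumbel(0, 1) noise and sorting
    in decreasing order yields an exact Plackett-Luce sample.
    """
    gumbel = -np.log(-np.log(rng.random(log_scores.shape)))
    return np.argsort(-(log_scores + gumbel))


def plackett_luce_log_prob(order, log_scores):
    """log p(order) = sum_k [ s_{order[k]} - logsumexp(s_{order[k:]}) ]."""
    s = np.asarray(log_scores)[np.asarray(order)]
    log_p = 0.0
    for k in range(len(s)):
        tail = s[k:]                              # nodes not yet placed
        m = tail.max()
        log_p += s[k] - (m + np.log(np.exp(tail - m).sum()))
    return log_p


log_scores = np.log(np.array([3.0, 2.0, 1.0]))   # hypothetical node scores
rng = np.random.default_rng(0)
# Sanity check: probabilities over all 3! orderings sum to one.
total = sum(np.exp(plackett_luce_log_prob(p, log_scores)) for p in permutations(range(3)))
```

Expressing a sample as a deterministic function of parameter-free Gumbel noise is the idea behind relaxed, reparameterized gradient estimators for orderings; whether MN-CASTLE uses exactly this construction is not stated here.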
3. Comparison to Related Multiscale and Multiresolution Methods
MS-CASTLE and MN-CASTLE go beyond traditional single-scale or stationary Granger-causal/VAR approaches, which cannot distinguish effects at different time resolutions or adapt to evolving dependency structures. MS-CASTLE's block-diagonal SVARM with acyclicity and sparsity constraints substantially outperforms earlier methods such as DYNOTEARS and VAR-DirectLiNGAM in speed, robustness to noise, and structure recovery as measured by structural Hamming distance (SHD), especially for small sample sizes and heavy-tailed or non-Gaussian noise (D'Acunto et al., 2022).
MN-CASTLE further integrates time-varying DAGs across multiple scales (frequency bands), with a Bayesian treatment of the node ordering and a flexible GP prior for edge trajectories. On synthetic benchmarks spanning varying degrees of nonstationarity, density, and noise, MN-CASTLE achieves the highest F1 score and the lowest SHD in all settings, as well as the best ordering recovery as measured by nDCG@3. MS-CASTLE serves as a baseline in these experiments but underperforms under nonstationarity because its edge structure is static (D'Acunto et al., 2022).
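The two structural metrics used in these comparisons can be computed from binarized adjacency matrices. SHD conventions differ across papers (some count a reversed edge as two errors); this sketch counts a reversal once and also counts mismatched self-loops, which matter for lag matrices. This is an assumed convention, not necessarily the one used in the cited experiments, and the function names are hypothetical.

```python
import numpy as np


def edge_f1(W_true, W_est, tol=1e-8):
    """F1 score of the recovered directed edge support (|w| > tol is an edge)."""
    A, B = np.abs(W_true) > tol, np.abs(W_est) > tol
    tp = np.sum(A & B)
    if tp == 0:
        return 0.0
    precision, recall = tp / B.sum(), tp / A.sum()
    return 2 * precision * recall / (precision + recall)


def shd(W_true, W_est, tol=1e-8):
    """Structural Hamming distance between two directed graphs.

    Counts node pairs whose connection differs (missing, extra, or reversed
    edge, each counting once) plus mismatched self-loops.
    """
    A, B = np.abs(W_true) > tol, np.abs(W_est) > tol
    N = A.shape[0]
    dist = int(np.sum(np.diag(A) != np.diag(B)))
    for i in range(N):
        for k in range(i + 1, N):
            if (A[i, k], A[k, i]) != (B[i, k], B[k, i]):
                dist += 1
    return dist


# True graph: edges (0, 1) and (1, 2). Estimate: one reversed, one correct, one extra.
W_true = np.zeros((3, 3))
W_true[0, 1] = W_true[1, 2] = 1.0
W_est = np.zeros((3, 3))
W_est[1, 0] = W_est[1, 2] = W_est[0, 2] = 1.0
```

In the example, the reversed edge and the extra edge give an SHD of 2, while one true positive out of three predicted and two true edges gives an F1 of 0.4.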
4. Empirical Performance, Case Studies, and Domain-Specific Insights
MS-CASTLE was empirically validated on both synthetic VARM constructions and a real-world study of volatility contagion among 15 global equity markets during the COVID-19 pandemic. In the equity-market risk analysis:
- MS-CASTLE used GARCH(1,1) volatilities as inputs, decomposed into 6 SWT scales (2–4 d up to 16–32 d).
- The most persistent and strongest causal effects appeared at mid-scales (4–16 d), with Brazil, Canada, and Italy exerting the largest lagged causal influence on other markets.
- Single-scale approaches recovered only autoregressive (self) dependencies; MS-CASTLE uniquely revealed horizon-dependent risk transmission (D'Acunto et al., 2022).
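For reference, the GARCH(1,1) conditional volatility used as input follows the recursion $\sigma_t^2 = \omega + \alpha r_{t-1}^2 + \beta \sigma_{t-1}^2$. The sketch below filters volatilities for fixed, hypothetical parameters and synthetic returns. In practice $\omega$, $\alpha$, $\beta$ would be estimated per market (typically by maximum likelihood); how the original study did so is not described here.

```python
import numpy as np


def garch11_volatility(returns, omega, alpha, beta):
    """Filtered GARCH(1,1) volatility sigma_t for fixed parameters.

    sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2,
    started at the unconditional variance omega / (1 - alpha - beta).
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    r = np.asarray(returns, dtype=float)
    sigma2 = np.empty_like(r)
    sigma2[0] = omega / (1.0 - alpha - beta)
    for t in range(1, len(r)):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    return np.sqrt(sigma2)


rng = np.random.default_rng(2)
r = 0.01 * rng.standard_normal(500)                            # synthetic daily returns
vol = garch11_volatility(r, omega=1e-6, alpha=0.08, beta=0.9)   # hypothetical parameters
```

One such volatility series per market, stacked column-wise, is the kind of multivariate input that the wavelet decomposition would then split into scales.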
MN-CASTLE was benchmarked on synthetic time series with controlled scale, edge density, and nonstationarity, and was also applied to U.S. natural gas price drivers (production, storage, uncertainty, oil prices) from 2018 to 2022. MN-CASTLE alone identified time-varying causal effects of economic uncertainty and storage deviations on gas prices, with these links strengthening during COVID-19 and the Russian invasion of Ukraine. Competing approaches either missed these dependencies or detected spurious links (D'Acunto et al., 2022).
5. Methodological Extensions and Theoretical Guarantees
The extensions of MS-CASTLE discussed in the original work include relaxing the strict block-diagonal assumption, introducing inter-scale coupling, and addressing nonstationarity via time-varying DAG weights, potentially with Gaussian-process or other nonparametric evolution. On the theoretical side, MS-CASTLE is reported to converge to KKT points, to scale well computationally (it solves a series of block-structured convex quadratic programs), and to preserve the correct edge support under mild network-sparsity and sample-size conditions (D'Acunto et al., 2022).
MN-CASTLE’s Bayesian framework incorporates user priors for the number of scales (multiscale complexity, μ), the degree of nonstationarity (τ), and the average edge density (δ), providing a transparent generative model for both time-varying and multiscale graphs. Empirically, the variational optimization scales to a moderate number of variables $N$ (up to 20) and exploits local partial coherence to mask GP updates, which accelerates convergence. The precision of the recovered edge posteriors and node ordering is maintained even under a mismatched noise distribution or partial downsampling (D'Acunto et al., 2022).
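The masking idea can be illustrated with a simpler time-domain proxy. The source describes MN-CASTLE as using local partial coherence, a frequency-domain quantity; the sketch below swaps in partial correlation over a trailing window, computed from the inverse covariance, and flags node pairs whose magnitude exceeds a threshold. The window length, the threshold, the function names, and the use of such a mask to decide which GP edge trajectories to update are all assumptions for illustration.

```python
import numpy as np


def partial_correlation(X):
    """Partial correlation matrix of the columns of X (T x N), from the precision matrix."""
    P = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(P))
    R = -P / np.outer(d, d)
    np.fill_diagonal(R, 1.0)
    return R


def local_edge_mask(X, window, threshold):
    """mask[t, i, k] is True when |partial corr(i, k)| over the trailing window
    X[t - window + 1 : t + 1] exceeds threshold. Steps with fewer than `window`
    past samples stay all-False. Window and threshold are hypothetical knobs."""
    T, N = X.shape
    mask = np.zeros((T, N, N), dtype=bool)
    for t in range(window - 1, T):
        R = partial_correlation(X[t - window + 1:t + 1])
        mask[t] = np.abs(R) > threshold
        np.fill_diagonal(mask[t], False)      # no self-edges
    return mask


rng = np.random.default_rng(3)
z = rng.standard_normal((300, 3))
X = np.column_stack([z[:, 0], z[:, 0] + 0.3 * z[:, 1], z[:, 2]])  # x1 depends on x0; x2 independent
mask = local_edge_mask(X, window=100, threshold=0.5)
```

Restricting updates to flagged pairs shrinks the number of edge trajectories processed at each step, which is the kind of saving the masking is meant to provide.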
A plausible implication is that these frameworks can be extended to nonlinear causal kernels and cross-scale interactions, and integrated with Bayesian model selection or credible-interval estimation, as indicated in the concluding remarks (D'Acunto et al., 2022).
6. Relationship with Coarse-Grained Modeling and Alternative Approaches
MS-CASTLE/MN-CASTLE share with certain SDE moment-matching estimators (Kalliadasis et al., 2014) the objective of discovering parsimonious, interpretable models for high-dimensional and multiscale dynamics. However, the approaches differ in basic philosophy:
- MS-CASTLE/MN-CASTLE target the recovery of multiscale, possibly nonstationary, DAGs and SVARMs using wavelet decompositions and block-sparse regression.
- SDE-based methods operate on single observed time series (slow variables), and exploit homogenization and Dynkin’s formula to obtain unbiased parameter estimates for coarse-grained drift and diffusion, thereby filtering fast-scale contamination.
- Integration is possible: use MS-CASTLE to select active functional forms (dictionary) before robustly estimating parameter strengths with SDE-based estimators, or adopt moment-equation estimation as a parameter-fitter in a Bayesian MS-CASTLE for credible interval quantification with explicit control over multiscale biases (Kalliadasis et al., 2014).
7. Limitations, Open Challenges, and Future Directions
MS-CASTLE, in its current form, treats scales as independent and assumes stationarity within each block: it models neither cross-scale causal interactions nor the time-varying edges that MN-CASTLE introduces. The original MS-CASTLE paper points to handling nonstationarity, enabling cross-scale dependencies, and generalizing to nonlinear models as next steps (D'Acunto et al., 2022). MN-CASTLE operationalizes several of these directions but is computationally more intensive and has therefore been applied to moderate-size datasets (D'Acunto et al., 2022).
Open problems include scaling to high-dimensional settings (large $N$), further reducing the computational burden of nonparametric variational inference, and deriving theoretical error bounds for general nonlinear, non-Gaussian dynamical systems. Hybridization with moment-based estimators, or use as structure-learning modules in hierarchical generative models, remains an open research direction.
In summary, MS-CASTLE and MN-CASTLE constitute a foundational methodology for causal inference in complex multiscale time series, facilitating robust, interpretable modeling, and offering a modular basis for further advances in time series structure learning (D'Acunto et al., 2022, D'Acunto et al., 2022, Kalliadasis et al., 2014).