- The paper introduces a novel method using conditioned normalizing flows to complement autoregressive models for capturing complex dependencies in multivariate data.
- It demonstrates state-of-the-art performance on benchmarks such as Exchange, Solar, and Traffic using the CRPS metric for evaluation.
- The approach enhances uncertainty estimation in forecasting and broadens applications in finance, energy, and traffic management.
Multivariate Probabilistic Time Series Forecasting via Conditioned Normalizing Flows
Introduction
The paper "Multivariate Probabilistic Time Series Forecasting via Conditioned Normalizing Flows" (arXiv:2002.06103) introduces an approach to time series forecasting that combines autoregressive deep learning models with conditioned normalizing flows. Traditional forecasting models often assume independence between interacting time series because modeling their statistical dependencies is difficult to scale. This paper addresses that limitation by pairing autoregressive models with the flexibility of normalizing flows, specifically Real NVP and Masked Autoregressive Flow (MAF), to capture high-dimensional data distributions while remaining computationally efficient.
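As background for how flows model densities, the following is a minimal sketch (not from the paper; the function names and the toy elementwise affine map are illustrative) of the change-of-variables identity that normalizing flows such as Real NVP and MAF build on:

```python
import numpy as np

def standard_normal_logpdf(z):
    # log-density of a standard normal base distribution, elementwise
    return -0.5 * (z ** 2 + np.log(2.0 * np.pi))

def log_prob_affine_flow(x, mu, log_sigma):
    """log p_X(x) via change of variables with z = f(x) = (x - mu) * exp(-log_sigma):
    log p_X(x) = log p_Z(f(x)) + log|det df/dx|.
    A toy stand-in for the Real NVP / MAF transforms the paper composes."""
    z = (x - mu) * np.exp(-log_sigma)           # forward transform to base space
    log_det = -np.sum(log_sigma, axis=-1)       # log|det Jacobian| of f
    return np.sum(standard_normal_logpdf(z), axis=-1) + log_det

x = np.array([[0.3, -1.2, 0.7]])
lp = log_prob_affine_flow(x, mu=np.zeros(3), log_sigma=np.zeros(3))
# With mu = 0 and sigma = 1 this reduces to a standard normal log-density.
```

Richer invertible transforms change only the form of `f` and its Jacobian term; the density computation follows the same identity.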
Model Approach
The approach uses normalizing flows to model the data distribution, capturing complex dependency structures in multivariate data without being restricted to simple parametric distributions. The models can efficiently handle thousands of interacting time series, a significant advance over earlier methods such as Vec-LSTM with low-rank Gaussian copulas or scalable Gaussian process models.
The core innovation is conditioning the joint distribution of the multivariate time series on temporal context via flows, embedding temporal dynamics directly into the probabilistic framework. By using autoregressive models such as recurrent neural networks (RNNs) or Transformers for temporal conditioning, the models retain strong extrapolation capabilities, which is crucial for accurate forecasting.
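To make the conditioning concrete, here is a minimal sketch of an affine coupling transform whose scale and shift depend on a temporal hidden state. All names are hypothetical, and the single linear conditioner is a simplification of the deeper networks used inside Real NVP/MAF; the point is that the context vector `h` (e.g. an RNN or Transformer state) enters the conditioner alongside the untransformed half of the input, and the map stays exactly invertible:

```python
import numpy as np

rng = np.random.default_rng(0)

D, H = 4, 8  # feature dimension (even) and hidden-state size, chosen for illustration
# Hypothetical conditioner weights: maps [x1, h] -> (log_s, t) for the second half.
W = rng.normal(scale=0.1, size=(D // 2 + H, D))

def coupling_forward(x, h):
    """Conditioned affine coupling: z1 = x1, z2 = x2 * exp(log_s) + t,
    where (log_s, t) depend on x1 and the temporal context h."""
    x1, x2 = x[: D // 2], x[D // 2 :]
    params = np.concatenate([x1, h]) @ W
    log_s, t = params[: D // 2], params[D // 2 :]
    z = np.concatenate([x1, x2 * np.exp(log_s) + t])
    log_det = log_s.sum()                       # triangular Jacobian, cheap determinant
    return z, log_det

def coupling_inverse(z, h):
    # x1 passes through unchanged, so the same (log_s, t) can be recomputed exactly.
    z1, z2 = z[: D // 2], z[D // 2 :]
    params = np.concatenate([z1, h]) @ W
    log_s, t = params[: D // 2], params[D // 2 :]
    return np.concatenate([z1, (z2 - t) * np.exp(-log_s)])

x = rng.normal(size=D)
h = rng.normal(size=H)          # stand-in for an RNN/Transformer hidden state
z, log_det = coupling_forward(x, h)
x_rec = coupling_inverse(z, h)  # inversion recovers x exactly
```

Because the transform is invertible for any `h`, the same conditioning state supports both exact likelihood training and sampling of forecast trajectories.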

Figure 1: Estimated (cross-)covariance matrices. Darker means higher positive values. Left: Covariance matrix for a fixed time point, capturing the correlation between S1 and S2. Right: Cross-covariance matrix between consecutive time points, capturing the true flow of liquid in the pipe system.
Experiments and Results
Experiments were conducted on datasets spanning several domains (Exchange, Solar, Electricity, Traffic, Taxi, and Wikipedia), demonstrating the robustness of the proposed models. Evaluation used the Continuous Ranked Probability Score (CRPS), which measures how compatible a forecast distribution is with the observed outcome.
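CRPS can be estimated directly from forecast samples; the following is a minimal sketch (function name illustrative) using the energy-form identity CRPS(F, y) = E|X - y| - 0.5 * E|X - X'|, where X and X' are independent draws from the forecast distribution F:

```python
import numpy as np

def crps_from_samples(samples, y):
    """Empirical CRPS estimate from forecast samples via
    CRPS(F, y) = E|X - y| - 0.5 * E|X - X'|, with X, X' ~ F."""
    s = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(s - y))                          # accuracy term
    term2 = 0.5 * np.mean(np.abs(s[:, None] - s[None, :]))  # sharpness term
    return term1 - term2

# A degenerate forecast exactly on the observation scores 0;
# one concentrated a unit away scores its absolute error.
print(crps_from_samples([2.0] * 50, 2.0))   # 0.0
print(crps_from_samples([3.0] * 50, 2.0))   # 1.0
```

Lower is better: CRPS reduces to absolute error for point forecasts and rewards distributions that are both sharp and well calibrated.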
The results show that the proposed models, in particular MAF with RNN or Transformer conditioning, achieve state-of-the-art performance across the datasets tested. These models outperform competitive baselines, highlighting the efficacy of the normalizing-flow approach in capturing the intricate dependency structures inherent in multivariate time series data.
Figure 2: Visual analysis of the dependency structure extrapolation of the model. Left: Cross-covariance matrix computed from the test split of Traffic benchmark. Middle: Cross-covariance matrix computed from the mean of 100 sample trajectories drawn from the Transformer-MAF model's extrapolation into the future (test split). Right: The absolute difference of the two matrices mostly shows small deviations between ground-truth and extrapolation.
Implications and Future Directions
The introduction of conditioned normalizing flows to time series forecasting significantly advances the state-of-the-art in modeling complex multivariate dependencies. It provides practitioners with powerful tools for better uncertainty estimation, a crucial factor for decision-making in fields such as finance, energy, and traffic management.
Future work could explore the incorporation of enhanced flow architectures, such as Flow++, to further improve the densities captured by these models. Additionally, bridging the gap between continuous and discrete data distributions via flows could broaden the applicability of this approach. The adaptability of the model to high-dimensional settings suggests potential extensions to other areas like image and video processing where temporal dependencies are prominent.
Conclusion
The paper presents a technique that marries the predictive power of autoregressive neural networks with the distributional flexibility of normalizing flows, improving forecasting accuracy and scalability in multivariate time series analysis. This advancement both strengthens current methodology and paves the way for further innovations in probabilistic modeling.