Autoregressive Denoising Diffusion Models for Multivariate Probabilistic Time Series Forecasting (2101.12072v2)

Published 28 Jan 2021 in cs.LG and cs.AI

Abstract: In this work, we propose \texttt{TimeGrad}, an autoregressive model for multivariate probabilistic time series forecasting which samples from the data distribution at each time step by estimating its gradient. To this end, we use diffusion probabilistic models, a class of latent variable models closely connected to score matching and energy-based methods. Our model learns gradients by optimizing a variational bound on the data likelihood and at inference time converts white noise into a sample of the distribution of interest through a Markov chain using Langevin sampling. We demonstrate experimentally that the proposed autoregressive denoising diffusion model is the new state-of-the-art multivariate probabilistic forecasting method on real-world data sets with thousands of correlated dimensions. We hope that this method is a useful tool for practitioners and lays the foundation for future research in this area.

Citations (248)

Summary

  • The paper presents TimeGrad, an autoregressive denoising diffusion model that samples each multivariate time step through a chain of latent variable transformations, enabling accurate probabilistic forecasts.
  • Training optimizes a variational bound on the data likelihood, pairing a fixed forward Markov chain that adds Gaussian noise with a learned reverse process that removes it.
  • Experiments show TimeGrad outperforming classical baselines and neural approaches across six real-world datasets, most notably on the CRPS metric.

Overview of "Autoregressive Denoising Diffusion Models for Multivariate Probabilistic Time Series Forecasting"

The paper introduces TimeGrad, an autoregressive denoising diffusion model designed for multivariate probabilistic time series forecasting. Unlike traditional univariate models, TimeGrad captures dependencies across series by leveraging diffusion probabilistic models, a class of latent variable models closely related to energy-based methods and score matching.

Core Contributions

  1. Autoregressive Denoising Diffusion: The model uses diffusion probabilistic models to sample from the data distribution at each time step, estimating its gradient through a sequence of latent variable transformations conditioned on the past.
  2. Training via Variational Bound: TimeGrad is trained by optimizing a variational bound on the data likelihood. A fixed forward Markov chain incrementally adds Gaussian noise, counteracted by a learned reverse process that denoises the data at inference time (a minimal training sketch follows this list).
  3. Promising Experimental Results: The reported experiments indicate that TimeGrad establishes a new state of the art for multivariate probabilistic forecasting on multiple real-world datasets.
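
To make the objective concrete, here is a minimal PyTorch sketch of one training step in the spirit of the simplified variational bound described above. The `epsilon_theta` interface, the `TinyEps` stand-in network, and all shapes and schedule values are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def timegrad_training_step(epsilon_theta, x0, h, betas):
    """One denoising training step on a single time point.

    x0:    (B, D) clean multivariate observation at time t
    h:     (B, H) RNN hidden state summarizing the past
    betas: (N,)   variance schedule beta_1..beta_N
    """
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)            # \bar{alpha}_n

    B = x0.shape[0]
    n = torch.randint(0, len(betas), (B,))              # random diffusion step
    a_bar = alpha_bar[n].unsqueeze(-1)                  # (B, 1)

    noise = torch.randn_like(x0)
    # Forward process in closed form: x_n = sqrt(a_bar) x0 + sqrt(1 - a_bar) eps
    x_n = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

    # Simplified variational-bound objective: predict the injected noise.
    pred = epsilon_theta(x_n, n, h)
    return F.mse_loss(pred, noise)

# A tiny stand-in for the real conditional noise-prediction network (hypothetical).
class TinyEps(torch.nn.Module):
    def __init__(self, dim, hidden):
        super().__init__()
        self.net = torch.nn.Linear(dim + 1 + hidden, dim)

    def forward(self, x, n, h):
        n_feat = n.float().unsqueeze(-1) / 100.0        # crude step embedding
        return self.net(torch.cat([x, n_feat, h], dim=-1))

betas = torch.linspace(1e-4, 0.1, 100)                  # assumed linear schedule
model = TinyEps(dim=8, hidden=16)
loss = timegrad_training_step(model, torch.randn(4, 8), torch.randn(4, 16), betas)
```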

Results and Comparisons

The paper reports extensive comparisons across six datasets: Exchange, Solar, Electricity, Traffic, Taxi, and Wikipedia. TimeGrad consistently outperforms alternatives such as VAR, GARCH, and various neural approaches on several metrics, most notably the Continuous Ranked Probability Score (CRPS; a sample-based estimator is sketched below). This demonstrates the model's robustness and versatility on real-world, high-dimensional time series data.
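
For reference, CRPS can be estimated directly from forecast samples via the identity CRPS(F, y) = E|X - y| - 0.5 E|X - X'|. The sketch below is a generic empirical estimator, not the paper's evaluation code; the paper additionally reports an aggregate across dimensions (CRPS_sum).

```python
import numpy as np

def crps_from_samples(samples: np.ndarray, y: float) -> float:
    """Sample-based CRPS estimate for a scalar observation y.

    samples: (m,) draws from the forecast distribution.
    """
    term1 = np.abs(samples - y).mean()
    # Pairwise mean absolute difference between forecast samples.
    term2 = np.abs(samples[:, None] - samples[None, :]).mean()
    return term1 - 0.5 * term2

# Example: score 100 forecast samples against an observed value of 0.7.
rng = np.random.default_rng(0)
print(crps_from_samples(rng.normal(0.5, 1.0, size=100), 0.7))
```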

Technical Approach

  • Model Architecture: TimeGrad places an RNN (LSTM or GRU) at its core to model temporal dynamics. At each step, the RNN updates its hidden state from the previous observations, and this state conditions the diffusion model for the next time step.
  • Diffusion Process: The probability distribution of each observation is learned via a parameterized reverse process that iteratively undoes the Gaussian noise added by the forward process. The noise follows a linear variance schedule, which keeps the chain tractable and effective (see the sampling sketch after this list).
  • Scalability and Efficiency: Despite modeling high-dimensional multivariate relationships, TimeGrad remains computationally feasible. Training minimizes losses derived from KL divergences between Gaussians, while inference generates samples by progressively refining noise-initialized inputs, in a manner reminiscent of Langevin dynamics.
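
As an illustration of the reverse process, the sketch below performs DDPM-style ancestral sampling for a single forecast step, reusing the hypothetical `epsilon_theta` interface from the training sketch above; it is an assumed reconstruction, not the paper's code.

```python
import torch

@torch.no_grad()
def sample_next_value(epsilon_theta, h, betas, dim):
    """Ancestral sampling of one forecast step, starting from white noise.

    Each iteration removes a bit of the scheduled Gaussian noise using the
    learned noise predictor, mirroring the Langevin-like reverse chain.
    """
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    x = torch.randn(1, dim)                             # start from N(0, I)
    for n in reversed(range(len(betas))):
        n_idx = torch.tensor([n])
        eps = epsilon_theta(x, n_idx, h)
        coef = betas[n] / (1.0 - alpha_bar[n]).sqrt()
        x = (x - coef * eps) / alphas[n].sqrt()         # posterior mean
        if n > 0:
            x = x + betas[n].sqrt() * torch.randn_like(x)  # re-inject noise
    return x
```

In an autoregressive rollout, the sampled value would be fed back into the RNN to update the hidden state `h` before sampling the following step; a schedule such as `torch.linspace(1e-4, 0.1, 100)` is one plausible instance of the linear variance schedule described above.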

Implications and Future Directions

This research has significant implications for probabilistic forecasting in settings with many correlated series. TimeGrad's ability to represent complex cross-series relationships makes it well suited for tasks that depend on accurate multivariate predictions, such as supply chain management and financial forecasting.

Future work could improve sampling efficiency or investigate extensions that incorporate additional domain-specific inductive biases. Furthermore, hybrid architectures, such as combining Transformers with diffusion models, might enhance the model's capacity to handle long sequences effectively.

In summary, TimeGrad demonstrates the potential of diffusion probabilistic models in pushing the boundaries of autoregressive time series forecasting, marking an important step forward in the quest for more accurate and reliable predictive models in dynamic multivariate contexts.