CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation (2107.03502v2)

Published 7 Jul 2021 in cs.LG and stat.ML

Abstract: The imputation of missing values in time series has many applications in healthcare and finance. While autoregressive models are natural candidates for time series imputation, score-based diffusion models have recently outperformed existing counterparts including autoregressive models in many tasks such as image generation and audio synthesis, and would be promising for time series imputation. In this paper, we propose Conditional Score-based Diffusion models for Imputation (CSDI), a novel time series imputation method that utilizes score-based diffusion models conditioned on observed data. Unlike existing score-based approaches, the conditional diffusion model is explicitly trained for imputation and can exploit correlations between observed values. On healthcare and environmental data, CSDI improves by 40-65% over existing probabilistic imputation methods on popular performance metrics. In addition, deterministic imputation by CSDI reduces the error by 5-20% compared to the state-of-the-art deterministic imputation methods. Furthermore, CSDI can also be applied to time series interpolation and probabilistic forecasting, and is competitive with existing baselines. The code is available at https://github.com/ermongroup/CSDI.

Citations (409)

Summary

  • The paper introduces a conditional diffusion model for imputing missing time series data with significant improvements in CRPS and MAE.
  • It employs a self-supervised training approach that conditions on available observations to directly model the distribution of missing values.
  • The method demonstrates robust performance on healthcare and environmental datasets, paving the way for efficient and versatile imputation techniques.

Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation

The paper "Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation" introduces a novel approach to handle missing data imputation in multivariate time series using conditional score-based diffusion models. This method builds on the recent advancements in diffusion models known for their success in high-quality sample generation tasks such as image and audio synthesis. The paper aims to extend these models to the domain of time series imputation, particularly focusing on improving over traditional autoregressive models and existing score-based methods.

Methodology

The cornerstone of the paper's contribution is the development of CSDI, which leverages score-based diffusion models explicitly conditioned on available observations. Unlike existing score-based approaches that rely on approximations or noise-added observations, CSDI directly models the conditional distribution of the missing values. The conditional diffusion model is trained with a novel self-supervised approach: the observed data are split into two parts, one serving as conditional information and the other as imputation targets, allowing the model to learn even when ground-truth missing values are unavailable.
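The mask-splitting step above can be sketched as follows. This is an illustrative reconstruction, not the paper's actual code: the function name, the 50% target ratio, and the uniform random selection are assumptions (the paper also describes other target-selection strategies).

```python
import numpy as np

def split_observed(obs_mask, target_ratio=0.5, rng=None):
    """Split an observation mask into (conditional_mask, target_mask).

    A random fraction of the *observed* entries is hidden and treated as
    the imputation target; the remaining observed entries serve as the
    conditioning information during self-supervised training.
    """
    rng = rng or np.random.default_rng(0)
    observed_idx = np.flatnonzero(obs_mask.ravel())
    n_target = int(len(observed_idx) * target_ratio)
    target_idx = rng.choice(observed_idx, size=n_target, replace=False)
    target_mask = np.zeros(obs_mask.size, dtype=bool)
    target_mask[target_idx] = True
    target_mask = target_mask.reshape(obs_mask.shape)
    cond_mask = obs_mask & ~target_mask
    return cond_mask, target_mask
```

By construction the two masks are disjoint and together cover exactly the observed entries, so truly missing values never leak into training.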

Training a diffusion model involves learning to denoise samples across a sequence of noise scales; CSDI adapts this by employing an adjusted version of the loss function used in denoising diffusion probabilistic models (DDPM). The result is a mechanism that converts stochastic noise into plausible time series scenarios conditioned on partial observations, with significantly reduced error compared to state-of-the-art methods.
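A minimal numpy sketch of one such training step, under stated assumptions: `denoiser` stands in for CSDI's learned noise-prediction network, the noise schedule is a standard linear beta schedule, and all names are illustrative. The key adaptation is that noise is added, and the squared-error loss is evaluated, only on the target entries, while the conditioning values stay clean.

```python
import numpy as np

def ddpm_imputation_loss(x, cond_mask, target_mask, denoiser, t,
                         alpha_bar, rng=None):
    """One conditional DDPM-style training loss evaluation (sketch)."""
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal(x.shape)
    # Forward-diffuse to step t, then keep only the target entries noisy.
    x_t = np.sqrt(alpha_bar[t]) * x + np.sqrt(1 - alpha_bar[t]) * eps
    noisy_target = np.where(target_mask, x_t, 0.0)
    cond_obs = np.where(cond_mask, x, 0.0)
    # The network predicts the injected noise given the conditioning.
    eps_hat = denoiser(noisy_target, cond_obs, t)
    # Masked MSE: the loss counts only the imputation targets.
    return np.sum(target_mask * (eps_hat - eps) ** 2) / target_mask.sum()
```

At sampling time the same network is applied in reverse, iteratively denoising random noise on the target entries while the conditioning observations are held fixed.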

Results

Empirical evaluations presented in the paper demonstrate the efficacy of CSDI on both healthcare and environmental datasets. The performance improvements are notable: a 40-65% improvement in the continuous ranked probability score (CRPS) over existing probabilistic approaches. Deterministic imputation results also show a 5-20% reduction in mean absolute error (MAE), underscoring the model's robust handling of deterministic tasks alongside probabilistic forecasting and interpolation.
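For reference, CRPS can be estimated directly from generated samples, which is how probabilistic imputers are typically scored. This generic sample-based estimator is a sketch of the metric itself, not the paper's evaluation code: for samples x_1..x_m from the predictive distribution and a true value y, CRPS ≈ mean|x_i − y| − 0.5 · mean|x_i − x_j|.

```python
import numpy as np

def crps_samples(samples, y):
    """Sample-based CRPS estimate for a single scalar target y."""
    term1 = np.mean(np.abs(samples - y))                      # accuracy
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples))  # sharpness
    return term1 - term2
```

A perfectly concentrated forecast at the true value scores 0, and for a degenerate (point) forecast the CRPS reduces to the absolute error, so it generalizes MAE to distributions.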

The evaluation scenarios covered various missing-data ratios and multiple missing patterns, showcasing CSDI's superiority in generating realistic imputations with quantified uncertainties. Additional tasks such as time series interpolation and forecasting further demonstrated the flexibility and applicability of the proposed model across related problems.

Implications and Future Directions

The paper posits that CSDI not only advances machine learning approaches to missing-data challenges but also potentially shifts the paradigm toward wider application of diffusion models in structured data tasks. The approach holds promise for modalities beyond time series, thanks to its ability to model dependencies conditioned on partial observations effectively.

Looking forward, the research opens avenues for improving computational efficiency, given diffusion models' traditionally slow sampling. Integrating methods such as ODE solvers for accelerated sampling may further extend CSDI's utility. Moreover, incorporating this imputation approach into larger, end-to-end systems could streamline learning for classification and predictive tasks where missing data is a hindrance. Lastly, extending these methodologies to other structured data forms, further leveraging conditional modeling, represents a compelling research trajectory.

In conclusion, the paper presents a methodologically sound and impactful contribution to the domain of probabilistic time series imputation, demonstrating substantial improvements over traditional models and opening doors for future research directions in both efficiency and applicability across diverse data formats.
