
Conditional Score-based Diffusion Imputation

Updated 29 July 2025
  • CSDI is a probabilistic generative framework that leverages conditional diffusion to accurately impute missing values in high-dimensional time series.
  • It utilizes a conditioned reverse process of denoising diffusion models with two-dimensional attention to capture complex temporal and inter-variable dependencies, improving CRPS by 40–65% and reducing MAE by 5–20%.
  • Empirical evaluations on healthcare and environmental datasets demonstrate its robustness and its extensibility to interpolation and probabilistic forecasting tasks.

Conditional Score-based Diffusion Imputation (CSDI) is a probabilistic generative framework for imputing missing values in high-dimensional, multivariate time series via conditional score-based diffusion models. In contrast to traditional autoregressive imputation techniques, which generate missing values sequentially and can be limited by their dependence structure, CSDI adapts denoising diffusion probabilistic models (DDPMs) to produce imputations by reversing a stochastic noising process while conditioning on observed data. This approach enables direct modeling of the conditional distribution over missing values, effectively capturing complex temporal and cross-feature dependencies. In empirical studies on healthcare and environmental datasets, CSDI achieves 40–65% improvements in CRPS compared to baselines and offers 5–20% reductions in MAE for deterministic imputation. The methodology proves extensible beyond imputation to time series interpolation and probabilistic forecasting tasks (Tashiro et al., 2021).

1. Motivation and Paradigm Shift

CSDI is motivated by the recent success of score-based diffusion models in high-dimensional generative modeling (e.g., images, audio), where a forward process iteratively perturbs data toward an isotropic Gaussian distribution by injecting noise, and a learned reverse process reconstructs realistic samples by denoising. Classical autoregressive time series imputation models—RNNs, GPs, BRITS, etc.—are limited in expressivity and often suffer from error accumulation and difficulty modeling long-range interactions. CSDI introduces a paradigm shift by casting imputation as conditional distribution learning: it samples plausible missing values jointly, rather than sequentially, by leveraging the reverse process of diffusion explicitly conditioned on observed data. This formulation exploits both temporal and cross-feature dependence structure in the inherently high-dimensional, structured time series domain.

2. Mathematical Framework and Architecture

CSDI extends the standard DDPM by explicitly conditioning the reverse diffusion process on observed entries. Consider a time series $x_0 \in \mathbb{R}^{d \times L}$ (with $d$ features and $L$ time steps), partitioned into observed entries $x_0^{o}$ and unobserved entries $x_0^{u}$ according to a conditional mask. With $T$ diffusion steps, the reverse process models the conditional distribution

$$p_\theta(x_{0:T}^{u} \mid x_0^{o}) = p(x_T^{u}) \prod_{t=1}^{T} p_\theta(x_{t-1}^{u} \mid x_t^{u},\, x_0^{o}),$$

where each reverse transition is Gaussian:

$$p_\theta(x_{t-1}^{u} \mid x_t^{u}, x_0^{o}) = \mathcal{N}\!\big(x_{t-1}^{u};\; \mu_\theta(x_t^{u}, t \mid x_0^{o}),\; \sigma_\theta^2(x_t^{u}, t \mid x_0^{o})\, I\big).$$

Here the mean $\mu_\theta$ is parameterized as in DDPMs via a denoising function $\epsilon_\theta$,

$$\mu_\theta(x_t^{u}, t \mid x_0^{o}) = \frac{1}{\sqrt{\alpha_t}} \left( x_t^{u} - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t^{u}, t \mid x_0^{o}) \right),$$

where $\{\beta_t\}$ is the noise schedule, $\alpha_t = 1 - \beta_t$, and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$. Crucially, $\epsilon_\theta$ is a conditional function taking as input both the partially noised "target" values and the observed "condition" entries, together with an indicator mask. The loss for self-supervised training is

$$\min_\theta \; \mathbb{E}_{x_0, \epsilon, t} \left[ \big\| \epsilon - \epsilon_\theta(x_t^{u}, t \mid x_0^{o}) \big\|_2^2 \right],$$

with $x_t^{u} = \sqrt{\bar{\alpha}_t}\, x_0^{u} + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$ and $\epsilon \sim \mathcal{N}(0, I)$, mirroring denoising score matching under explicit conditioning.
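This objective translates directly into code. The following is a minimal PyTorch sketch, assuming a conditional denoiser `eps_model(x_noisy, t, x_cond, cond_mask)` (a hypothetical signature; the actual network is the two-dimensional attention model described below) and an illustrative linear noise schedule:

```python
import torch

T = 50                                  # diffusion steps (illustrative)
betas = torch.linspace(1e-4, 0.5, T)    # assumed linear noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def training_loss(eps_model, x0, obs_mask):
    """x0: (B, d, L) series; obs_mask: 1 where observed, 0 where missing."""
    # Self-supervised split: hide a random half of the observed entries.
    target_mask = obs_mask * (torch.rand_like(x0) < 0.5).float()
    cond_mask = obs_mask - target_mask

    t = torch.randint(0, T, (x0.shape[0],))          # one step per sample
    a = alpha_bar[t].view(-1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * eps     # forward noising

    pred = eps_model(x_t * target_mask, t, x0 * cond_mask, cond_mask)
    # MSE between true and predicted noise, on imputation targets only.
    return (((eps - pred) ** 2) * target_mask).sum() / target_mask.sum()
```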

The core model architecture exploits two-dimensional attention. Independent temporal and feature-wise (cross-variable) Transformer blocks are stacked with residual connections, enabling the network to simultaneously model variable-specific time dynamics and dependencies across variables. This decomposed self-attention mechanism provides scalability and enhanced expressivity for high-dimensional multivariate time series.
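As an illustration of the decomposed attention, here is a minimal PyTorch sketch of one such block; the `(B, C, d, L)` tensor layout, head count, and layer sizes are assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class TwoDAttentionBlock(nn.Module):
    """Decomposed attention: one self-attention pass along the time axis,
    one along the feature axis, each with a residual connection."""
    def __init__(self, channels=64, n_heads=8):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(channels, n_heads, batch_first=True)
        self.feat_attn = nn.MultiheadAttention(channels, n_heads, batch_first=True)

    def forward(self, h):
        # h: (B, C, d, L) -- batch, channels, d features, L time steps
        B, C, d, L = h.shape
        # Temporal attention: attend over the L time steps of each feature.
        x = h.permute(0, 2, 3, 1).reshape(B * d, L, C)
        x = x + self.time_attn(x, x, x)[0]
        # Feature attention: attend over the d variables at each time step.
        x = x.reshape(B, d, L, C).permute(0, 2, 1, 3).reshape(B * L, d, C)
        x = x + self.feat_attn(x, x, x)[0]
        return x.reshape(B, L, d, C).permute(0, 3, 2, 1)  # back to (B, C, d, L)
```

Splitting attention over the two axes keeps the cost at roughly O(L^2 + d^2) per block rather than O((dL)^2) for full attention over the flattened series.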

3. Conditioning, Masking, and Self-supervision

A hallmark of CSDI's training strategy is its self-supervision regime: during each epoch, observed data are randomly partitioned into imputation targets and conditions using a selection strategy—random, historical, mix, or test-pattern. This split ensures that the model learns to exploit both temporal and cross-feature structure and remains robust to the diverse missingness patterns seen in real-world applications. The conditional denoising function $\epsilon_\theta$ accepts zero-padded arrays and corresponding masks, analogous to masked language modeling schemes, granting the architecture flexibility to accommodate arbitrary observation patterns.
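A minimal sketch of this split, assuming `obs_mask` marks the actually observed entries; the structured strategies are stubbed because they depend on dataset-specific patterns:

```python
import torch

def split_observed(obs_mask, strategy="random"):
    """Split observed entries into conditioning entries and imputation
    targets for self-supervised training. Names are illustrative."""
    if strategy == "random":
        ratio = torch.rand(1).item()     # draw a fresh missing ratio
        target = obs_mask * (torch.rand_like(obs_mask) < ratio).float()
    elif strategy == "historical":
        # Reuse a missingness pattern observed elsewhere in the dataset;
        # stubbed here because it is dataset-specific.
        raise NotImplementedError
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    cond = obs_mask - target             # remaining observed entries condition
    return cond, target
```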

4. Empirical Evaluation and Quantitative Benefits

The empirical performance of CSDI is evaluated extensively on healthcare and environmental datasets:

Datasets and Metrics

  • PhysioNet 2012 Healthcare Dataset: 4,000 multivariate ICU time series, 35 variables, 48-hour horizons, ~80% missing.
  • Beijing PM2.5 Air Quality: Hourly data over 36 stations, 36 time steps, ~13% missing, structured non-random patterns.

Performance is assessed using probabilistic metrics such as the Continuous Ranked Probability Score (CRPS), deterministic metrics (MAE, RMSE), and aggregated measures such as CRPS-sum for probabilistic forecasting.
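For concreteness, a sample-based CRPS estimate can be computed via the energy form $\mathrm{CRPS}(F, y) = \mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$; this is a generic estimator, not necessarily the quantile-based discretization used in the paper's evaluation:

```python
import numpy as np

def crps_from_samples(samples, y):
    """Empirical CRPS for a scalar target y from generated samples,
    using CRPS = E|X - y| - 0.5 * E|X - X'| (energy form)."""
    samples = np.asarray(samples, dtype=float)           # shape (n,)
    term1 = np.abs(samples - y).mean()
    term2 = np.abs(samples[:, None] - samples[None, :]).mean()
    return term1 - 0.5 * term2

# Deterministic imputation uses the median of the generated samples:
# point_estimate = np.median(all_samples, axis=0)
```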

Results

  • Probabilistic Imputation: CSDI yields CRPS reductions of 40–65% over Multitask GP, GP-VAE, and V-RIN.
  • Deterministic Imputation: Using the median of generated samples, MAE decreases by 5–20% compared to BRITS, GLIMA, and others.
  • Robustness: The framework maintains strong performance even at high missingness ratios (e.g., with 90% of observed values masked).

CSDI is also evaluated on interpolation (completing values at irregular time points) and probabilistic forecasting (predicting future sequences), outperforming Latent ODEs and mTANs in interpolation and remaining competitive with TimeGrad, Transformer MAF, and TLAE in forecasting, even though these settings play to the strengths of the autoregressive or sequence-to-sequence designs of typical baselines.

5. Applications and Generalizations

CSDI's conditional formulation generalizes naturally beyond direct imputation:

  • Time Series Interpolation: Filling in values at arbitrary, possibly irregular, time points.
  • Probabilistic Forecasting: Predicting distributions over future series given arbitrary observed prefixes.
  • Other Modalities: The same structure can, in principle, address imputation in images or other structured data by conditioning on observed sets and adapting the attention mechanisms.

The codebase aids reproducibility and extension to new domains. The architecture invites adaptation to downstream supervised tasks (e.g., joint imputation–classification pipelines) and integration of accelerated sampling techniques such as DDIM for faster inference.
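To make inference concrete, here is a minimal DDPM-style conditional sampling loop; `eps_model` and its signature are the same assumptions as in the training sketch above, and the fixed variance $\sigma_t = \sqrt{\beta_t}$ is one common simple choice. DDIM-style samplers would replace this loop with fewer, deterministic steps.

```python
import torch

@torch.no_grad()
def impute(eps_model, x_obs, cond_mask, T, betas):
    """Start the targets from Gaussian noise and denoise step by step,
    always conditioning on the observed entries."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    target_mask = 1.0 - cond_mask                 # entries to impute
    x = torch.randn_like(x_obs) * target_mask     # x_T^u ~ N(0, I)

    for t in reversed(range(T)):
        eps = eps_model(x, torch.tensor([t]), x_obs * cond_mask, cond_mask)
        # DDPM posterior mean, parameterized by the predicted noise.
        mean = (x - betas[t] / (1 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            sigma = betas[t].sqrt()               # simple fixed-variance choice
            x = mean + sigma * torch.randn_like(x)
        else:
            x = mean
        x = x * target_mask                       # keep only target entries
    return x_obs * cond_mask + x * target_mask    # merged imputation
```

Running this loop multiple times yields the sample set from which probabilistic metrics and the median-based point estimate are derived.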

6. Limitations and Future Directions

While CSDI demonstrates substantial gains over autoregressive and deep probabilistic baselines, the diffusion sampling process remains computationally intensive due to iterative reverse process steps. Potential avenues to mitigate this include integrating recent ODE solvers for conditional diffusion models, lowering the number of reverse steps required, or developing amortized diffusion samplers. Further, aligning the imputation framework with supervised downstream tasks, especially in clinical and environmental domains, may elevate real-world impact and enable joint optimization for accuracy and uncertainty quantification. The core mechanism is currently focused on continuous-valued series, yet extensions to mixed or categorical domains can be considered via discretized score-based modeling.

7. Comparative Perspective and Significance

Conditional Score-based Diffusion Imputation signifies a conceptual advance by restructuring imputation as conditional generative modeling. This move allows multivariate time series imputation to fully exploit the rich dependency structure among features and across time, overcoming limitations of sequential and marginal approaches. The substantial empirical improvements—40–65% in probabilistic scores, 5–20% gains in pointwise errors—establish the relevance of this approach for high-stakes applications where robust and accurate imputation is critical (Tashiro et al., 2021). The framework remains extensible and is already the basis for multiple subsequent works in multi-modal imputation, state-space modeling, and domains beyond time series.

References

Tashiro, Y., Song, J., Song, Y., & Ermon, S. (2021). CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation. Advances in Neural Information Processing Systems (NeurIPS 2021).