Zero-Shot Missing-Data Imputation

Updated 14 February 2026
  • Zero-shot imputation is the recovery of missing structured data using a pre-trained, frozen model without any test-time adaptation.
  • It leverages neural architectures like conditional flows, transformers, and ODE-based systems to handle various missingness patterns (MCAR, MAR, MNAR).
  • Empirical studies demonstrate high accuracy and computational efficiency across tabular, time series, and sensor data, although scalability and long-gap imputation remain open challenges.

Zero-shot missing-data imputation refers to the inference-time estimation of missing elements in structured data (tabular, time series, or spatio-temporal) by a model that has undergone no further adaptation, fine-tuning, or supervised learning on the test-time data or its missingness pattern. The zero-shot paradigm presumes pre-training on relevant distributions, synthetic surrogates, or source domains, after which the model can be deployed to arbitrary target settings, including new missingness masks, data domains, or even modalities, without reconfiguration. Core to the field are methods that combine flexible neural architectures, generative modeling, and scalable conditioning mechanisms. The principal challenge is retaining accuracy, uncertainty calibration, and computational efficiency across heterogeneous data types and missingness mechanisms, whether data are missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR), especially when ground-truth observations are unavailable or limited.

1. Definition and Scope of Zero-Shot Imputation

Zero-shot imputation is defined as the recovery of missing entries within a data structure using a parameterized function $G_\theta$ that is frozen after pre-training and requires no per-task or per-missingness adaptation. In this context, missingness may occur as arbitrary masks in tabular data, intermittent frames in temporal sequences, or structured gaps in sensor arrays. The crucial requirement is that the map from observed to imputed values is transferable, i.e., a shared imputation model $p_\theta(x_{\text{miss}} \mid x_{\text{obs}})$ supports all missingness patterns without explicit re-estimation for new patterns.

Key criteria:

  • The imputer is exposed to a wide range of missingness patterns during training (including MCAR, MAR, MNAR, or synthetic analogs).
  • At inference, the model completes new samples “as is,” making no use of downstream labels, hyperparameter tuning, or retraining.
  • Applicability spans tabular settings (Feitelberg et al., 3 Oct 2025), multivariate time series (Simkus et al., 10 Jun 2025), and high-dimensional spatio-temporal signals (Khamis et al., 9 Feb 2025).

This is distinguished from multi-task or transductive imputation, which often requires at least one round of adaptation on the test mask or target domain.
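
A minimal sketch of this protocol follows, assuming a generic pre-trained imputer; the callable `imputer` below is a hypothetical stand-in for any frozen model such as CFMI, TabImpute, or NeuralPrefix:

```python
import numpy as np

def zero_shot_impute(imputer, x_obs: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Fill missing entries (mask == 0) with model predictions, no adaptation.

    The imputer's parameters stay frozen: no gradient steps, no
    hyperparameter tuning, and no per-mask re-estimation.
    """
    x_hat = imputer(x_obs, mask)                       # one conditional inference pass
    return np.where(mask.astype(bool), x_obs, x_hat)   # observed entries are kept verbatim
```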

2. Principal Methodologies

Three principal approaches to zero-shot missing-data imputation have emerged:

2.1 Conditional Flow-Based Imputation

Conditional Flow-Matching for Imputation (CFMI) utilizes a continuous normalizing flow architecture parameterized by a neural ODE, $f_\theta : \mathbb{R}^D \times \{0,1\}^D \times [0,1] \to \mathbb{R}^D$. The model integrates an initial value $x(0)$ forward in time to transport a simple base distribution (e.g., an isotropic Gaussian) into the target conditional data distribution, taking into account both the observed values and the explicit missingness mask. The shared conditional distribution is learned by “flow matching” between the neural vector field and an analytically tractable target field, enabling efficient training without explicit density estimation or iterative mask-specific retraining. At test time, missing values are imputed by numerically integrating the vector field restricted to unobserved coordinates, starting from a random initialization. Performance in both tabular and time-series settings has been shown to surpass diffusion-based and classical methods on a variety of metrics (Simkus et al., 10 Jun 2025).
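
The sketch below illustrates this test-time integration under simple assumptions (fixed-step Euler, standard Gaussian initialization on the missing coordinates); `f_theta` is a hypothetical stand-in for the trained vector-field network, and the step count follows the roughly 100-step regime reported for CFMI:

```python
import numpy as np

def cfmi_impute(f_theta, x_obs, mask, K=100, rng=None):
    """Impute by integrating a conditional vector field over missing coordinates."""
    rng = rng or np.random.default_rng()
    missing = ~mask.astype(bool)
    # Start from Gaussian noise on the missing dims, observed values elsewhere.
    x = np.where(missing, rng.standard_normal(x_obs.shape), x_obs)
    dt = 1.0 / K
    for k in range(K):
        v = f_theta(x, mask, k * dt)             # conditional vector field at time t
        x = x + dt * np.where(missing, v, 0.0)   # Euler step on unobserved coords only
        x = np.where(missing, x, x_obs)          # clamp observed coordinates
    return x
```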

2.2 Pre-Trained Transformer-Based Imputation

TabImpute leverages an entry-wise featurization scheme and a pre-trained transformer backbone to perform zero-shot cell-wise imputation in tabular data. Each cell $(i,j)$ is represented as a token embedding $\mathbf{z}_{ij}$ by concatenating row/column indices and their associated observed vectors. The transformer model is trained on synthetically generated data and a comprehensive suite of missingness patterns, including realistic MNAR scenarios. At inference, no fine-tuning or fitting is required. Output distributions for missing cells are computed in a single forward pass; adaptive ensembling with TabPFN further enhances robustness. Benchmarks across medicine, finance, and engineering domains support its efficacy and computational superiority (Feitelberg et al., 3 Oct 2025).
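
A rough illustration of entry-wise featurization is given below; the exact feature layout (normalized indices concatenated with the observed row and column) is an assumption for exposition, and the paper's tokenization may differ in detail:

```python
import numpy as np

def featurize_cells(X: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Build one token per cell (i, j) from its row/column context."""
    n, d = X.shape
    X_filled = np.where(mask.astype(bool), X, 0.0)  # zero placeholder for missing entries
    tokens = []
    for i in range(n):
        for j in range(d):
            tokens.append(np.concatenate((
                [i / n, j / d],      # normalized positional indices
                X_filled[i, :],      # observed values in row i
                X_filled[:, j],      # observed values in column j
            )))
    return np.stack(tokens)          # shape (n * d, 2 + d + n)
```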

2.3 Continuous-Time Latent ODE Imputation for Sensor Streams

NeuralPrefix introduces a continuous dynamical system-based prefix module for spatio-temporal sensory imputation. The internal latent state, parameterized by a ConvGRU and an ODE dynamics network $g_\theta$, evolves by integrating through intervals with missing observations, conditioned on the encoded history of available frames. Missing sensor readings are reconstructed continuously by solving the latent ODE, after which a modular flow-plus-residual decoder reconstructs the data frame. The method excels at handling both interpolation and extrapolation, and supports zero-shot adaptation to new domains, sensors, or even modalities (Khamis et al., 9 Feb 2025).
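
A schematic of this continuous-time evolution across an unobserved interval, assuming hypothetical stand-ins `g_theta` (latent dynamics network) and `decode` (flow-plus-residual decoder) and a fixed-step Euler solver in place of an adaptive one:

```python
def impute_gap(g_theta, decode, h, t_start, t_end, n_steps=20):
    """Evolve latent state h through a gap and decode one frame per step."""
    dt = (t_end - t_start) / n_steps
    frames = []
    for _ in range(n_steps):
        h = h + dt * g_theta(h)    # Euler step of the latent ODE
        frames.append(decode(h))   # reconstruct the data frame at this time
    return frames
```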

3. Training Procedures and Loss Functions

3.1 Objective Formulations

  • CFMI adopts a flow-matching loss:

$$\mathcal{L}(\theta) = \mathbb{E}_{\text{mask},\,\text{split},\,t,\,x^t}\left[\frac{1}{|s_t|}\,\big\|f_\theta(\cdot)[s_t] - u(\cdot)[s_t]\big\|^2\right]$$

where $s_t$ denotes randomly chosen target coordinates, and $u$ is the ground-truth target vector field.
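
As a concrete instance, the sketch below estimates this loss under a linear interpolation path, a common analytic choice for flow matching in which $x^t = (1-t)\,x^0 + t\,x^1$ and the target field is $u = x^1 - x^0$; whether CFMI uses exactly this path is an assumption here:

```python
import numpy as np

def flow_matching_loss(f_theta, x1, mask, rng):
    """One-sample Monte Carlo estimate of a conditional flow-matching loss."""
    x0 = rng.standard_normal(x1.shape)        # base distribution sample
    t = rng.uniform()                         # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1                # point on the interpolation path
    xt = np.where(mask.astype(bool), x1, xt)  # condition on observed coordinates
    u = x1 - x0                               # analytic target vector field
    s_t = ~mask.astype(bool)                  # target coordinates (missing dims)
    resid = f_theta(xt, mask, t)[s_t] - u[s_t]
    return (resid ** 2).mean()                # mean over s_t, i.e. the (1/|s_t|) norm
```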

  • TabImpute minimizes the expected negative log-likelihood for missing entries across 13 missingness patterns, adaptively upweighting challenging cases:

$$\mathcal{L}(\theta) = \mathbb{E}_{X^*,\,M}\left[-\sum_{(i,j)\in\Omega} \log q_\theta\!\left(X^*_{ij} \mid \mathbf{z}_{ij}\right)\right]$$

where $X^*$ is the complete data matrix, $M$ the sampled mask, and $\Omega$ the set of masked entries whose ground truth is available during training.
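
The sketch below shows this objective under the simplifying assumption of a Gaussian output head per masked cell; TabImpute's actual output distribution may be parameterized differently:

```python
import numpy as np

def nll_loss(mu, sigma, x_true, held_out):
    """Mean negative log-likelihood over the held-out entries in Omega."""
    log_p = -0.5 * (((x_true - mu) / sigma) ** 2
                    + np.log(2 * np.pi * sigma ** 2))  # Gaussian log-density per cell
    return -log_p[held_out].mean()                     # average over masked entries only
```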

  • NeuralPrefix uses a reconstruction-based loss:

$$L(\theta) = \lambda_1\, L_{\text{shrinkage}}\big(x'(M), x(M)\big) + \lambda_2\, \big\|R'(M) - R(M)\big\|_2^2 + \lambda_3\, \big\|x'(M) - x(M)\big\|_2^2$$

where primed quantities denote model reconstructions over the masked frames $M$; no adversarial terms are used, which improves transferability.
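
A hedged sketch of this weighted objective follows; the shrinkage term is rendered here as a smooth-L1 (Huber-style) penalty, and the $\lambda$ weights are illustrative assumptions rather than the paper's values:

```python
import numpy as np

def neuralprefix_loss(x_rec, x_true, r_rec, r_true,
                      lam1=1.0, lam2=0.1, lam3=1.0, delta=1.0):
    """Weighted sum of shrinkage, residual, and frame reconstruction terms."""
    err = x_rec - x_true
    shrink = np.where(np.abs(err) < delta,
                      0.5 * err ** 2,
                      delta * (np.abs(err) - 0.5 * delta)).mean()  # smooth-L1 surrogate
    residual = ((r_rec - r_true) ** 2).mean()  # residual-branch MSE
    frame = (err ** 2).mean()                  # frame-level MSE
    return lam1 * shrink + lam2 * residual + lam3 * frame
```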

3.2 Mask and Split Sampling

All three approaches simulate a wide distribution of missingness masks during training—random masking, column/row-conditioned MAR, and complex MNAR scenarios. For CFMI and TabImpute, this involves explicit sampling of splits and patterns in every batch, ensuring coverage of the full conditional support encountered at test time (Simkus et al., 10 Jun 2025, Feitelberg et al., 3 Oct 2025). NeuralPrefix simulates 50% random frame drops and extrapolation windows (Khamis et al., 9 Feb 2025).
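
The sketch below simulates the three canonical mechanisms; the logistic forms of the MAR and MNAR samplers and the drop rate are illustrative assumptions, not the exact samplers used by the cited methods:

```python
import numpy as np

def sample_mask(X, mechanism="mcar", p=0.5, rng=None):
    """Return a binary mask (1 = observed) for an (n, d) data matrix X."""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    if mechanism == "mcar":   # missingness independent of the data
        return (rng.uniform(size=(n, d)) > p).astype(int)
    if mechanism == "mar":    # missingness driven by a fully observed column
        driver = (X[:, [0]] - X[:, 0].mean()) / (X[:, 0].std() + 1e-8)
        prob = 1.0 / (1.0 + np.exp(-driver))          # higher driver -> more missing
        mask = (rng.uniform(size=(n, d)) > p * prob).astype(int)
        mask[:, 0] = 1                                 # keep the driver column observed
        return mask
    if mechanism == "mnar":   # missingness depends on the (unobserved) value itself
        z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)
        prob = 1.0 / (1.0 + np.exp(-z))
        return (rng.uniform(size=(n, d)) > p * prob).astype(int)
    raise ValueError(f"unknown mechanism: {mechanism}")
```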

4. Inference Mechanisms and Computational Properties

| Method | Imputation Mechanism | Adaptation Steps at Test Time | Typical Inference Time |
|---|---|---|---|
| CFMI | ODE integration in missing dims | None (parameters frozen) | $K \approx 100$ Euler steps per sample |
| TabImpute | Single transformer forward pass | None | $\sim 10$ μs per entry (GPU) |
| NeuralPrefix | Latent ODE + decoder per frame | None | 124–196 ms per 10–20 frames (GPU) |

Inference is performed with a single model pass (TabImpute), or a fixed-iteration ODE/GRU integration (CFMI, NeuralPrefix). No hyperparameter tuning, re-estimation, or mask-specific retraining is required.

Key efficiency findings:

  • Flow-matching (CFMI) is up to $50\times$ more efficient during training than diffusion-based schemes, and $2\times$ faster during inference (Simkus et al., 10 Jun 2025).
  • TabImpute achieves $100\times$ faster inference than TabPFN’s iterative imputation, with all tokens parallelized (Feitelberg et al., 3 Oct 2025).
  • NeuralPrefix offers inference on the order of 100 ms for dense sensory reconstruction, with accuracy-flexibility trade-offs governed by ODE solver tolerances (Khamis et al., 9 Feb 2025).

5. Empirical Performance Across Modalities

Extensive quantitative studies support the reliability and generality of zero-shot imputers in challenging settings:

  • CFMI on time series (PhysioNet, PM2.5): CRPS 0.282–0.098, outperforming CSDI (diffusion) in speed and matching or improving on accuracy; achieves best average rank on 12 metrics across 24 UCI tabular datasets (Simkus et al., 10 Jun 2025).
  • TabImpute on 42 OpenML tables (MCAR, MAR, MNAR): overall imputation accuracy $0.833 \pm 0.213$, exceeding all established baselines, with robustness across distribution shift and imputation types; operates at GPU-scale batch throughput (Feitelberg et al., 3 Oct 2025).
  • NeuralPrefix on spatio-temporal sensor data (MCD, Soli, Intelligent Carpet): SSIM 0.93–0.96 for 50% missing frames in-domain, ~0.88–0.94 for out-of-domain and cross-modality; robust to interpolation, extrapolation, and OOD settings (Khamis et al., 9 Feb 2025).

These results are obtained without task-specific adaptation, indicating that both conditional flows and transformer-based approaches generalize effectively with sufficient synthetic or real-world mask coverage during pre-training.

6. Limitations and Failure Modes

Several constraints remain inherent in current zero-shot imputation methods:

  • Scalability: Quadratic attention in TabImpute limits scalability beyond a few hundred rows or columns (Feitelberg et al., 3 Oct 2025). CFMI and NeuralPrefix may also encounter memory or computation bottlenecks in high-dimensional or long-sequence regimes.
  • Data Type Coverage: TabImpute currently supports continuous data only; categorical and mixed-type columns are not addressed (Feitelberg et al., 3 Oct 2025). CFMI and NeuralPrefix are more data-type agnostic but may require embedding strategies for categorical content (Simkus et al., 10 Jun 2025, Khamis et al., 9 Feb 2025).
  • Long-range or Discontinuous Gaps: NeuralPrefix’s ODE prior can drift for long unobserved gaps or under sharp discontinuities, with performance degrading due to the smoothness assumption of the dynamical model (Khamis et al., 9 Feb 2025).
  • Training Data Realism: Pre-training on synthetic (e.g., linear factor) models may not fully capture nonlinearities and class-conditional variation seen in practice (Feitelberg et al., 3 Oct 2025).

These limitations motivate ongoing research into sparse attention for scaling, mixed-type modeling, alternative generative priors, and robustification to structured missingness and drift.

7. Future Directions and Open Problems

Active research seeks to advance zero-shot missing-data imputation through:

  • Linear or sparse attention mechanisms to enable scaling to much larger tabular or spatio-temporal datasets (Feitelberg et al., 3 Oct 2025).
  • Support for categorical, ordinal, and mixed-type data representations.
  • Incorporating richer synthetic or empirical priors, such as structural causal modeling or variational autoencoders, to match real-world complexity (Feitelberg et al., 3 Oct 2025).
  • Development of multiple-imputation procedures to propagate uncertainty, not just point estimates, into downstream analyses (see the sketch after this list).
  • Dynamic adaptation and domain-adaptive fine-tuning to combine zero-shot capability with improved specialization, while avoiding catastrophic forgetting.
  • Extensions to multi-sensor, multi-modal, and fusion settings where side information or sensor IDs are explicitly modeled for transfer (Khamis et al., 9 Feb 2025).
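
On the multiple-imputation point, a minimal sketch in the spirit of Rubin's rules, assuming a hypothetical stochastic sampler `sample_imputation` that returns one completed dataset per call:

```python
import numpy as np

def pool_estimates(sample_imputation, x_obs, mask, statistic, K=20, rng=None):
    """Pool a scalar statistic over K stochastic imputations."""
    rng = rng or np.random.default_rng()
    ests = np.array([statistic(sample_imputation(x_obs, mask, rng))
                     for _ in range(K)])
    q_bar = ests.mean()           # pooled point estimate
    b = ests.var(ddof=1)          # between-imputation variance
    # Rubin's rules also add the average within-imputation variance;
    # only the between-imputation component is pooled in this sketch.
    return q_bar, (1 + 1 / K) * b
```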

A plausible implication is that as the field addresses scale, type, and transfer limitations, zero-shot imputation may converge toward universal, plug-in modules analogous to foundation models in other subfields, promoting imputation as a first-class unsupervised inference primitive across scientific and industrial domains.
