- The paper introduces a Self-Consistent Stochastic Interpolants (SCSI) method that recovers clean data from corrupted samples using only black-box simulators.
- It reformulates inverse generative modeling as a fixed-point problem by iteratively updating neural transport maps to enforce distributional self-consistency.
- Empirical results on synthetic, image, and astronomical tasks demonstrate SCSI's competitive restoration performance and accelerated convergence compared to baselines.
Generative Modeling from Black-Box Corruptions via Self-Consistent Stochastic Interpolants
The paper addresses the fundamental challenge of learning generative models when only corrupted measurements are available, which is typical in many scientific and engineering scenarios where the forward measurement process is noisy, ill-conditioned, and often non-invertible. Standard transport-based generative approaches, such as diffusion models and normalizing flows, require direct access to samples from the clean underlying data distribution, which is unrealistic in such settings.
Formally, the scenario involves an inaccessible clean data distribution π and a black-box corruption operator F whose induced channel K yields the observed distribution μ = Kπ, seen only through samples y ∼ μ. The operator F can be highly nonlinear, non-differentiable, and even stochastic, precluding direct likelihood-based or gradient-based inversion schemes. The central task is therefore to construct a generative model for the clean data distribution π using only two resources: (1) samples from the corrupted distribution μ; and (2) the ability to produce new corrupted samples via a black-box simulator of F. No knowledge of the analytic form or gradients of the corruption channel is assumed.
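A minimal sketch of this access model, with a hypothetical nonlinear, stochastic channel standing in for F (the `corrupt` function and its form are assumptions for illustration, not the paper's operators):

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x):
    """Hypothetical black-box channel F: nonlinear and stochastic.
    Only forward evaluation is available -- no gradients, no analytic form."""
    return np.tanh(x) + 0.5 * rng.standard_normal(x.shape)

# The clean distribution pi is never observed; the learner has only corrupted
# draws y ~ mu = K pi, plus the ability to call corrupt() on new inputs.
x_clean = rng.standard_normal((1000, 2))   # stands in for pi (inaccessible in practice)
y_observed = corrupt(x_clean)              # what the learner actually sees
```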
Methodology: Self-Consistent Stochastic Interpolants (SCSI)
The proposed approach reformulates the inverse generative modeling task as a fixed-point problem at the distributional level. The core conceptual tool is the Stochastic Interpolant (SI) framework, which allows for constructing transport maps between probability distributions by learning velocity (and optionally, score) fields that interpolate between two distributions via forward and reverse diffusion or flow processes.
The Self-Consistent Stochastic Interpolant (SCSI) method iteratively updates a transport map Φ_Θ (parameterized by neural networks) via the following outer- and inner-loop structure:
- Outer iteration: For current parameters Θ^(k), define the empirical prior as π_Θ^(k) = (Φ_Θ^(k))_# μ.
- Inner loop: Construct the SI between π_Θ^(k) (pseudo-clean samples) and Kπ_Θ^(k) (corrupted pseudo-samples), and update Θ to minimize the SI loss so that the transport map, applied to μ, yields a distribution whose pushforward through F matches μ (self-consistency).
- This process enforces distributional self-consistency: at the fixed point, Kπ_Θ* = μ, and under injectivity of K, convergence to the ground-truth π is guaranteed.
Crucially, this process requires no access to clean samples, no backpropagation through the corruption channel, and handles settings with only black-box forward operators.
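The outer/inner structure can be sketched as follows; the loop shape follows the description above, but the function names, the `fit_interpolant` trainer, and the stand-in components in the demo are all hypothetical, not the authors' implementation:

```python
import numpy as np

def scsi_iterate(transport, corrupt, y_samples, n_outer, fit_interpolant):
    """Schematic SCSI outer loop (hypothetical API, not the authors' code).

    transport:       current map Phi_Theta, callable on corrupted samples
    corrupt:         black-box simulator of the channel F (forward calls only)
    fit_interpolant: trains a stochastic-interpolant transport map between two
                     empirical samples and returns the updated map
    """
    for _ in range(n_outer):
        # Outer step: pseudo-clean prior pi_Theta^(k) = (Phi_Theta^(k))_# mu
        x_pseudo = transport(y_samples)
        # Re-corrupt through the black box: samples from K pi_Theta^(k)
        y_pseudo = corrupt(x_pseudo)
        # Inner step: fit the SI carrying corrupted pseudo-samples back to
        # pseudo-clean ones; at the fixed point, K pi_Theta = mu
        transport = fit_interpolant(source=y_pseudo, target=x_pseudo)
    return transport

# Toy run with stand-in components (illustration only, not a real SI trainer):
rng = np.random.default_rng(0)
y = rng.standard_normal((256, 2))
identity_fit = lambda source, target: (lambda s: s)
phi = scsi_iterate(lambda s: s, lambda x: x + 0.1 * rng.standard_normal(x.shape),
                   y, n_outer=3, fit_interpolant=identity_fit)
restored = phi(y)
```

Note that the only interaction with the channel is the forward call `corrupt(x_pseudo)`; no gradients of F are ever required.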
Theoretical Guarantees
The authors provide a rigorous mathematical analysis of the convergence properties of the SCSI scheme:
- Distributional Injectivity: If the observation operator K is injective at the distributional level, the only fixed point of the SCSI iteration is the ground-truth π.
- Metric Contraction Properties: Under suitable regularity conditions and a Lipschitz stability assumption on the SI mappings, the procedure enjoys:
- Linear contraction in Wasserstein distance when the reverse transport is Lipschitz, with contraction rate R < 1, yielding W₂²(π, π^(k)) ≤ Rᵏ W₂²(π, π^(0)).
- Linear contraction in KL divergence, with a rate controlled by the SI Lipschitz constant and the condition number χ of the channel, under mild regularity of the SI parameterizations.
- Fokker-Planck Channel Specialization: For channels expressible as diffusion processes (e.g., AWGN), the contraction rate can be strictly quantified, and exponential convergence can be established for all SNR regimes.
These results are substantiated by both abstract analysis and explicit closed-form calculations in the Gaussian/AWGN setting.
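The Wasserstein bound can be made concrete with 1-D Gaussians, where W₂² has a closed form. The contraction map below is a toy illustration chosen to realize the rate exactly, not the paper's actual SCSI update:

```python
import numpy as np

def w2_sq_gauss(m1, s1, m2, s2):
    """Closed-form squared 2-Wasserstein distance between 1-D Gaussians."""
    return (m1 - m2) ** 2 + (s1 - s2) ** 2

# Toy contraction toward the target N(0, 1) at rate R = 0.25, mirroring the
# bound W2^2(pi, pi^(k)) <= R^k W2^2(pi, pi^(0)).
R = 0.25
m, s = 3.0, 2.0                          # initial iterate N(3, 4)
dists = []
for _ in range(6):
    dists.append(w2_sq_gauss(m, s, 0.0, 1.0))
    m = np.sqrt(R) * m                   # contract the mean gap
    s = 1.0 + np.sqrt(R) * (s - 1.0)     # contract the std-dev gap
# Each step shrinks W2^2 by exactly R, so the geometric bound holds tightly.
```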
Figure 2: Convergence of ‖Σ − Σ_k‖₂ for the SCSI ODE (red) and the EM algorithm (green) in the Gaussian/AWGN setting, empirically highlighting the accelerated, quadratically convergent rate of the ODE approach.
Empirical Evaluation
The paper presents extensive empirical evaluations on both synthetic and real-data tasks involving a diversity of nonlinear, non-invertible, and non-Gaussian corruption channels:
- Synthetic Benchmarks: The ODE-based SCSI yields faster and more stable convergence in moderate-noise settings, while SDE-based variants offer greater robustness under extreme corruption but are less stable and more hyperparameter sensitive.
- Imaging Tasks (CIFAR-10, CelebA): SCSI is applied to challenging forward models, including:
- Random masking with Gaussian noise,
- Gaussian blur and additive (and Poisson) noise,
- Motion blur (nonlinear operator),
- JPEG compression (non-differentiable, nonlinear corruption).
SCSI consistently demonstrates sample restoration performance competitive with or exceeding strong baselines, including Diffusion Posterior Sampling (DPS), which requires clean pretraining and gradients of the forward operator, and oracle SI models trained with paired clean/corrupted data. Notably, SCSI achieves strong LPIPS and FID metrics using architectures with fewer parameters and significantly reduced computational requirements.
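Forward models of this kind are easy to expose as pure black-box simulators. The sketch below implements random masking with Gaussian noise, and uses coarse quantization as a non-differentiable stand-in for compression (the paper uses real JPEG; this substitute is an assumption for the sake of a self-contained example):

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_and_noise(img, p=0.25, sigma=0.1):
    """Randomly mask roughly a fraction p of pixels, then add Gaussian noise."""
    keep = rng.random(img.shape) > p
    return keep * img + sigma * rng.standard_normal(img.shape)

def quantize(img, levels=8):
    """Coarse quantization: a non-differentiable stand-in for compression
    artifacts (the actual paper uses real JPEG; this is an assumption)."""
    return np.round(img * (levels - 1)) / (levels - 1)

x = rng.random((32, 32, 3))      # toy "clean" image with values in [0, 1]
y_masked = mask_and_noise(x)
y_jpeg_like = quantize(x)
```

Because SCSI only ever calls such simulators forward, the non-differentiability of `quantize` (or of true JPEG) poses no obstacle.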
Figure 1: Example restoration of severely JPEG-compressed, noisy images using SCSI. Top: corrupted samples, Bottom: SCSI-restored samples.
Figure 5: Random samples drawn from the large diffusion model retrained on SCSI-restored data (CIFAR-10), demonstrating high generative fidelity.

Figure 7: Restoration under random masking (25% pixels masked); SCSI successfully infers plausible clean images despite massive information loss.
- Physical Science Task (Quasar Spectra Recovery): SCSI is deployed on real astronomical spectra measured under complex, unmodeled noise, calibration uncertainty, and spectral resolution variation. Compared to Wiener filtering, SCSI significantly improves the recovery of physically salient spectral features relevant for scientific inference.
Implications and Limitations
Practical Impact:
- SCSI provides a scalable, modular, and black-box-compatible solution for generative modeling under corrupted data. It does not require explicit analytic access to the corruption process, enabling its application to scientific and industrial data acquisition chains with only simulators or experimental access.
- Empirical results demonstrate performance competitive with or superior to both model-based and (semi-)supervised baselines, alongside substantial gains in computational efficiency.
Theoretical Significance:
- The SCSI framework clarifies and extends the class of inverse problems in which distribution-level generative modeling is solvable—even when corruption is stochastic, nonlinear, or non-differentiable—by harnessing the representation power of transport-based models and iterative self-consistency.
- The analysis further highlights a nontrivial phenomenon: in some regimes (notably ODE-based, noise-free transport for Gaussian/AWGN channels), marginal transport (SCSI) can yield faster convergence rates than posterior-based EM or MLE-type schemes.
Limitations and Future Directions:
- The necessity of injectivity at the distributional (not sample) level circumscribes the set of applicable corruption channels, although this still includes many scientifically relevant cases such as AWGN, tomographic projection, and randomized inpainting.
- The practical effectiveness of SDE-based SIs is highly contingent on hyperparameter tuning; ODE-based models generally offer more stability but can underperform in extreme-noise situations.
- Extensions to settings with mixed clean/corrupted supervision or discrete data follow naturally within the SCSI formalism, connecting to very recent advances in stochastic interpolant theory.
- Comparative exploration of marginal vs. conditional transport (EM vs. SCSI) merits further investigation, particularly regarding trade-offs between convergence rates and robustness.
Conclusion
This work introduces a theoretically justified and empirically validated framework for generative modeling from corrupted observations, requiring only black-box access to the corruption process. The SCSI algorithm combines the expressivity of stochastic interpolants with a scalable, iterative, self-consistent training scheme, achieving provable convergence to the true data distribution under broad conditions. Empirical results across synthetic, image, and scientific data domains demonstrate that SCSI is competitive with or outperforms baselines needing stronger supervision or analytic access. The results suggest new directions in both the theory of inverse problems and the practice of generative learning under severe data corruption.