Diffusion Regression Model

Updated 18 December 2025
  • Diffusion-based regression models are a family of algorithms that frame regression as a generative modeling task using forward noising and reverse denoising processes.
  • They condition on input covariates to generate complete predictive distributions, supporting both point predictions and uncertainty quantification.
  • Applications include image deblurring, speech enhancement, and symbolic regression, where these models achieve state-of-the-art performance with strong theoretical guarantees.

Diffusion-based regression models are a family of learning algorithms that frame regression as a (conditional) generative modeling task, using stochastic forward–backward processes (typically inspired by nonequilibrium thermodynamics or stochastic differential equations) to learn mappings from input covariates to predictive distributions or point estimates. Such models have attained state-of-the-art results in structured prediction, probabilistic regression, denoising, and symbolic discovery across high-dimensional and structured domains.

1. Mathematical Foundations and Denoising Diffusion as Regression

Diffusion-based regression models operate by applying a progressive noising process to the target variable(s), typically via a discrete-time Markov chain or continuous-time SDE, and then learning a reverse (denoising) process that reconstructs the original data from noise. For scalar or vector-valued targets, the standard forward (noising) process is

q(y_t \mid y_0) = \mathcal{N}\left(\sqrt{\bar\alpha_t}\, y_0,\ (1 - \bar\alpha_t) I\right)

and the reverse process is parameterized as

p_\theta(y_{t-1} \mid y_t, x) = \mathcal{N}\left(\mu_\theta(y_t, t, x),\ \Sigma_\theta(y_t, t, x)\right),

where \bar\alpha_t = \prod_{i=1}^t (1 - \beta_i) specifies the cumulative noise schedule (Kneissl et al., 6 Oct 2025, Han et al., 2022).

The training objective is derived from the ELBO or from direct regression to the noise, as in

\mathbb{E}_{t, y_0, \epsilon}\left[\|\epsilon - \epsilon_\theta(y_t, t, x)\|^2\right], \quad y_t = \sqrt{\bar\alpha_t}\, y_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, I),

which admits a maximum likelihood interpretation as nonlinear regression with additive Gaussian noise (Kong et al., 2023, Moradi et al., 4 Aug 2025).
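
The following minimal PyTorch sketch illustrates this noise-regression objective for vector-valued targets; the denoiser eps_model (taking y_t, t, and the covariates x) and the linear noise schedule are illustrative assumptions, not details taken from the cited papers.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # linear noise schedule (assumed)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)  # \bar\alpha_t = prod_i (1 - beta_i)

def diffusion_regression_loss(eps_model, x, y0):
    """x: (b, p) covariates; y0: (b, d) regression targets."""
    b = y0.shape[0]
    t = torch.randint(0, T, (b,))                          # uniform random timestep
    eps = torch.randn_like(y0)                             # injected Gaussian noise
    a_bar = alpha_bar[t].unsqueeze(-1)                     # (b, 1), broadcasts over d
    y_t = a_bar.sqrt() * y0 + (1.0 - a_bar).sqrt() * eps   # sample from q(y_t | y_0)
    eps_hat = eps_model(y_t, t, x)                         # denoiser conditioned on x
    return ((eps - eps_hat) ** 2).mean()                   # regress to the noise
```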

The connection to classical regression is established through I-MMSE information-theoretic identities, which show that the denoising predictor at each noise level minimizes the conditional mean square error, and the cumulative MMSE curve provides exact density estimation bounds (Kong et al., 2023). This regression viewpoint unifies diffusion models for both continuous densities and discrete masses.

2. Conditioning, Covariates, and Predictive Distributions

Diffusion-based regression models can encode conditioning on input covariates x both in the mean of the noising process and directly as input to the denoiser. CARD (Han et al., 2022) integrates a pretrained mean estimator f_\varphi(x) into both the forward kernel

q(y_t \mid y_{t-1}, x) = \mathcal{N}\left( \sqrt{\alpha_t}\, y_{t-1} + (1 - \sqrt{\alpha_t})\, f_\varphi(x),\ \beta_t I \right)

and the reverse transition, with \epsilon_\theta receiving both y_t and f_\varphi(x). Similarly, probabilistic regression approaches (Kneissl et al., 6 Oct 2025) directly condition the generative (reverse) model on x.
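
As an illustration, here is a minimal sketch of this CARD-style conditioning via the closed-form forward marginal; f_mean (the output of the pretrained f_\varphi(x)), the shapes, and the schedule argument are assumptions for exposition rather than the exact implementation.

```python
import torch

def card_forward_sample(y0, f_mean, t, alpha_bar):
    """y0, f_mean: (b, d); t: (b,) long; alpha_bar: (T,) cumulative schedule."""
    a_bar = alpha_bar[t].unsqueeze(-1)                     # (b, 1)
    eps = torch.randn_like(y0)
    y_t = (a_bar.sqrt() * y0
           + (1.0 - a_bar.sqrt()) * f_mean                 # mean shifted toward f_phi(x)
           + (1.0 - a_bar).sqrt() * eps)                   # covariance (1 - abar_t) I
    return y_t, eps

# The denoiser then receives the covariates and the mean estimate as inputs, e.g.
# eps_hat = eps_model(y_t, t, x, f_mean)
```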

For tasks such as conditional distribution estimation, the backbone may be a conditional score network S_\theta(y, x, t) trained by conditional score matching to recover p(y \mid x), yielding minimax-optimal rates under total variation and Wasserstein metrics and adaptivity to intrinsic manifold dimension (Tang et al., 30 Sep 2024).

Because all uncertainty in y \mid x is preserved, these models provide not just point predictions but full predictive distributions; their stochastic outputs support both aleatoric and epistemic uncertainty quantification (Kneissl et al., 6 Oct 2025, Han et al., 2022).

3. Model Variants and Domain-Specific Extensions

Diffusion-based regression encompasses a wide range of models and extensions:

a. Probabilistic Regression: Generalizes DDPMs to model the full conditional distribution. The predictive posterior is constructed by learning the distribution over the forward noise at each step, with parameterizations including Gaussian mixtures and full-covariance multivariate Gaussians (Kneissl et al., 6 Oct 2025).

b. Robust Regression: Replaces the L_2 loss with robust losses (e.g., Huber, least trimmed squares) to increase resilience to outliers or contamination in unsupervised settings; a minimal sketch of this substitution appears after this list. For anomaly segmentation, such robustified Denoising Diffusion Probabilistic Models (RDDPM) outperform standard diffusion and classical anomaly approaches under contamination (Moradi et al., 4 Aug 2025).

c. Discrete/Combinatorial Regression: For discrete, symbolic, or categorical outputs, models such as D3PM-based Symbolic-Diffusion or mask-based DDSR use categorical forward noising and transformer-based denoisers to recover token sequences (e.g., mathematical expressions) (Tymkow et al., 8 Oct 2025, Bastiani et al., 30 May 2025, Han et al., 16 Sep 2025). Training employs variational bounds and/or reinforcement learning updates; inference is parallel and order-agnostic.

d. Latent Space Regression and Hybrid Models: Hierarchical regression–diffusion models for image deblurring (HI-Diff) run the diffusion process in compressed latent representations, fusing multi-scale latent priors into regression networks for improved sample efficiency and metric alignment (Chen et al., 2023). In speech enhancement, Brownian-bridge driven regression–diffusion models unify direct regression and denoising by training clean output predictors and enabling both single-shot and iterative inference (Trachu et al., 10 Jun 2024).

e. Plug-and-Play and Surrogate Likelihoods: For inverse and signal recovery problems under difficult (nondifferentiable) observation models, plug-and-play schemes decouple the data-fidelity and diffusion prior terms using differentiable surrogate likelihoods, half-quadratic splittings, and gradient-based inner solvers—enabling efficient 1-bit compressed sensing and logistic regression (Chen et al., 16 Nov 2025).
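
As a concrete illustration of the robust-loss substitution in (b), the sketch below swaps the squared-error denoising objective for a Huber loss; this is a minimal stand-in under assumed inputs, not the exact RDDPM objective.

```python
import torch.nn.functional as F

def robust_denoising_loss(eps_hat, eps, delta=1.0):
    """Huber (smooth L1) variant of the denoising regression objective."""
    return F.huber_loss(eps_hat, eps, delta=delta)
```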

4. Algorithms, Training Strategies, and Computational Considerations

Training Procedure: Typically, diffusion-based regression models train their denoising network via stochastic estimation: sample data (x, y_0), sample a discrete time t and Gaussian noise, corrupt y_0 via the forward process to obtain y_t, and regress to reconstruct the denoising statistic (commonly the conditional mean, or, for probabilistic variants, mean and covariance, or noise mixtures).

Loss Functions and Score Matching: Losses range from simple denoising MSE (Han et al., 2022, Kong et al., 2023, Chen et al., 2023) to strictly proper scoring rules (energy score, CRPS, kernelized) that guarantee full distributional recovery (Kneissl et al., 6 Oct 2025). Robust objectives attenuate influence from outliers (Moradi et al., 4 Aug 2025).
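
For concreteness, the energy score (one such strictly proper scoring rule) admits a simple sample-based estimator computed from m draws of the model's predictive distribution; the shapes below are assumptions for illustration.

```python
import torch

def energy_score(samples, y_true):
    """samples: (m, d) draws of y | x from the model; y_true: (d,) observed target."""
    m = samples.shape[0]
    term1 = torch.norm(samples - y_true, dim=-1).mean()    # E ||Y - y||
    pairwise = torch.cdist(samples, samples)               # (m, m) distances ||Y_i - Y_j||
    term2 = pairwise.sum() / (m * (m - 1))                 # off-diagonal mean, E ||Y - Y'||
    return term1 - 0.5 * term2
```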

Hybrid and Hierarchical Losses: In hierarchical frameworks (e.g., HI-Diff (Chen et al., 2023)), joint losses sum classical regression L_1, latent-consistency L_1, and latent-diffusion MSE terms. In symbolic regression, denoising and policy-gradient surrogates may be combined (Bastiani et al., 30 May 2025).

Inference and Sampling: Reverse-time ancestral sampling, plug-and-play MAP iterations, and variants of score-based samplers are used. Latent diffusion and compressed-space inference can yield 10–20× acceleration over pixel-space models (Chen et al., 2023). Regression-mode inference (e.g., setting t = 1 in Thunder (Trachu et al., 10 Jun 2024)) provides fast, one-pass approximations.
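
A minimal sketch of reverse-time ancestral sampling for a conditional denoiser is given below; eps_model, the schedule tensors, and the DDPM-style variance choice sigma_t^2 = beta_t are assumptions, and repeated calls yield draws from the learned predictive distribution p(y | x).

```python
import torch

@torch.no_grad()
def sample_y_given_x(eps_model, x, dim, T, betas, alpha_bar):
    """Ancestral sampling of y ~ p_theta(y | x); betas, alpha_bar: (T,) schedule tensors."""
    y = torch.randn(x.shape[0], dim)                       # start from pure noise y_T
    for t in reversed(range(T)):
        t_batch = torch.full((x.shape[0],), t, dtype=torch.long)
        eps_hat = eps_model(y, t_batch, x)                 # conditional noise prediction
        a_t, a_bar = 1.0 - betas[t], alpha_bar[t]
        # posterior mean of y_{t-1} given y_t (standard DDPM parameterization)
        y = (y - (betas[t] / (1.0 - a_bar).sqrt()) * eps_hat) / a_t.sqrt()
        if t > 0:
            y = y + betas[t].sqrt() * torch.randn_like(y)  # sigma_t^2 = beta_t choice
    return y
```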

Computational Tradeoffs: Robust or plug-and-play approaches may add gradient inner loops. Hybrid regression-diffusion models can offer state-of-the-art performance with modest parameter and FLOP increases. Discrete diffusion approaches can be parallel but may require thousands of denoising steps for high validity (Bastiani et al., 30 May 2025, Tymkow et al., 8 Oct 2025).

5. Empirical Performance, Theoretical Guarantees, and Limitations

Diffusion-based regression methods consistently match or surpass strong deep learning and Bayesian baselines on metrics such as RMSE, negative log-likelihood, distributional scores (CRPS, energy), and uncertainty calibration (Kneissl et al., 6 Oct 2025, Han et al., 2022, Chen et al., 16 Nov 2025, Chen et al., 2023, Moradi et al., 4 Aug 2025). For standard regression tasks (UCI benchmarks, toy data), CARD and proper-scoring-rule approaches often obtain the lowest RMSE and best calibration; in probabilistic PDE and weather forecasting tasks, mixture-based noise parameterizations yield calibrated coverage near nominal confidence (Kneissl et al., 6 Oct 2025). For image deblurring, HI-Diff achieves PSNR/SSIM gains and real-world generalization (Chen et al., 2023).

Theoretical results demonstrate minimax-optimality under TV and Wasserstein for conditional regression, with rates depending only on the intrinsic rather than ambient dimension (Tang et al., 30 Sep 2024, Xia et al., 18 Oct 2024), and, for robust losses, formal breakdown point guarantees (Moradi et al., 4 Aug 2025). Empirical ablations confirm that regression–diffusion hybrids, hierarchical priors, and end-to-end fusion outperform split and single-scale variants.

Limitations include higher computational cost than simple regressors, sensitivity to the choice of noise schedule, and, for symbolic or discrete diffusion, slower sampling and dependence on carefully tuned tokenization and validity constraints (Bastiani et al., 30 May 2025, Tymkow et al., 8 Oct 2025, Han et al., 16 Sep 2025). Most methods focus on aleatoric uncertainty; extensions to epistemic uncertainty or fast discrete denoisers are active research directions (Kneissl et al., 6 Oct 2025, Han et al., 2022).

6. Domain-Specific Instantiations and Applications

Diffusion-based regression models have demonstrated versatility across domains:

  • Image deblurring: HI-Diff fuses latent-space diffusion priors via multi-scale cross-attention into a hierarchical Restormer, achieving SOTA PSNR/SSIM (Chen et al., 2023).
  • Speech enhancement: Thunder’s Brownian bridge-based regression–diffusion approach allows both single-step regression and few-step refinement for real-time, unified enhancement (Trachu et al., 10 Jun 2024).
  • Symbolic regression: Discrete or continuous token diffusion models (DDSR, Symbolic-Diffusion, DiffuSR) generate mathematical expressions in globally parallel fashion, outperforming or matching autoregressive and GP-based search on complexity and accuracy (Tymkow et al., 8 Oct 2025, Bastiani et al., 30 May 2025, Han et al., 16 Sep 2025).
  • Inverse problems and 1-bit quantization: Plug-and-play diffusion, e.g., Diff-OneBit, achieves fast, high-fidelity reconstructions under nondifferentiable likelihoods via gradient-based surrogate optimization (Chen et al., 16 Nov 2025).
  • Unsupervised/robust settings: RDDPMs extend diffusion models to contaminated data using Huber and LTS robust regression objectives, yielding improved anomaly segmentation (Moradi et al., 4 Aug 2025).
  • Bayesian cognitive modeling: Unified Bayesian regression–diffusion models integrate subject- and trial-level effects for cognitive RT/choice modeling, with full propagation of hierarchical uncertainty (Jin et al., 1 Jul 2025).
  • Manifold learning and semisupervised regression: Diffusion-based spectral algorithms for regression on manifolds achieve convergence rates governed only by intrinsic geometry, leveraging both labeled and unlabeled data via graph Laplacian heat-kernel approximations (Xia et al., 18 Oct 2024).

7. Connections, Theoretical Advances, and Outlook

The regression view of diffusion models provides a rigorous bridge between stochastic generative modeling, nonlinear regression, and information theory (Kong et al., 2023). Recent work has unified continuous and discrete regression objectives, established refined likelihood bounds, and provided theoretical optimality under a wide range of metrics and manifold structures (Kong et al., 2023, Tang et al., 30 Sep 2024, Xia et al., 18 Oct 2024).

These results suggest that diffusion-based regression is a flexible and theoretically grounded paradigm for structured prediction and uncertainty quantification. Challenges remain in accelerating sampling, extending to generalized likelihoods or alternative observation models, and further improving discrete/structured output modeling (Bastiani et al., 30 May 2025, Tymkow et al., 8 Oct 2025, Han et al., 16 Sep 2025). Ongoing work explores accelerated samplers, hybrid architectures, and principled integration of reinforcement learning, manifold priors, and plug-and-play modularity.

Diffusion-based regression is thus positioned as a foundational methodology for next-generation predictive modeling, with rigorous connections to regression, density estimation, and conditional generative learning, encompassing a spectrum from classical MSE regression to high-dimensional, distributionally calibrated, and uncertainty-aware predictive models.
