
UnfoldLDM: Deep Unfolding for Inverse Problems

Updated 29 November 2025
  • UnfoldLDM is a deep unfolding framework that fuses diffusion probabilistic models and optimization techniques for solving inverse problems in physics and imaging.
  • It employs domain-aware conditioning and specialized proximal modules to integrate generative deep learning priors with interpretable, iterative model-based updates.
  • Experimental evaluations demonstrate improved restoration metrics in imaging and unfolding precision within 3–5% in high-energy physics, highlighting its robust performance.

UnfoldLDM denotes a family of deep unfolding frameworks that unify diffusion probabilistic models, deep unfolding optimization, and specialized proximal modules for high-dimensional scientific and imaging inverse problems. Originating in conditional high-energy physics unfolding (Pazos et al., 3 Jun 2024), posterior Bayesian sampling (Mbakam et al., 3 Jul 2025), and blind image restoration (He et al., 22 Nov 2025), UnfoldLDM approaches aim to combine the interpretability and modularity of model-based iterations with the expressiveness and adaptability of generative deep learning priors, notably utilizing Denoising Diffusion Probabilistic Models (DDPMs), latent diffusion modules, and transformer-based proximal operators.

1. Mathematical and Algorithmic Foundations

UnfoldLDM frameworks are grounded in joint optimization formulations and probabilistic sampling, with core methodology drawn from deep unfolding and score-based diffusion models. A canonical UnfoldLDM formulation involves:

  • Forward diffusion: Data vectors x0x_0 are iteratively perturbed with scheduled Gaussian noise,

$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big),$$

with $T$ steps and schedule $\{\beta_1,\ldots,\beta_T\}$. Marginalization yields

$$q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,I\big),$$

where $\alpha_t = 1-\beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s$.
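As a concrete illustration, the forward marginal above can be sampled directly in closed form. The linear schedule and step count below are illustrative choices, not taken from the cited papers:

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng=None):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    rng = rng or np.random.default_rng(0)
    abar_t = np.cumprod(1.0 - betas)[t]      # \bar{alpha}_t = prod_{s<=t} (1 - beta_s)
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * noise

# Illustrative linear schedule with T = 1000 steps
T = 1000
betas = np.linspace(1e-4, 0.02, T)
x0 = np.ones(8)
x_T = forward_diffuse(x0, T - 1, betas)      # near-pure Gaussian noise at the final step
```

Because the schedule drives $\bar{\alpha}_T$ toward zero, the terminal sample is approximately standard Gaussian, which is what licenses initializing reverse inference from pure noise.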

  • Reverse inference/modeling: Conditional diffusion reverses the process; for high-energy physics and imaging, this means sampling the posterior $p_\theta(x_0 \mid y)$ with $y$ an observed vector. Learned reverse chains adopt

$$p_\theta(x_{0:T} \mid y) = p(x_T \mid y)\prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t, y),$$

with closed-form Gaussian transitions and a neural-net-parameterized mean shifting function.
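A minimal ancestral-sampling loop for such a conditional reverse chain might look as follows; `eps_model` is a hypothetical noise predictor standing in for the papers' neural mean-shift parameterization, and the zero predictor below exists only to make the sketch runnable:

```python
import numpy as np

def reverse_sample(eps_model, y, betas, shape, rng=None):
    """Ancestral sampling from p_theta(x_{0:T} | y) with Gaussian transitions."""
    rng = rng or np.random.default_rng(0)
    alphas = 1.0 - betas
    abars = np.cumprod(alphas)
    x = rng.standard_normal(shape)             # x_T ~ N(0, I)
    for t in range(len(betas) - 1, -1, -1):
        eps = eps_model(x, y, t)               # predicted noise, conditioned on y
        # Mean of the learned Gaussian transition p_theta(x_{t-1} | x_t, y)
        mean = (x - betas[t] / np.sqrt(1.0 - abars[t]) * eps) / np.sqrt(alphas[t])
        x = mean + (np.sqrt(betas[t]) * rng.standard_normal(shape) if t > 0 else 0.0)
    return x

# Toy run with an untrained (zero) predictor, purely to exercise the loop
zero_model = lambda x, y, t: np.zeros_like(x)
x0_hat = reverse_sample(zero_model, y=None, betas=np.linspace(1e-4, 0.02, 50), shape=(4,))
```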

  • Deep unfolding (editor's term): For inverse imaging, the network mimics $K$ iterations of Markov chain Monte Carlo or proximal-gradient descent on the energy

$$\min_x\ \tfrac{1}{2}\|\mathbf{y}-\mathbf{D}\mathbf{x}\|_2^2 + \tfrac{1}{2}\|\mathbf{y}-\mathbf{W}\mathbf{x}\mathbf{M}\|_2^2 + \lambda\,\phi(x),$$

where $\mathbf{D}$ is a (possibly unknown) degradation operator, factorized as $\mathbf{M}^{T}\otimes\mathbf{W}$ for granular control. Each block alternates analytic gradient descent, data-driven degradation estimation, and learned proximal updates.
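This alternating structure can be sketched with proximal-gradient steps on the first fidelity term alone; the Kronecker-factorized second term and the learned proximal module are omitted, with an $\ell_1$ soft threshold standing in for the latter:

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1, a stand-in for a learned proximal module."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def unfolded_pgd(y, D, K=3, lam=0.01):
    """K unfolded proximal-gradient steps on 0.5 * ||y - D x||^2 + lam * phi(x)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2      # 1/L, L = Lipschitz constant of the gradient
    x = D.T @ y                                 # simple data-driven initialization
    for _ in range(K):
        grad = D.T @ (D @ x - y)                # analytic gradient of the fidelity term
        x = soft_threshold(x - step * grad, step * lam)   # a trained prox would go here
    return x
```

In an actual UnfoldLDM-style network the step size is a learnable per-stage parameter and the threshold is replaced by a trained module such as the transformer-based proximal operators described below.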

2. Conditioning and Prior Integration Mechanisms

A hallmark of UnfoldLDM frameworks is their use of domain-aware conditioning. In the physics context (Pazos et al., 3 Jun 2024), process-level kinematic moments

$$\mu_1=\frac{1}{N}\sum_i p_{T,i},\qquad \mu_k=\frac{1}{N}\sum_i\big(p_{T,i}-\mu_1\big)^k\quad(k=2,\ldots,6)$$

are appended to per-object feature vectors, providing a strong inductive bias. In imaging (He et al., 22 Nov 2025), degradation-aware modules (MGDA) estimate the underlying operator and its granularity, yielding intermediate states $(\widehat{x}_k,\widetilde{x}_k)$ along with compact priors $\mathbf{P}_k^h$ via a degradation-resistant latent diffusion model (DR-LDM).
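The physics-side conditioning statistics are straightforward to compute; the sketch below assumes only that the per-event inputs are the transverse momenta $p_{T,i}$:

```python
import numpy as np

def kinematic_moments(pt):
    """Mean mu_1 and central moments mu_2..mu_6 of the per-event p_T values."""
    pt = np.asarray(pt, dtype=float)
    mu1 = pt.mean()
    return np.array([mu1] + [np.mean((pt - mu1) ** k) for k in range(2, 7)])

# Six-component condition vector, appended to each per-object feature vector
moments = kinematic_moments([10.0, 12.0, 30.0, 45.0])
```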

In unfolded MCMC for Bayesian imaging (Mbakam et al., 3 Jul 2025), the framework combines explicit proximity-based data fidelity, stochastic score-based diffusion, and LoRA-adapted backbone priors (e.g., U-Net/ADM) for learned posterior sampling, enabling adaptation to variable forward models at inference.

3. Deep Network Architectures and Training Protocols

UnfoldLDM modules utilize domain-matched network designs:

  • Conditional DDPM MLP (Pazos et al., 3 Jun 2024): Inputs are concatenated $(x_t, y, e_t)$ vectors, passed through multi-layer perceptrons with GELU activation, dropout (0.01), and skip connections, totaling $\sim$1M parameters.
  • Unfolded, LoRA-adapted consistency networks (Mbakam et al., 3 Jul 2025): A U-Net/ADM backbone enhanced with low-rank adapters in cross-attention layers, trained using a mixture of reconstruction, perceptual (LPIPS), and adversarial losses.
  • Multi-stage transformer-proximal modules (He et al., 22 Nov 2025): OCFormer integrates multi-scale degradation-resistant attention (DRA) and prior-guided detail recovery (PDR) within a U-shaped transformer. All learnable weights (except priors) are shared across K stages.
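For concreteness, one block of the conditional MLP described above might be sketched as follows; the widths, weights, and tanh GELU approximation are illustrative, and dropout is omitted since it applies only during training:

```python
import numpy as np

def gelu(z):
    """Tanh approximation of the GELU activation."""
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z ** 3)))

def cond_mlp_block(x_t, y, e_t, W1, W2):
    """One hidden layer with GELU and a skip connection on the concatenated (x_t, y, e_t) input."""
    h = np.concatenate([x_t, y, e_t])
    return h + W2 @ gelu(W1 @ h)               # residual path back to the input width

# Illustrative dimensions: 2-dim state, condition, and time embedding; hidden width 8
rng = np.random.default_rng(0)
W1 = 0.1 * rng.standard_normal((8, 6))
W2 = 0.1 * rng.standard_normal((6, 8))
out = cond_mlp_block(np.ones(2), np.ones(2), np.ones(2), W1, W2)
```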

Training protocols employ Adam or AdamW variants, batch sizes suited to the modality (physics: 2048; imaging: 2–4), explicit step-size learning for gradient modules, and end-to-end loss mixing for robustness (e.g., $\mathcal{L}_\mathrm{Rec}$, $\mathcal{L}_\mathrm{ISDA}$, and $\mathcal{L}_\mathrm{Diff}$ for restoration tasks).

4. Evaluation Strategies and Task-Specific Performance

Quantitative assessment employs task-specific metrics:

  • Physics (jet unfolding) (Pazos et al., 3 Jun 2024): Wasserstein-1 distance ($W_1$), binned $\chi^2$ per degree of freedom, and summed absolute relative deviations. Experimental results show $W_\mathrm{gen}\approx W_\mathrm{dedicated}\ll W_\mathrm{detector}$, achieving unfolding precision within 3–5% budgets across unseen and composite test processes.
  • Imaging (restoration) (Mbakam et al., 3 Jul 2025, He et al., 22 Nov 2025): Peak Signal-to-Noise Ratio (PSNR, $\uparrow$), Structural Similarity Index (SSIM, $\uparrow$), LPIPS ($\downarrow$), and Fréchet Inception Distance (FID, $\downarrow$). UnfoldLDM achieves PSNR improvements of 0.5–1.0 dB over DUN baselines across SIDD, GoPro, UIEB, BAID, LOL, and Rain100L, with competitive FID (3–12) and rapid sampling ($\leq$12 NFEs).
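Two of these metrics have compact closed forms; the sketch below assumes images scaled to $[0,1]$ for PSNR and equal-size 1-D samples for the empirical $W_1$ distance (the mean absolute difference of sorted values):

```python
import numpy as np

def psnr(x, ref, peak=1.0):
    """Peak signal-to-noise ratio in dB (higher is better)."""
    mse = np.mean((x - ref) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def wasserstein1(a, b):
    """Empirical W_1 between equal-size 1-D samples via sorted order statistics."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

p = psnr(np.full(16, 0.9), np.ones(16))    # uniform 0.1 error -> MSE 0.01 -> 20 dB
w = wasserstein1(np.array([0.0, 1.0]), np.array([1.0, 2.0]))   # -> 1.0
```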

Ablations demonstrate the necessity of deep unfolding (3 blocks outperform 1), of LoRA rank selection (best at rank 5), and of the MGDA, DR-LDM, DRA, and PDR modules for optimal restoration and generalization. Plug-and-play compatibility as a proximal drop-in is substantiated across six DUN families.

5. Generalization, Flexibility, and Limitations

A core advantage of UnfoldLDM is process/task generalization: conditioning via distributional moments (physics) or MGDA-estimated operators (imaging) enables unfolded models to handle a variety of priors and forward models without retraining. For physics unfolding, the same network can apply to measured data by recalculating six descriptive moments. In imaging, the unfolded sampler is robust to out-of-distribution noise and operators (e.g., handling 8× SR with a 4× SR model), whereas task-specific conditional DMs degrade.

Limitations persist: physics UnfoldLDM currently applies only to object-wise jets, lacking event-level coupling and systematic uncertainty propagation; imaging UnfoldLDMs show performance drops when trained jointly on multiple highly dissimilar tasks, suggesting further architectural research into universal samplers.

6. Architectural Innovations and Future Outlook

Emergent UnfoldLDM trends point to several opportunities:

  • Integration of expressive architectures: U-Net, cross-attention, FiLM, and transformer-based proximal updates are projected to enhance conditional fusion and performance on both per-object and global conditioning.
  • Dynamic and learned noise schedules: Adaptive βt\beta_t can accelerate inference and improve fidelity.
  • Event- and set-level enhancement: Transitioning to graph-based or permutation-invariant architectures could enable multi-object and event-level inference, relevant for high-energy physics and complex imaging scenarios.
  • Direct deployment and modularity: A single generalized UnfoldLDM model is computationally efficient (e.g., 3 h training, 3 min per $10^6$ objects on contemporary GPUs) and supports modular integration into analysis chains, leveraging per-input posterior samples for downstream multi-dimensional analyses.

A plausible implication is that the continued synthesis of deep unfolding, generative modeling, and domain-informed conditioning will enable more accurate, flexible, and interpretable solutions in high-dimensional inverse problems, especially as architectures evolve to manage broader sets of prior distributions and more complex data structures.
