Data-driven approaches to inverse problems (2506.11732v1)
Abstract: Inverse problems are concerned with the reconstruction of unknown physical quantities using indirect measurements and are fundamental across diverse fields such as medical imaging, remote sensing, and material sciences. These problems serve as critical tools for visualizing internal structures beyond what is visible to the naked eye, enabling quantification, diagnosis, prediction, and discovery. However, most inverse problems are ill-posed, necessitating robust mathematical treatment to yield meaningful solutions. While classical approaches provide mathematically rigorous and computationally stable solutions, they are constrained by the ability to accurately model solution properties and implement them efficiently. A more recent paradigm considers deriving solutions to inverse problems in a data-driven manner. Instead of relying on classical mathematical modeling, this approach utilizes highly over-parameterized models, typically deep neural networks, which are adapted to specific inverse problems using carefully selected training data. Current approaches that follow this new paradigm distinguish themselves through solution accuracy paired with computational efficiency that was previously inconceivable. These notes offer an introduction to this data-driven paradigm for inverse problems. The first part of these notes will provide an introduction to inverse problems, discuss classical solution strategies, and present some applications. The second part will delve into modern data-driven approaches, with a particular focus on adversarial regularization and provably convergent linear plug-and-play denoisers. Throughout the presentation of these methodologies, their theoretical properties will be discussed, and numerical examples will be provided. The lecture series will conclude with a discussion of open problems and future perspectives in the field.
Summary
- The paper presents a novel integration of classical regularization with deep learning to address the challenges of ill-posed inverse problems.
- It details the use of variational models, proximal algorithms, and learned iterative schemes to improve reconstruction accuracy and stability.
- The work demonstrates practical applications in medical imaging, remote sensing, and material sciences while outlining key theoretical challenges.
This document, "Data-driven approaches to inverse problems" (2506.11732), provides an introduction to the field of inverse problems, starting with classical mathematical approaches and progressing to modern data-driven techniques, particularly those leveraging deep learning. It emphasizes the practical application of these methods in areas like medical imaging, remote sensing, and material sciences.
Chapter 1: Introduction to Inverse Problems
The first chapter introduces inverse problems as the task of reconstructing unknown physical quantities from indirect measurements. Examples include:
- Bio-Medical Imaging: Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET). These often involve reconstructing internal body images from external sensor data. Downstream tasks like mitosis analysis, cell dynamics estimation, and tumor segmentation are also highlighted.
- General Image Processing: Tasks like tree monitoring with LiDAR, landcover analysis, traffic analysis, multi-modal image fusion, and virtual art restoration, all rely on solving inverse problems.
- Physical Sciences: Applications in astrophysics (e.g., black hole imaging), geophysics (seismic tomography), material sciences (microstructure analysis), and computational fluid dynamics (flow parameter estimation).
A core concept discussed is ill-posedness. An inverse problem y=Au (where y is observed data, u is the unknown, and A is the forward operator) is ill-posed if it violates one or more of Hadamard's conditions for well-posedness:
- Existence: A solution exists.
- Uniqueness: The solution is unique.
- Stability: The solution depends continuously on the data.
Most practical inverse problems are ill-posed, often due to noise, undersampling, or the nature of the forward operator. For example, image deblurring (equivalent to inverting the heat equation) suffers from instability, where high-frequency noise is amplified. Computed Tomography, involving the inversion of the Radon transform, is also ill-posed because its singular values tend to zero, making the inverse unbounded.
To overcome ill-posedness, regularization is introduced. This involves formulating a related well-posed problem whose solution approximates the true solution. Variational Regularization is a prominent approach, recasting the inverse problem as an optimization task:
min_{u ∈ U} ‖Au − y_n‖² + α R(u)
Here, ‖Au − y_n‖² is the data fidelity term, R(u) is the regularization term (or prior), and α > 0 is the regularization parameter. For strongly convex regularizers, this formulation guarantees existence, uniqueness, and stability of the solution and, under an appropriate choice of α, convergence to the true solution as the noise level decreases (Theorem 1).
The Bayesian perspective offers another way to tackle ill-posedness, interpreting the problem as maximum a-posteriori (MAP) estimation. The unknown u, measurement y, and noise n are treated as random variables. Bayes' theorem, p(u|y) = p(y|u) p(u) / p(y), yields the posterior distribution, and the MAP estimate maximizes this posterior. For Gaussian noise n ∼ N(0, σ²I_m) and a prior p(u) ∝ e^{−R(u)}, the MAP estimate is found by solving:
min_u { R(u) + 1/(2σ²) ‖y − Au‖² }
This connects variational regularization to the statistical properties of the noise, with α = 2σ².
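To make the effect of regularization concrete, here is a minimal NumPy sketch (not from the paper; the operator, noise level, and choice of α are illustrative) comparing naive inversion of an ill-conditioned forward operator with the Tikhonov-regularized solution R(u) = ‖u‖²:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Ill-conditioned forward operator A: singular values decay towards zero,
# so the (pseudo-)inverse is effectively unbounded.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 10.0 ** np.linspace(0, -8, n)
A = U @ np.diag(s) @ V.T

u_true = np.sin(np.linspace(0, 3 * np.pi, n))
y = A @ u_true + 1e-6 * rng.standard_normal(n)      # noisy measurement y = A u + n

# Naive inversion amplifies the noise along the small singular directions
u_naive = np.linalg.solve(A, y)

# Variational (Tikhonov) solution of min_u ||Au - y||^2 + alpha * ||u||^2
alpha = 1e-6
u_reg = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y)

print("reconstruction error, naive   :", np.linalg.norm(u_naive - u_true))
print("reconstruction error, Tikhonov:", np.linalg.norm(u_reg - u_true))
```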
Chapter 2: Variational Models and PDEs for Inverse Imaging
This chapter focuses on variational models and their connection to Partial Differential Equations (PDEs) for imaging applications. The general variational problem is:
min_u { α R(u) + D(Au, y) }
The data fidelity term D(Au, y) depends on the noise model (e.g., the L² norm for Gaussian noise, the Kullback-Leibler divergence for Poisson noise). The choice of regularizer R(u) is crucial. Classical Tikhonov regularization (e.g., R(u) = ½ ∫_Ω |∇u|² dx) promotes smoothness but blurs edges, as it implies Hölder continuity (C^{1/2}) of the solution.
Total Variation (TV) regularization is introduced as a more suitable prior for images, as it preserves edges. R(u) = |Du|(Ω) is the total variation of u, and functions of bounded variation (BV(Ω)) can have jump discontinuities. TV regularization penalizes oscillations while respecting edges. For u ∈ W^{1,1}(Ω), |Du|(Ω) = ‖∇u‖_{L¹(Ω)}, which promotes sparse gradients. This is key in compressed sensing, enabling reconstruction from undersampled data, as in MRI where y = (Fu)|_Λ + n (a subsampled Fourier transform of u plus noise). The problem becomes:
min_u α‖∇u‖₁ + ½ ‖(Fu)|_Λ − y‖²
TV is also related to the perimeter of level sets via the co-area formula: |Du|(Ω) = ∫_{−∞}^{+∞} Per({u > s}; Ω) ds. This makes it useful for segmentation, as in the Chan--Vese model:
min_{χ, c₁, c₂} α|Dχ|(Ω) + ∫_Ω (y − c₁)² χ + ∫_Ω (y − c₂)² (1 − χ)
A convex relaxation replaces the binary indicator χ with a function v ∈ [0, 1].
The connection to PDEs is shown through gradient flows. For the Rudin--Osher--Fatemi (ROF) model:
min_u ( α|Du|(Ω) + ½‖u − y‖² )
The gradient flow is u_t = α div(Du/|Du|) − (u − y), a nonlinear diffusion equation that smooths flat regions more strongly than edges.
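The following NumPy sketch illustrates this diffusion behaviour with an explicit time-stepping scheme; a small parameter eps smooths |∇u| to avoid division by zero, and the image, step size, and α are illustrative assumptions rather than the paper's setup:

```python
import numpy as np

def tv_flow_step(u, y, alpha=0.2, dt=0.05, eps=1e-3):
    """One explicit Euler step of u_t = alpha * div(grad u / |grad u|) - (u - y)."""
    # forward differences (replicated last row/column)
    ux = np.diff(u, axis=1, append=u[:, -1:])
    uy = np.diff(u, axis=0, append=u[-1:, :])
    mag = np.sqrt(ux**2 + uy**2 + eps**2)            # smoothed gradient magnitude
    px, py = ux / mag, uy / mag
    # discrete divergence via backward differences (boundary handling kept simple)
    div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
    return u + dt * (alpha * div - (u - y))

rng = np.random.default_rng(0)
y = np.zeros((64, 64)); y[16:48, 16:48] = 1.0        # piecewise-constant image
y = y + 0.1 * rng.standard_normal(y.shape)           # Gaussian noise
u = y.copy()
for _ in range(200):                                  # flat regions are smoothed, edges largely kept
    u = tv_flow_step(u, y)
```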
Numerical aspects of solving min_{u∈X} J(u) + H(u) are discussed. Key concepts:
- Subdifferential: ∂J(x) := {p ∈ X′ : ⟨p, y − x⟩ + J(x) ≤ J(y) for all y ∈ X}. For the ℓ¹ norm in one dimension, ∂|·|(z) = {sign(z)} if z ≠ 0, and ∂|·|(0) = [−1, 1].
- Legendre-Fenchel transform (convex conjugate): f*(y) := sup_{x∈X} {⟨y, x⟩ − f(x)}.
- Proximal Map: prox_{τJ}(y) = argmin_{u∈X} ( J(u) + 1/(2τ) ‖u − y‖₂² ). Moreau's identity links prox_{τJ} and prox_{τ⁻¹J*}.
The dual problem for min_{u∈X} J(Au) + H(u) is max_p −J*(p) − H*(−A*p). For ROF, the dual is a constrained least-squares problem: min_p { ½‖∇*p − y‖₂² : ‖p_{i,j}‖₂ ≤ α for all i, j }.
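As a concrete instance of these definitions, the proximal map of the ℓ¹ norm is componentwise soft thresholding, and its convex conjugate is the indicator of the ℓ∞ unit ball, so Moreau's identity can be verified numerically. A short NumPy sketch (not from the paper):

```python
import numpy as np

def prox_l1(y, tau):
    """prox_{tau * ||.||_1}(y): componentwise soft thresholding."""
    return np.sign(y) * np.maximum(np.abs(y) - tau, 0.0)

def proj_linf_ball(z):
    """Projection onto the unit l-infinity ball, i.e. the prox of the conjugate of ||.||_1."""
    return np.clip(z, -1.0, 1.0)

rng = np.random.default_rng(0)
y, tau = rng.standard_normal(5), 0.7

# Moreau's identity: y = prox_{tau J}(y) + tau * prox_{J*/tau}(y / tau)
rhs = prox_l1(y, tau) + tau * proj_linf_ball(y / tau)
print(np.allclose(y, rhs))   # True
```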
Optimization algorithms:
- Proximal descent: u_{k+1} = prox_{τJ}(u_k).
- Forward-Backward Splitting (for min J(u) + H(u) where H is smooth): u_{k+1} = prox_{τJ}(u_k − τ∇H(u_k)).
- Primal-Dual Hybrid Gradient (PDHG) (for min J(Au) + H(u)):
  u_{k+1} = prox_{τH}(u_k − τA*p_k)
  p_{k+1} = prox_{σJ*}(p_k + σA(2u_{k+1} − u_k))
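As a worked example of PDHG, the sketch below applies it to a discrete ROF denoising problem min_u α‖∇u‖_{2,1} + ½‖u − y‖² (a NumPy sketch with illustrative step sizes and data; it is not the paper's implementation):

```python
import numpy as np

def grad(u):
    """Forward-difference gradient (replicated boundary); returns an array of shape (2, H, W)."""
    gx = np.diff(u, axis=1, append=u[:, -1:])
    gy = np.diff(u, axis=0, append=u[-1:, :])
    return np.stack([gx, gy])

def div(p):
    """Discrete divergence, approximating the negative adjoint of grad (simple boundary handling)."""
    px, py = p
    return (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))

def rof_pdhg(y, alpha=0.15, n_iter=300):
    """PDHG for min_u alpha * ||grad u||_{2,1} + 0.5 * ||u - y||^2."""
    u = y.copy()
    p = np.zeros((2,) + y.shape)
    L = np.sqrt(8.0)                     # bound on the operator norm of the discrete gradient
    tau = sigma = 1.0 / L                # step sizes with tau * sigma * L^2 <= 1
    for _ in range(n_iter):
        # primal step: prox of tau * H with H(u) = 0.5 * ||u - y||^2
        u_new = (u + tau * div(p) + tau * y) / (1.0 + tau)
        # dual step on the extrapolated primal, then projection onto {||p_ij||_2 <= alpha}
        p = p + sigma * grad(2.0 * u_new - u)
        scale = np.maximum(1.0, np.sqrt(p[0]**2 + p[1]**2) / alpha)
        p = p / scale
        u = u_new
    return u

rng = np.random.default_rng(0)
y = np.zeros((64, 64)); y[16:48, 16:48] = 1.0
y = y + 0.1 * rng.standard_normal(y.shape)
u_denoised = rof_pdhg(y)
```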
The "Regularizer Zoo" section lists various handcrafted regularizers (wavelet-based, higher-order TV, non-local, anisotropic). The chapter concludes by noting the limitations of knowledge-driven methods in modeling complex image structures, motivating the shift to data-driven approaches.
Chapter 3: Data-Driven Approaches to Inverse Problems
This chapter contrasts knowledge-driven models with data-driven models, which learn from large datasets using over-parameterized models like neural networks. Early data-driven methods include sparse coding/dictionary learning, black-box denoisers (Plug-and-Play, Regularization by Denoising), and bilevel optimization.
Deep learning models, while powerful, are often "black boxes" due to their complexity, leading to challenges in interpretability, safety, and systematic design. A neural network is a mapping Ψ(x, θ) composed of layers z_{k+1} = f_k(z_k, θ_k), commonly f_k(z) = σ(W_k z + b_k). Training minimizes a loss function over data.
Learned Iterative Reconstruction Schemes unroll optimization algorithms, replacing or augmenting steps with neural networks. General form: u_{k+1} = Λ_{θ_k}(u_k, A*(Au_k − y)). Examples:
- Learned Gradient: Λ_θ(u, h) := u + Γ_θ(h)
- Variational Networks: Λ_θ(u, h) := u − h + Γ_θ(u)
- Learned Proximal / Plug-and-Play: Λ_θ(u, h) := Γ_θ(u − h)
Learned Primal-Dual (LPD) schemes generalize this further. Limitations include a lack of theoretical understanding, interpretability issues, no guarantee of data consistency, the need for supervised training data, convergence issues when iterating beyond the trained number of steps, and computational cost (a minimal unrolled example is sketched below).
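A minimal PyTorch sketch of such an unrolled learned-gradient scheme follows; the CNN architecture, the number of unrolled iterations K, and the toy identity forward operator are illustrative assumptions, not the configurations used in the paper:

```python
import torch
import torch.nn as nn

class LearnedGradientStep(nn.Module):
    """One unrolled step u_{k+1} = u_k + Gamma_theta([u_k, A^T(A u_k - y)])."""
    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, u, grad_term):
        return u + self.net(torch.cat([u, grad_term], dim=1))

class UnrolledReconstruction(nn.Module):
    """K unrolled iterations; the forward operator A and its adjoint At are passed as callables."""
    def __init__(self, K=8):
        super().__init__()
        self.steps = nn.ModuleList(LearnedGradientStep() for _ in range(K))

    def forward(self, y, A, At):
        u = At(y)                                 # crude initial reconstruction
        for step in self.steps:
            u = step(u, At(A(u) - y))             # data-consistency term A^T(Au - y)
        return u

# toy usage with A = identity (denoising-like setting)
A = At = lambda x: x
model = UnrolledReconstruction()
y = torch.randn(4, 1, 64, 64)
u_hat = model(y, A, At)
loss = nn.functional.mse_loss(u_hat, torch.zeros_like(u_hat))   # placeholder supervised loss
loss.backward()
```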
Deep Equilibrium Networks (DEQs) formulate the reconstruction as a fixed point u = Γ_Θ(u; y). If Γ_Θ is a contraction (for instance, for a gradient-descent-like scheme Γ_Θ(u; y) = u + ηA*(y − Au) − ηR_Θ(u), by ensuring that R_Θ − Id is ε-Lipschitz with ε small enough relative to the step size and forward operator), the iterates converge to a unique fixed point.
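A small NumPy sketch of this fixed-point view, using a linear stand-in R_Θ(u) = ρu so that Γ_Θ is an affine contraction (all sizes and constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 30, 20
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ rng.standard_normal(n)

eta, rho = 0.1, 0.5
def Gamma(u):
    """Gradient-descent-like map u + eta * A^T (y - Au) - eta * R(u), with R(u) = rho * u."""
    return u + eta * A.T @ (y - A @ u) - eta * rho * u

u = np.zeros(n)
for _ in range(500):                      # fixed-point iteration u <- Gamma(u)
    u_next = Gamma(u)
    if np.linalg.norm(u_next - u) < 1e-10:
        break
    u = u_next
print("fixed-point residual:", np.linalg.norm(Gamma(u) - u))
```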
Learned Variational Models aim to learn the regularizer R_θ(u) within the variational framework min_u ‖Au − y‖₂² + αR_θ(u), retaining theoretical benefits. Adversarial Regularization trains R_Θ to distinguish "good" images (drawn from the distribution P_U) from "bad" images (drawn from P_n). Inspired by Wasserstein GANs, the training loss is:
min_Θ  E_{U∼P_U}[Ψ_Θ(U)] − E_{U∼P_n}[Ψ_Θ(U)] + μ · E[ (‖∇_u Ψ_Θ(U)‖ − 1)₊² ]
where R_Θ(u) = Ψ_Θ(u) + ρ₀‖u‖₂². Ψ_Θ can be a (weakly) convex CNN. Theoretical analysis shows that this training finds regularizers that effectively capture the statistics of the data: under a data-manifold and a low-noise assumption, the distance function to the data manifold maximizes the Wasserstein loss component.
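A hedged PyTorch sketch of one training step for Ψ_Θ is given below; the critic architecture, the random stand-in data for P_U and P_n, and the penalty weight μ are all illustrative assumptions. Following the Wasserstein-GAN recipe, the gradient penalty is evaluated on interpolates between the two batches:

```python
import torch
import torch.nn as nn

critic = nn.Sequential(                          # Psi_Theta: a small convolutional critic
    nn.Conv2d(1, 16, 3, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(16, 16, 3, padding=1), nn.LeakyReLU(0.2),
    nn.Flatten(), nn.Linear(16 * 32 * 32, 1),
)
opt = torch.optim.Adam(critic.parameters(), lr=1e-4)
mu = 10.0

u_good = torch.rand(8, 1, 32, 32)                # stand-in samples from P_U (clean images)
u_bad = torch.rand(8, 1, 32, 32)                 # stand-in samples from P_n (noisy reconstructions)

# gradient penalty E[(||grad_u Psi(U)|| - 1)_+^2] on interpolates of the two batches
t = torch.rand(8, 1, 1, 1)
u_mix = (t * u_good + (1 - t) * u_bad).requires_grad_(True)
grad_u = torch.autograd.grad(critic(u_mix).sum(), u_mix, create_graph=True)[0]
penalty = ((grad_u.flatten(1).norm(dim=1) - 1).clamp(min=0) ** 2).mean()

loss = critic(u_good).mean() - critic(u_bad).mean() + mu * penalty
opt.zero_grad()
loss.backward()
opt.step()
```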
Plug-and-Play (PnP) Methods use pre-existing denoisers D within iterative schemes such as ADMM. The proximal update v_{k+1} = prox_{τR}(u_{k+1} + h_k), with step size τ determined by α and λ, is replaced by v_{k+1} = D(u_{k+1} + h_k); a minimal forward-backward variant is sketched after the list below. Theoretical challenges: convergence is not generally guaranteed, and the implicit regularizer is often unknown.
- Regularization by Denoising (RED) defines R_RED(u) = ½⟨u, u − D(u)⟩. This requires the Jacobian of D(u) to be symmetric.
- Gradient-step (GS) denoisers model D_θ(u) = u − ∇R_θ(u), providing an explicit regularizer. If R_θ(u) = ½‖u − Ψ_θ(u)‖₂², then D_θ(x) = prox_{φ_θ}(x) for some φ_θ, leading to better convergence properties [hurault2022proximal, tan2024provably].
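The sketch referenced above is a plug-and-play forward-backward iteration (a simpler variant than PnP-ADMM): a gradient step on ½‖Au − y‖² followed by a denoising step. The blur operator and the Gaussian-filter "denoiser" are illustrative stand-ins for a learned denoiser D:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pnp_forward_backward(y, A, At, denoiser, step=1.0, n_iter=50):
    """Plug-and-play forward-backward: gradient step on 0.5*||Au - y||^2, then denoise."""
    u = At(y)
    for _ in range(n_iter):
        u = denoiser(u - step * At(A(u) - y))
    return u

# toy deblurring: A is a (self-adjoint) Gaussian blur, D is a mild Gaussian filter
A = At = lambda x: gaussian_filter(x, sigma=1.5)
denoiser = lambda x: gaussian_filter(x, sigma=0.5)      # stand-in for a learned denoiser D

rng = np.random.default_rng(0)
u_true = np.zeros((64, 64)); u_true[20:44, 20:44] = 1.0
y = A(u_true) + 0.01 * rng.standard_normal(u_true.shape)
u_hat = pnp_forward_backward(y, A, At, denoiser)
```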
Linear Denoiser Plug-and-Play: For a linear denoiser D_σ = prox_J, where J(x) = ½⟨x, (D_σ⁻¹ − Id)x⟩, the regularization strength τ can be controlled by applying the spectral filter g_τ(λ) = λ/(τ − λ(τ − 1)) to the eigenvalues of D_σ, which yields g_τ(D_σ) = prox_{τJ}. This allows for provably convergent regularization under suitable conditions on g_τ [hauptmann2024convergent].
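The following NumPy sketch checks this relation for a synthetic symmetric linear denoiser D_σ with eigenvalues in (0, 1]; the denoiser itself is an illustrative construction, not the one analysed in [hauptmann2024convergent]:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# synthetic symmetric linear denoiser D_sigma with eigenvalues in (0, 1]
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.linspace(1.0, 0.05, n)
D_sigma = Q @ np.diag(lam) @ Q.T

def spectral_filter(D, tau):
    """Apply g_tau(lambda) = lambda / (tau - lambda*(tau - 1)) to the eigenvalues of D."""
    w, V = np.linalg.eigh((D + D.T) / 2)
    g = w / (tau - w * (tau - 1.0))
    return V @ np.diag(g) @ V.T

# check g_tau(D_sigma) == prox_{tau*J} with J(x) = 0.5 * <x, (D_sigma^{-1} - Id) x>
tau = 3.0
M = np.linalg.inv(D_sigma) - np.eye(n)                 # Hessian of J
prox_tau_J = np.linalg.inv(np.eye(n) + tau * M)        # (Id + tau * (D^{-1} - Id))^{-1}
print(np.allclose(spectral_filter(D_sigma, tau), prox_tau_J))   # True
```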
The chapter concludes with an outlook on developing new frameworks fundamentally designed for deep learning, focusing on rethinking optimization, incorporating inductive biases, and finding a balance between model capacity and theoretical guarantees.
Chapter 4: Perspectives
This chapter discusses broader implications and future directions. Task Adaptation: Inverse problems are often part of larger pipelines (e.g., reconstruction → segmentation → classification). Task-adapted reconstruction optimizes the reconstruction for a specific downstream task, and deep learning facilitates this by allowing joint training of multiple components. For example, joint reconstruction-segmentation with a combined loss (1 − C)ℓ_X + Cℓ_D can improve segmentation performance compared to training the components sequentially or for segmentation alone [adler2022task].
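A minimal PyTorch sketch of this joint objective is given below; the two single-layer "networks", the toy data shapes, and the weight C are placeholders rather than the architectures used in [adler2022task]:

```python
import torch
import torch.nn as nn

recon_net = nn.Conv2d(1, 1, 3, padding=1)           # placeholder reconstruction network
seg_net = nn.Conv2d(1, 2, 3, padding=1)             # placeholder segmentation network (2 classes)
C = 0.7                                             # task trade-off weight

y = torch.randn(4, 1, 32, 32)                       # toy measurements (image-sized for simplicity)
u_true = torch.randn(4, 1, 32, 32)                  # ground-truth reconstructions
seg_true = torch.randint(0, 2, (4, 32, 32))         # ground-truth segmentation labels

u_hat = recon_net(y)
loss_X = nn.functional.mse_loss(u_hat, u_true)                     # reconstruction loss l_X
loss_D = nn.functional.cross_entropy(seg_net(u_hat), seg_true)     # downstream (segmentation) loss l_D
loss = (1 - C) * loss_X + C * loss_D                               # joint task-adapted loss
loss.backward()
```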
The Data-Driven, Knowledge-Informed Paradigm: The future lies in integrating deep learning's power with mathematical rigor. This requires:
- Guarantees via Structured Learning: Imposing properties (stability, robustness) on network architectures.
- More Useful Theoretical Tools: Analysis must account for data dependency (dataset size, diversity, bias) and its influence on generalization and robustness.
- Explainability: Understanding model decisions is crucial, especially for complex, end-to-end systems in safety-critical applications.
The ultimate goal is to leverage these combined approaches for transformative applications, such as making advanced medical imaging like CT/MRI more accessible and effective as clinical screening tools.
Overall, the paper provides a comprehensive overview of how data-driven methods, especially deep learning, are revolutionizing the field of inverse problems. It highlights the transition from classical, model-based techniques to learning-based approaches; discusses methodologies such as learned iterative schemes, learned regularizers, and plug-and-play methods; and explores their practical applications, implementation considerations, and theoretical underpinnings. The paper also emphasizes the ongoing need to bridge the gap between empirical success and rigorous theoretical guarantees, advocating for a synergistic "data-driven, knowledge-informed" paradigm.

This paper, "Data-driven approaches to inverse problems" (2506.11732), serves as an introductory text to the evolving landscape of solving inverse problems, transitioning from classical mathematical foundations to modern data-driven techniques, with a strong emphasis on deep learning. It details how these approaches are practically implemented across various scientific and engineering disciplines like medical imaging, remote sensing, and material sciences.
The document is structured into four main chapters:
Chapter 1: Introduction to Inverse Problems
This chapter lays the groundwork by defining inverse problems—inferring unknown quantities from indirect measurements (y=Au). It showcases a wide array of applications:
- Bio-Medical Imaging: Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET), which involve creating images of internal body structures. Further applications include mitosis analysis and tumor segmentation.
- General Image Processing: Techniques like LiDAR-based tree monitoring, landcover analysis, traffic flow analysis, multi-modal image fusion, and virtual art restoration.
- Physical Sciences: Applications in astrophysics (e.g., black hole imaging [akiyama2019first]), geophysics (seismic tomography [biegler2011large, haber2014computational]), material sciences [tovey2019directional], and computational fluid dynamics [benning2014phase].
A central theme is ill-posedness, where problems violate Hadamard's conditions for well-posedness (existence, uniqueness, stability of solution). This often arises from noise or incomplete data. For instance, image deblurring, analogous to solving the heat equation backward, suffers from instability where noise in high frequencies is amplified. CT reconstruction, which inverts the Radon transform, is ill-posed due to the unbounded nature of the inverse operator [hertle1981problem].
Regularization is presented as the primary strategy to combat ill-posedness. Instead of solving the ill-posed problem directly, a related well-posed problem is formulated. Variational Regularization is a key technique, recasting the problem as minimizing an energy functional:
min_{u ∈ U} ‖Au − y_n‖² + α R(u)
Here, ‖Au − y_n‖² is the data fidelity term, R(u) is the regularization term imposing prior knowledge, and α > 0 is a parameter balancing them. This approach ensures existence, uniqueness, and stability (Theorem 1 in the paper, drawing on Tikhonov [Tikh1963] and Phillips [phillips1962]).
The Bayesian perspective offers an alternative, framing the inverse problem as maximum a-posteriori (MAP) estimation. If the noise is Gaussian, n ∼ N(0, σ²I_m), and the prior on u is p(u) ∝ e^{−R(u)}, the MAP estimate solves:
min_u { R(u) + 1/(2σ²) ‖y − Au‖² }
This establishes a link where the regularization parameter α is related to the noise variance σ².
Chapter 2: Variational Models and PDEs for Inverse Imaging
This chapter explores variational models for imaging, where the goal is to solve:
min_u { α R(u) + D(Au, y) }
The fidelity term D depends on the noise characteristics (e.g., the L² norm for Gaussian noise, the KL divergence for Poisson noise). The choice of regularizer R(u) is critical. While Tikhonov regularization (R(u) = ½ ∫_Ω |∇u|² dx) promotes smoothness, it tends to blur sharp edges.
Total Variation (TV) regularization (R(u) = |Du|(Ω)) is highlighted for its ability to preserve edges, as functions of bounded variation (BV(Ω)) can possess discontinuities [rudin1992nonlinear]. In compressed sensing and MRI, TV promotes sparsity in the gradient domain (|Du|(Ω) = ‖∇u‖_{L¹(Ω)}), enabling reconstruction from undersampled measurements [candes2006stable, lustig2008compressed]. The Chan--Vese model for segmentation leverages TV by minimizing the perimeters of the segmented regions [chan2001active].
The connection to Partial Differential Equations (PDEs) is shown via gradient flows. The ROF model's gradient flow is the nonlinear diffusion equation u_t = α div(Du/|Du|) − (u − y), which smooths flat areas more strongly than edges.
Numerical aspects of solving min_{u∈X} J(u) + H(u) are then discussed, introducing:
- Subdifferential: Generalizes gradients for non-smooth functions.
- Legendre-Fenchel transform: Defines the convex conjugate f∗.
- Proximal Map: prox_{τJ}(y) = argmin_u ( J(u) + 1/(2τ) ‖u − y‖₂² ). The dual problem formulation can simplify optimization; for ROF, it becomes a constrained least-squares problem.
Key optimization algorithms covered:
- Proximal descent: u_{k+1} = prox_{τJ}(u_k).
- Forward-Backward Splitting: u_{k+1} = prox_{τJ}(u_k − τ∇H(u_k)) for min J(u) + H(u) where H is smooth.
- Primal-Dual Hybrid Gradient (PDHG): An iterative method for minJ(Au)+H(u) updating primal and dual variables.
The chapter concludes with a "Regularizer Zoo" (listing various handcrafted regularizers such as wavelet-based and higher-order TV) and notes the limitations of these knowledge-driven models, paving the way for data-driven methods. A pseudocode summary of PDHG follows:
// Pseudocode for PDHG
Initialize u_0, p_0
for k = 0, 1, ... do
    u_{k+1} = prox_{tau*H}(u_k - tau * A_transpose * p_k)
    p_{k+1} = prox_{sigma*J_conjugate}(p_k + sigma * A * (2*u_{k+1} - u_k))
end for
Chapter 3: Data-Driven Approaches to Inverse Problems
This chapter marks the shift to data-driven paradigms. It contrasts knowledge-driven models with data-driven ones that learn from large datasets, often using over-parameterized neural networks. Early examples include sparse coding [elad2006image, aharon2006k], black-box denoisers (like Plug-and-Play [venkatakrishnan2013plug]), and bilevel optimization [kunisch2013bilevel].
Deep learning models, while powerful, present "black box" challenges: lack of interpretability, safety concerns in critical applications, and difficulties in systematic design. Learned Iterative Reconstruction Schemes "unroll" classical optimization algorithms, replacing or parameterizing certain steps with neural networks [gregor2010learning, adler2017solving, hammernik2018learning]. For instance, a general step might be u_{k+1} = Λ_{θ_k}(u_k, A*(Au_k − y)). Variants include Variational Networks [kobler2017variational] and Learned Primal-Dual (LPD) [adler2018learned]. Limitations include:
- Weak theoretical understanding of convergence and stability.
- Interpretability of learned components.
- Requirement for large supervised datasets.
- Potential divergence if iterated beyond trained steps.
Deep Equilibrium Networks (DEQs) address some convergence issues by finding a fixed point u = Γ_Θ(u; y) [gilton2021deep]. Convergence is guaranteed if Γ_Θ is a contraction.
Learned Variational Models aim to learn the regularizer R_θ(u) within the classical variational framework min_u ‖Au − y‖₂² + αR_θ(u), seeking to combine the flexibility of deep learning with the stability of variational methods. Adversarial Regularization is a specific method to learn R_θ [lunz2018adversarial]. Inspired by Wasserstein GANs [arjovsky2017wasserstein], it trains R_Θ (e.g., a CNN) to differentiate between "good" (clean) and "bad" (noisy or artifact-laden) image distributions using a loss that approximates the Wasserstein-1 distance. The regularizer is R_Θ(u) = Ψ_Θ(u) + ρ₀‖u‖₂², and training involves:
min_Θ  E_{U∼P_U}[Ψ_Θ(U)] − E_{U∼P_n}[Ψ_Θ(U)] + μ · E[ (‖∇_u Ψ_Θ(U)‖ − 1)₊² ]
This approach can yield regularizers that are convex or weakly convex, enabling theoretical guarantees [mukherjee2024data, shumaylov2024weakly].
Plug-and-Play (PnP) Methods integrate pre-trained denoisers D (which can be deep neural networks) into iterative optimization frameworks such as ADMM, replacing the regularization step (a proximal map) with the denoiser: v_{k+1} = D(·). Their theoretical properties are challenging:
- Convergence often requires strong assumptions on the denoiser.
- The implicit regularizer is typically unknown.
Regularization by Denoising (RED) defines an explicit regularizer R_RED(u) = ½⟨u, u − D(u)⟩, but requires the denoiser's Jacobian to be symmetric [romano2017little]. Gradient-step (GS) denoisers model D_θ(u) = u − ∇R_θ(u), directly providing R_θ. If R_θ(u) = ½‖u − Ψ_θ(u)‖₂², D_θ becomes a proximal operator, leading to better convergence analyses [hurault2022proximal, tan2024provably].
For Linear Denoiser Plug-and-Play, if D_σ = prox_J is linear, its underlying quadratic functional J(x) = ½⟨x, (D_σ⁻¹ − Id)x⟩ can be rescaled: applying the spectral filter g_τ(λ) = λ/(τ − λ(τ − 1)) to the eigenvalues of D_σ effectively turns D_σ into prox_{τJ}. This allows for provably convergent regularization by adjusting τ to the noise level of the data [hauptmann2024convergent].
The chapter concludes by looking towards new frameworks designed inherently for deep learning, balancing model capacity with theoretical guarantees.
Chapter 4: Perspectives
This final chapter reflects on broader implications. Task Adaptation: Inverse problems are often intermediate steps in a larger pipeline. Task-adapted reconstruction optimizes the reconstruction for a specific downstream task (e.g., segmentation or classification) [adler2022task]. Deep learning facilitates this via end-to-end training of the combined modules, for example by minimizing a joint loss such as (1 − C)ℓ_reconstruction + Cℓ_segmentation.
The Data-Driven, Knowledge-Informed Paradigm: The future lies in integrating deep learning's empirical power with mathematical rigor. This involves:
- Structured Learning: Designing networks with inherent properties (stability, equivariance [celledoni2021equivariant]) for guarantees.
- Data-Centric Theory: Theoretical analysis must consider the impact of training data characteristics on model performance and reliability.
- Explainability: Methods to understand the decision-making of complex models are crucial, especially in safety-critical domains.
The overarching vision is that this synergy will transform fields like medical imaging, making advanced diagnostic tools more robust, accessible, and clinically impactful.
In summary, the paper traces the evolution of solving inverse problems from classical, model-driven regularization to sophisticated data-driven deep learning strategies. It details various methods, discusses their implementation and real-world applications (especially in imaging), and explores the theoretical underpinnings and current research challenges, particularly the quest for provable guarantees and interpretability in learned models.