Unrolled denoising networks provably learn optimal Bayesian inference (2409.12947v1)

Published 19 Sep 2024 in cs.LG, cs.DS, and stat.ML

Abstract: Much of Bayesian inference centers around the design of estimators for inverse problems which are optimal assuming the data comes from a known prior. But what do these optimality guarantees mean if the prior is unknown? In recent years, algorithm unrolling has emerged as deep learning's answer to this age-old question: design a neural network whose layers can in principle simulate iterations of inference algorithms and train on data generated by the unknown prior. Despite its empirical success, however, it has remained unclear whether this method can provably recover the performance of its optimal, prior-aware counterparts. In this work, we prove the first rigorous learning guarantees for neural networks based on unrolling approximate message passing (AMP). For compressed sensing, we prove that when trained on data drawn from a product prior, the layers of the network approximately converge to the same denoisers used in Bayes AMP. We also provide extensive numerical experiments for compressed sensing and rank-one matrix estimation demonstrating the advantages of our unrolled architecture - in addition to being able to obliviously adapt to general priors, it exhibits improvements over Bayes AMP in more general settings of low dimensions, non-Gaussian designs, and non-product priors.

Summary

  • The paper proves that unrolled AMP networks can match Bayes-optimal performance in compressed sensing under product priors.
  • It leverages state evolution and NTK theory to analyze the layerwise learning of optimal denoising functions.
  • Empirical tests on compressed sensing and rank-one matrix estimation confirm robust performance even in non-Gaussian and finite-dimensional settings.

Unrolled Denoising Networks Provably Learn Optimal Bayesian Inference

Introduction

The paper "Unrolled denoising networks provably learn optimal Bayesian inference" presents significant theoretical advancements in understanding and leveraging algorithm unrolling for improving Bayesian inference in high-dimensional settings. Bayesian inference serves as a robust framework for designing estimators for inverse problems, primarily premised on the prior knowledge of data. However, in practical scenarios, the true prior is often unknown, which poses a challenge in achieving optimal performance. Algorithm unrolling emerges as a practical solution by incorporating iterative inference algorithm steps into neural network layers that can adapt based on observed data. Despite empirical successes, the provable guarantees of such networks have remained elusive. This work addresses this gap by providing the first rigorous proof that unrolled networks based on approximate message passing (AMP) can recover the performance of their optimal, prior-aware counterparts.

Key Contributions

  1. Theoretical Guarantees:
    • For compressed sensing, when data is drawn from a product prior, the paper proves that an unrolled neural network approximates the performance of Bayes AMP.
    • The proof hinges on state evolution, a key theoretical tool in AMP analysis, combined with neural tangent kernel (NTK) theory to analyze training dynamics.
  2. Empirical Validation:
    • Extensive experiments on compressed sensing and rank-one matrix estimation tasks validate the theoretical findings.
    • The unrolled networks exhibit robust performance across various conditions, including low dimensions, non-Gaussian designs, and non-product priors.
  3. Learning Dynamics:
    • The study breaks down the layerwise learning process of network parameters, providing insights into how gradient descent can efficiently learn optimal denoising functions given high-dimensional inputs.

Main Theoretical Result

The paper's central theoretical claim can be summarized as follows: for compressed sensing with a Gaussian sensing matrix, if the prior on the signal is a product distribution with smooth, sub-Gaussian marginals, then an unrolled AMP network trained with gradient descent on polynomially many samples achieves, in the high-dimensional limit, the same mean squared error (MSE) as Bayes AMP. The proof combines state evolution, the key theoretical tool of AMP analysis, with neural tangent kernel (NTK) techniques, showing that polynomially many samples and gradient steps suffice for the learned layers to approach the Bayes-optimal denoisers.
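For context, the AMP recursion and its state evolution can be written as follows; this is the standard formulation from the AMP literature, stated here for orientation rather than quoted from the paper.

```latex
% AMP for y = A x + w, with A having i.i.d. N(0, 1/m) entries,
% sampling ratio \delta = m/n, and coordinate-wise denoisers \eta_t:
\begin{aligned}
x^{t+1} &= \eta_t\!\left(A^\top z^t + x^t\right), \\
z^{t+1} &= y - A x^{t+1}
           + \tfrac{1}{\delta}\, z^{t}\,
             \big\langle \eta_{t}'\!\left(A^\top z^{t} + x^{t}\right) \big\rangle .
\end{aligned}

% State evolution tracks the effective noise level \tau_t seen by \eta_t:
\tau_{t+1}^2 = \sigma^2
  + \tfrac{1}{\delta}\,
    \mathbb{E}\!\left[\big(\eta_t(X + \tau_t Z) - X\big)^2\right],
\qquad X \sim \pi,\; Z \sim \mathcal{N}(0,1).

% Bayes AMP takes \eta_t to be the posterior-mean (MMSE) denoiser:
\eta_t^{\mathrm{Bayes}}(r) = \mathbb{E}\!\left[X \mid X + \tau_t Z = r\right].
```

In these terms, the theorem says that the MLP denoisers found by gradient descent approximately track the MMSE denoiser at each layer, so the trained network inherits the MSE predicted by the Bayes-optimal state evolution.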

Architecture and Training

The unrolled networks, referred to as Learned Denoising Networks (LDNets), mimic the iterative updates of AMP: each layer applies a learned denoiser, parametrized as an MLP, to refine the signal estimate. The paper also discusses the importance of dynamically estimating the state evolution parameters to stabilize training. Furthermore, training proceeds layerwise, which mitigates the risk of converging to poor local optima.
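A minimal PyTorch sketch of such an architecture and its layerwise training loop is given below. It follows the structure just described (an AMP-style linear step followed by a learned MLP denoiser in each layer), but the class names, the finite-difference estimate of the Onsager correction, and all hyperparameters are illustrative assumptions rather than the authors' implementation, and the dynamic state-evolution parameter estimation mentioned above is omitted.

```python
# Illustrative sketch of an unrolled AMP network with learned MLP denoisers.
# Not the paper's code: names and hyperparameters are assumptions.
import torch
import torch.nn as nn


class MLPDenoiser(nn.Module):
    """Scalar denoiser applied coordinate-wise, playing the role of eta_t."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, r: torch.Tensor) -> torch.Tensor:
        # r: (batch, n); the same scalar map is applied to every coordinate.
        return self.net(r.unsqueeze(-1)).squeeze(-1)


class LDNet(nn.Module):
    """Unrolled AMP for compressed sensing y = A x + noise,
    with one learned denoiser per layer (a sketch of the LDNet idea)."""

    def __init__(self, num_layers: int):
        super().__init__()
        self.denoisers = nn.ModuleList(MLPDenoiser() for _ in range(num_layers))

    def forward(self, y, A, num_layers=None):
        # y: (batch, m) measurements, A: (m, n) sensing matrix shared by the batch.
        batch, m = y.shape
        n = A.shape[1]
        delta, eps = m / n, 1e-3
        x = torch.zeros(batch, n, device=y.device)   # signal estimate
        z = y.clone()                                 # corrected residual
        layers = self.denoisers if num_layers is None else self.denoisers[:num_layers]
        for eta in layers:
            r = x + z @ A                             # effective observation A^T z + x
            x_new = eta(r)
            # Onsager correction: mean derivative of the denoiser at r,
            # estimated here with a central finite difference.
            d = (eta(r + eps) - eta(r - eps)) / (2 * eps)
            z = y - x_new @ A.T + (d.mean(dim=1, keepdim=True) / delta) * z
            x = x_new
        return x


def train_layerwise(model, y, A, x_true, steps=200, lr=1e-3):
    """Fit the denoisers one layer at a time, keeping earlier layers fixed,
    in the spirit of the layerwise training described above."""
    for t, eta in enumerate(model.denoisers):
        opt = torch.optim.Adam(eta.parameters(), lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            x_hat = model(y, A, num_layers=t + 1)     # run only the first t+1 layers
            loss = ((x_hat - x_true) ** 2).mean()
            loss.backward()
            opt.step()
    return model
```

Passing only the current layer's parameters to the optimizer keeps the earlier layers fixed, matching the layerwise procedure described above.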

Experimental Highlights

  1. Compressed Sensing:
    • LDNet performs on par with Bayes AMP for both Bernoulli-Gaussian and $\mathbb{Z}_2$ priors.
    • Empirical results indicate that the learned MLP denoisers approximate the theoretically optimal Bayes denoisers (a closed-form example of such a denoiser is given just after this list).
  2. Rank-One Matrix Estimation:
    • The network successfully generalizes to this context, achieving comparable performance to Bayes AMP.
    • Layerwise denoising function approximations reflect adherence to Bayes optimality.
  3. Robustness to Non-Ideal Conditions:
    • LDNet surpasses Bayes AMP in non-asymptotic regimes and when the sensing matrix deviates from Gaussian distributions.
    • Introducing trainable auxiliary parameters (e.g., the matrix $B$) further improves robustness and performance under finite dimensions and non-Gaussian settings.
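As a concrete example of the Bayes denoisers referenced in the compressed sensing experiments above, the posterior-mean denoiser for a Bernoulli-Gaussian prior has a simple closed form. For $X \sim \varepsilon\,\mathcal{N}(0, s^2) + (1-\varepsilon)\,\delta_0$ observed as $R = X + \tau Z$ with $Z \sim \mathcal{N}(0, 1)$, a standard computation (stated here for reference, with an assumed parameterization, not quoted from the paper) gives:

```latex
\eta^{\mathrm{Bayes}}(r)
  = \underbrace{\frac{\varepsilon\, \phi(r;\, s^2 + \tau^2)}
        {\varepsilon\, \phi(r;\, s^2 + \tau^2) + (1 - \varepsilon)\, \phi(r;\, \tau^2)}}
        _{\Pr[X \neq 0 \mid R = r]}
    \cdot \frac{s^2}{s^2 + \tau^2}\, r,
\qquad
\phi(r;\, v) = \frac{1}{\sqrt{2\pi v}}\, e^{-r^2/(2v)} .
```

Curves of this form are what the learned MLP denoisers are compared against layer by layer in the Bernoulli-Gaussian experiments.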

Future Directions

Unrolled denoising networks offer a potent approach to practical Bayesian inference, but several directions remain open:

  1. Non-Product Priors: The extension of theoretical guarantees to non-product priors remains an open question. Future work could explore how score estimation and NTK theory might be adapted to more complex prior structures.
  2. Generative Models: Investigating the potential integrations of unrolled networks with modern generative models such as diffusion models could enhance the scalability to high-dimensional priors.
  3. Practical Implementations: Continued exploration and validation in real-world applications, including image and signal processing tasks, could solidify the approach's practical utility.

In conclusion, this work provides a crucial link between theory and practice in Bayesian inference using neural networks, demonstrating that unrolled networks can achieve empirically and provably optimal performance even when the true prior is unknown. This bridges a significant gap and opens up new avenues for leveraging deep learning in high-dimensional statistical problems.
