A New Perspective On Denoising Based On Optimal Transport (2312.08135v2)
Abstract: In the standard formulation of the denoising problem, one is given a probabilistic model relating a latent variable $\Theta \in \Omega \subset \mathbb{R}^m \; (m\ge 1)$ and an observation $Z \in \mathbb{R}^d$ according to $Z \mid \Theta \sim p(\cdot\mid \Theta)$ and $\Theta \sim G^*$, and the goal is to construct a map that recovers the latent variable from the observation. The posterior mean, a natural candidate for estimating $\Theta$ from $Z$, attains the minimum Bayes risk (under the squared error loss), but at the expense of over-shrinking the observation $Z$, and in general it may fail to capture the geometric features of the prior distribution $G^*$ (e.g., low dimensionality, discreteness, sparsity). To rectify these drawbacks, we take a new perspective on this denoising problem that is inspired by optimal transport (OT) theory and use it to study a different, OT-based denoiser in the population-level setting. We rigorously prove that, under general assumptions on the model, this OT-based denoiser is mathematically well-defined and unique, and is closely connected to the solution of a Monge OT problem. We then prove that, under appropriate identifiability assumptions on the model, the OT-based denoiser can be recovered solely from the marginal distribution of $Z$ and the posterior mean of the model, after solving a linear relaxation problem over a suitable space of couplings that is reminiscent of standard multimarginal OT problems. In particular, thanks to Tweedie's formula, when the likelihood model $\{ p(\cdot \mid \theta) \}_{\theta \in \Omega}$ is an exponential family of distributions, the OT-based denoiser can be recovered solely from the marginal distribution of $Z$. More generally, our family of OT-like relaxations is of interest in its own right and, for the denoising problem, suggests alternative numerical methods inspired by the rich literature on computational OT.
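The contrast between the posterior mean and a distribution-matching denoiser can be illustrated with a toy simulation. The sketch below is not the paper's algorithm. It assumes a one-dimensional Gaussian noise model, a known two-point prior $G^*$ on $\{-1, +1\}$, and that the relevant transport in one dimension is the monotone rearrangement of the posterior-mean values onto $G^*$. The names `marginal_density`, `tweedie_posterior_mean`, and `monotone_map_to_prior` are hypothetical helpers introduced only for this illustration.

```python
import math
import random

SIGMA = 1.0           # noise standard deviation (assumed for this toy example)
ATOMS = (-1.0, 1.0)   # support of a two-point prior G* (assumed)
WEIGHTS = (0.5, 0.5)  # prior weights (assumed)


def marginal_density(z):
    """Density of Z = Theta + SIGMA * N(0, 1) when Theta ~ G*."""
    norm = 1.0 / (SIGMA * math.sqrt(2.0 * math.pi))
    return sum(
        w * norm * math.exp(-0.5 * ((z - a) / SIGMA) ** 2)
        for a, w in zip(ATOMS, WEIGHTS)
    )


def tweedie_posterior_mean(z, h=1e-5):
    """Tweedie's formula for Gaussian noise: E[Theta | Z=z] = z + SIGMA^2 * (log f)'(z).

    The score (log f)' is approximated by a central finite difference.
    """
    score = (math.log(marginal_density(z + h))
             - math.log(marginal_density(z - h))) / (2.0 * h)
    return z + SIGMA ** 2 * score


def monotone_map_to_prior(values):
    """Monotone rearrangement in 1D: the i-th smallest value is sent to the
    prior quantile at level (i + 0.5) / n."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    k, cum = 0, 0.0
    for rank, i in enumerate(order):
        u = (rank + 0.5) / n
        # Advance to the atom whose cumulative-weight interval contains u.
        while k < len(ATOMS) - 1 and u > cum + WEIGHTS[k]:
            cum += WEIGHTS[k]
            k += 1
        out[i] = ATOMS[k]
    return out


rng = random.Random(0)
n = 2000
theta = [rng.choice(ATOMS) for _ in range(n)]
z = [t + SIGMA * rng.gauss(0.0, 1.0) for t in theta]

post = [tweedie_posterior_mean(v) for v in z]  # shrinks toward 0
ot = monotone_map_to_prior(post)               # takes values in ATOMS only


def mse(estimates):
    return sum((e - t) ** 2 for e, t in zip(estimates, theta)) / n


print("posterior-mean range:", min(post), max(post))
print("distinct rearranged outputs:", sorted(set(ot)))
print("MSE posterior mean:", mse(post), "MSE rearranged:", mse(ot))
```

In this toy setting the posterior-mean estimates lie strictly inside $(-1, 1)$, so their distribution does not reproduce the discreteness of $G^*$, whereas the rearranged estimates take only the two prior atoms with the prior's weights, at the price of a larger squared error.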