A New Perspective On Denoising Based On Optimal Transport (2312.08135v2)

Published 13 Dec 2023 in math.ST, cs.LG, math.OC, stat.ML, and stat.TH

Abstract: In the standard formulation of the denoising problem, one is given a probabilistic model relating a latent variable $\Theta \in \Omega \subset \mathbb{R}^m \; (m\ge 1)$ and an observation $Z \in \mathbb{R}^d$ according to $Z \mid \Theta \sim p(\cdot\mid \Theta)$ and $\Theta \sim G^*$, and the goal is to construct a map to recover the latent variable from the observation. The posterior mean, a natural candidate for estimating $\Theta$ from $Z$, attains the minimum Bayes risk (under the squared error loss) but at the expense of over-shrinking $Z$, and in general may fail to capture the geometric features of the prior distribution $G^*$ (e.g., low dimensionality, discreteness, sparsity). To rectify these drawbacks, we take a new perspective on this denoising problem that is inspired by optimal transport (OT) theory and use it to study a different, OT-based, denoiser in the population-level setting. We rigorously prove that, under general assumptions on the model, this OT-based denoiser is mathematically well-defined and unique, and is closely connected to the solution of a Monge OT problem. We then prove that, under appropriate identifiability assumptions on the model, the OT-based denoiser can be recovered solely from information about the marginal distribution of $Z$ and the posterior mean of the model, after solving a linear relaxation problem over a suitable space of couplings that is reminiscent of standard multimarginal OT problems. In particular, thanks to Tweedie's formula, when the likelihood model $\{ p(\cdot \mid \theta) \}_{\theta \in \Omega}$ is an exponential family of distributions, the OT-based denoiser can be recovered solely from the marginal distribution of $Z$. In general, our family of OT-like relaxations is of interest in its own right, and for the denoising problem it suggests alternative numerical methods inspired by the rich literature on computational OT.
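Two statements in the abstract can be checked numerically on a toy example: the posterior mean over-shrinks and can miss a discrete prior, and, by Tweedie's formula, for an exponential-family likelihood the posterior mean depends only on the marginal distribution of $Z$. The sketch below is ours, not the authors'. It assumes a two-point prior $G^* = \tfrac12\delta_{-2} + \tfrac12\delta_{2}$ and a Gaussian likelihood $Z \mid \Theta \sim N(\Theta, 1)$, and every function name in it is hypothetical. It computes the posterior mean through Tweedie's formula $\mathbb{E}[\Theta \mid Z=z] = z + \sigma^2 \tfrac{d}{dz}\log f(z)$, using only the marginal density $f$ of $Z$ and a finite-difference derivative. For comparison it also evaluates the one-dimensional monotone transport map $F_{G^*}^{-1}\circ F_Z$, which pushes the law of $Z$ onto $G^*$. That map serves only to illustrate a map whose output has law $G^*$; it is not presented as the paper's OT-based denoiser.

```python
import math

# Toy model (an assumption for illustration, not taken from the paper):
#   Theta ~ G* = 0.5*delta_{-2} + 0.5*delta_{+2}   (a discrete prior)
#   Z | Theta ~ N(Theta, SIGMA^2), SIGMA = 1       (Gaussian: an exponential family)
ATOMS = [(-2.0, 0.5), (2.0, 0.5)]  # (support point, prior mass), sorted by support point
SIGMA = 1.0


def normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def marginal_pdf(z: float) -> float:
    """Density f of Z: the prior G* mixed with the Gaussian likelihood."""
    return sum(w * normal_pdf((z - a) / SIGMA) / SIGMA for a, w in ATOMS)


def marginal_cdf(z: float) -> float:
    """CDF F_Z of the marginal distribution of Z."""
    return sum(w * normal_cdf((z - a) / SIGMA) for a, w in ATOMS)


def tweedie_posterior_mean(z: float, h: float = 1e-5) -> float:
    """Tweedie's formula: E[Theta | Z=z] = z + SIGMA^2 * d/dz log f(z).

    Only the marginal density f is used; the derivative of log f is a
    central finite difference with step h.
    """
    dlogf = (math.log(marginal_pdf(z + h)) - math.log(marginal_pdf(z - h))) / (2.0 * h)
    return z + SIGMA**2 * dlogf


def prior_quantile(u: float) -> float:
    """Generalized inverse of the prior CDF: smallest atom with cumulative mass >= u."""
    cumulative = 0.0
    for a, w in ATOMS:
        cumulative += w
        if u <= cumulative:
            return a
    return ATOMS[-1][0]


def monotone_transport(z: float) -> float:
    """1-D monotone map F_{G*}^{-1}(F_Z(z)); it pushes the law of Z onto G*."""
    return prior_quantile(marginal_cdf(z))


for z in (-1.0, 0.1, 0.5, 3.0):
    print(f"z={z:+.1f}  posterior mean={tweedie_posterior_mean(z):+.4f}  "
          f"monotone map={monotone_transport(z):+.1f}")
```

For this prior the posterior mean has the closed form $2\tanh(2z)$, so it lies strictly inside $(-2, 2)$ for every $z$ and never lands on an atom of $G^*$; this is the over-shrinking the abstract refers to. The monotone map instead returns $-2$ or $2$, so its output keeps the discreteness of the prior. The finite difference only keeps the example dependency-free; differentiating $f$ analytically gives the same formula exactly.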
