Optimal Regularization for a Data Source (2212.13597v4)
Abstract: In optimization-based approaches to inverse problems and to statistical estimation, it is common to augment criteria that enforce data fidelity with a regularizer that promotes desired structural properties in the solution. The choice of a suitable regularizer is typically driven by a combination of prior domain information and computational considerations. Convex regularizers are attractive computationally, but they are limited in the types of structure they can promote. Nonconvex regularizers, on the other hand, are more flexible in the forms of structure they can promote, and they have demonstrated strong empirical performance in some applications, but they come with the computational challenge of solving the associated optimization problems. In this paper, we seek a systematic understanding of the power and the limitations of convex regularization by investigating the following questions: Given a distribution, what is the optimal regularizer for data drawn from the distribution? What properties of a data source govern whether the optimal regularizer is convex? We address these questions for the class of regularizers specified by functionals that are continuous, positively homogeneous, and positive away from the origin. We say that a regularizer is optimal for a data distribution if the Gibbs density with energy given by the regularizer maximizes the population likelihood (or equivalently, minimizes the cross-entropy loss) over all regularizer-induced Gibbs densities. As the regularizers we consider are in one-to-one correspondence with star bodies, we leverage dual Brunn-Minkowski theory to show that a radial function derived from a data distribution acts as a "computational sufficient statistic": it is the key quantity for identifying optimal regularizers and for assessing the amenability of a data source to convex regularization.
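To fix ideas, the optimality criterion described in the abstract can be written out explicitly. The sketch below uses our own notation (f for the regularizer, D for the data distribution, K for the associated star body); the paper's exact symbols may differ.

```latex
% A regularizer f that is continuous, positively homogeneous, and positive
% away from the origin is the gauge (Minkowski functional) of a star body K:
%   f(x) = \|x\|_K = \inf\{ t > 0 : x \in tK \},
% and K is determined by its radial function
%   \rho_K(u) = \sup\{ t \ge 0 : tu \in K \},
% via \|x\|_K = |x| / \rho_K(x/|x|) for x \neq 0.
%
% The Gibbs density with energy f is
\[
  p_f(x) \;=\; \frac{e^{-f(x)}}{Z(f)},
  \qquad
  Z(f) \;=\; \int_{\mathbb{R}^d} e^{-f(x)}\,dx .
\]
% Since the population cross-entropy of p_f under the data distribution D is
%   \mathbb{E}_{x \sim D}[-\log p_f(x)] = \mathbb{E}_{x \sim D}[f(x)] + \log Z(f),
% a regularizer is optimal for D when it solves
\[
  f^\star \;\in\; \operatorname*{arg\,min}_{f}
  \;\; \mathbb{E}_{x \sim D}\big[f(x)\big] \;+\; \log Z(f),
\]
% with the minimum taken over all admissible (star-body) regularizers.
```

In this formulation, the first term favors functionals that assign low energy to typical samples from D, while the log-partition term penalizes functionals that are uniformly small, so the two terms together pin down a nontrivial optimum.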