The radius of statistical efficiency (2405.09676v1)
Abstract: Classical results in asymptotic statistics show that the Fisher information matrix controls the difficulty of estimating a statistical model from observed data. In this work, we introduce a companion measure of robustness of an estimation problem: the radius of statistical efficiency (RSE) is the size of the smallest perturbation to the problem data that renders the Fisher information matrix singular. We compute RSE up to numerical constants for a variety of test bed problems, including principal component analysis, generalized linear models, phase retrieval, bilinear sensing, and matrix completion. In all cases, the RSE quantifies the compatibility between the covariance of the population data and the latent model parameter. Interestingly, we observe a precise reciprocal relationship between RSE and the intrinsic complexity/sensitivity of the problem instance, paralleling the classical Eckart-Young theorem in numerical analysis.
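To make the Eckart–Young parallel concrete, here is a minimal numerical sketch (not from the paper; the linear-Gaussian model and unit noise variance are assumed for illustration). For y = Xθ + noise with unit variance, the Fisher information matrix is I(θ) = XᵀX, and the Eckart–Young theorem says the spectral-norm distance from I(θ) to the set of singular matrices equals its smallest singular value — the same "distance to ill-posedness" structure the RSE captures for the problem data.

```python
import numpy as np

# Linear-Gaussian model y = X @ theta + noise (unit noise variance):
# the Fisher information matrix is I(theta) = X^T X.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))   # design matrix: 100 samples, 5 parameters
fisher = X.T @ X                    # Fisher information matrix

# Eckart-Young: the spectral-norm distance from fisher to the nearest
# singular matrix is its smallest singular value.
U, s, Vt = np.linalg.svd(fisher)
sigma_min = s[-1]

# The minimizing perturbation is -sigma_min * u_min v_min^T; adding it
# zeroes out the smallest singular value, rendering the matrix singular.
perturbed = fisher + (-sigma_min) * np.outer(U[:, -1], Vt[-1])
residual_sv = np.linalg.svd(perturbed, compute_uv=False)[-1]
```

After the perturbation, `residual_sv` is zero up to floating-point error, confirming that a perturbation of norm exactly `sigma_min` suffices to destroy invertibility, while any smaller one does not.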