Fast and Accurate Estimation of Low-Rank Matrices from Noisy Measurements via Preconditioned Non-Convex Gradient Descent (2305.17224v2)
Abstract: Non-convex gradient descent is a common approach for estimating a low-rank $n\times n$ ground-truth matrix from noisy measurements: its per-iteration cost can be as low as $O(n)$ time, and in theory it can converge to a minimax-optimal estimate. In practice, however, the practitioner is often limited to just tens to hundreds of iterations, and the slow and/or inconsistent convergence of non-convex gradient descent can prevent a high-quality estimate from being obtained. Recently, preconditioning was shown to be highly effective at accelerating the local convergence of non-convex gradient descent when the measurements are noiseless. In this paper, we describe how preconditioning should be done for noisy measurements so as to accelerate local convergence to minimax optimality. For the symmetric matrix sensing problem, our proposed preconditioned method is guaranteed to converge locally to minimax error at a linear rate that is immune to ill-conditioning and/or over-parameterization. Using the proposed method, we perform a 60-megapixel medical image denoising task and observe significantly reduced noise levels compared to previous approaches.