Randomized Approach to Matrix Completion: Applications in Recommendation Systems and Image Inpainting (2403.01919v4)
Abstract: We present a novel method for matrix completion, designed for matrices in which one dimension significantly exceeds the other. Our Columns Selected Matrix Completion (CSMC) method combines Column Subset Selection with Low-Rank Matrix Completion to efficiently reconstruct incomplete datasets. Each step of CSMC solves a convex optimization problem. We introduce two algorithms that implement CSMC, each tailored to a different problem size. A formal analysis states the required assumptions and the probability of recovering a correct solution. To assess the impact of matrix size, rank, and the proportion of missing entries on solution quality and computation time, we conduct experiments on synthetic data. We also apply the method to two real-world problems: recommendation systems and image inpainting. Our results show that CSMC matches the solution quality of state-of-the-art matrix completion algorithms based on convex optimization while offering significant runtime savings, which makes it well suited to systems that must process large, incomplete datasets efficiently without sacrificing the reliability of the recovered results.
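To make the two-stage pipeline described above concrete, the sketch below illustrates the idea in Python: sample a subset of columns, complete the resulting tall submatrix with a convex (nuclear-norm) solver, then recover every remaining column by least squares against the completed columns. This is a minimal illustration under assumed choices (uniform column sampling, a textbook singular value thresholding solver, and hypothetical function names and parameters); it is not the authors' implementation or the exact algorithms analyzed in the paper.

```python
import numpy as np


def complete_columns_svt(C_obs, mask, iters=500):
    """Complete the thin column submatrix C with singular value thresholding,
    a standard convex (nuclear-norm) matrix-completion solver. This stands in
    for whichever convex solver is used in practice; the parameter choices
    follow the usual SVT heuristics and are illustrative only."""
    m, s = C_obs.shape
    tau = 5.0 * np.sqrt(m * s)                      # shrinkage level
    step = 1.2 * (m * s) / max(mask.sum(), 1)       # step size on observed entries
    Y = np.zeros_like(C_obs, dtype=float)
    X = np.zeros_like(C_obs, dtype=float)
    for _ in range(iters):
        U, sig, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(sig - tau, 0.0)) @ Vt   # soft-threshold the spectrum
        Y += step * mask * (C_obs - X)              # step toward the observed entries
    return X


def csmc_sketch(M_obs, mask, n_cols, seed=0):
    """Minimal two-stage CSMC-style sketch (assumed interface, not the paper's code):
      1) sample a subset of columns uniformly at random,
      2) complete the resulting tall submatrix with a convex solver,
      3) recover each remaining column by least squares against the completed
         columns, using only its observed rows."""
    rng = np.random.default_rng(seed)
    m, n = M_obs.shape
    cols = rng.choice(n, size=n_cols, replace=False)
    col_set = set(cols.tolist())

    C = complete_columns_svt(M_obs[:, cols], mask[:, cols])

    M_hat = np.zeros((m, n))
    M_hat[:, cols] = C
    for j in range(n):
        if j in col_set:
            continue
        rows = mask[:, j].astype(bool)              # observed rows of column j
        # lstsq returns the minimum-norm solution if C[rows] is rank-deficient.
        coef, *_ = np.linalg.lstsq(C[rows], M_obs[rows, j], rcond=None)
        M_hat[:, j] = C @ coef                      # column j expressed in span(C)
    return M_hat


if __name__ == "__main__":
    # Tiny synthetic check: a rank-3, 200 x 2000 matrix with 40% of entries observed.
    rng = np.random.default_rng(1)
    M = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 2000))
    mask = (rng.random(M.shape) < 0.4).astype(float)
    M_hat = csmc_sketch(M * mask, mask, n_cols=60, seed=1)
    print("relative error:", np.linalg.norm(M_hat - M) / np.linalg.norm(M))
```

In this sketch the expensive convex solver only sees the sampled columns; each unsampled column is then filled in by an independent small least-squares problem over its observed rows, which is the intuition behind the runtime savings over completing the full matrix with a convex solver.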