Fast Decentralized Federated Low Rank Matrix Recovery from Column-wise Linear Projections (2204.08117v6)
Published 18 Apr 2022 in cs.IT and math.IT
Abstract: This work develops a provably accurate, fully decentralized alternating projected gradient descent (GD) algorithm for recovering a low rank (LR) matrix from mutually independent linear projections of each of its columns, in a fast and communication-efficient fashion. To the best of our knowledge, this work is the first attempt to develop a provably correct decentralized algorithm (i) for any problem involving an alternating projected GD algorithm, and (ii) for any problem in which the constraint set being projected onto is non-convex.
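The sketch below illustrates the core projected-GD idea the abstract describes, in a simplified centralized form: take a gradient step on the column-wise least-squares loss, then project the iterate onto the non-convex set of rank-r matrices via a truncated SVD. All names, dimensions, the zero initialization, and the step size here are illustrative assumptions; the paper's actual Dec-AltProjGD algorithm additionally decentralizes these computations across network nodes (e.g., via consensus averaging), which this sketch omits.

```python
import numpy as np

# Minimal centralized sketch (not the paper's Dec-AltProjGD) of projected GD
# for low rank matrix recovery from independent column-wise linear projections.
# Dimensions, initialization, and step size are illustrative assumptions.

rng = np.random.default_rng(0)
n, q, r, m = 50, 40, 2, 100           # ambient dim, #columns, rank, #measurements per column

# Ground truth: rank-r matrix X* = U* B*, with one sensing matrix A_k per column.
U_star = np.linalg.qr(rng.standard_normal((n, r)))[0]
B_star = rng.standard_normal((r, q))
X_star = U_star @ B_star
A = rng.standard_normal((q, m, n))    # A[k] is the m x n sensing matrix for column k
Y = np.einsum('kmn,nk->mk', A, X_star)  # y_k = A_k x_k

def project_rank_r(X, r):
    """Project onto the (non-convex) set of rank-<=r matrices via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

X = np.zeros((n, q))
eta = 0.5 / m                          # assumed step size; needs tuning in practice
for it in range(300):
    # Gradient of 0.5 * sum_k ||y_k - A_k x_k||^2 with respect to each column x_k.
    grad = np.stack([A[k].T @ (A[k] @ X[:, k] - Y[:, k]) for k in range(q)], axis=1)
    X = project_rank_r(X - eta * grad, r)  # GD step, then rank-r projection

print("relative error:", np.linalg.norm(X - X_star) / np.linalg.norm(X_star))
```

The rank-r projection step is precisely what makes the constraint set non-convex, which is why, as the abstract notes, standard decentralized projected-GD guarantees for convex constraint sets do not apply directly to this problem.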