Markov Chain Mirror Descent On Data Federation (2309.14775v1)
Abstract: Stochastic optimization methods such as mirror descent have wide applications due to their low computational cost. These methods have been well studied under the assumption of independent and identically distributed (i.i.d.) data, and typically achieve a sublinear rate of convergence. However, this assumption may be too strong and impractical in real application scenarios. Recent work has investigated stochastic gradient descent when instances are sampled from a Markov chain. Unfortunately, few results are known for stochastic mirror descent. In this paper, we propose a new version of stochastic mirror descent, termed MarchOn, in the federated learning setting. Given a distributed network, the model iteratively travels from a node to one of its neighbours, chosen at random. Furthermore, we propose a new framework to analyze MarchOn, which yields the best-known rates of convergence for convex, strongly convex, and non-convex losses. Finally, we conduct empirical studies to evaluate the convergence of MarchOn and to validate the theoretical results.
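The random-walk behaviour described in the abstract (the model hops from a node to a randomly chosen neighbour and takes a stochastic mirror-descent step on that node's local data) can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the paper's MarchOn implementation: it uses a Euclidean mirror map (so each mirror step reduces to a plain gradient step), synthetic least-squares losses at each node, a ring topology, and an illustrative 1/sqrt(t) step size; names such as `local_grad` and `neighbours` are hypothetical.

```python
# Sketch (assumption-laden, not the authors' code) of random-walk stochastic
# mirror descent over a federation of nodes. The mirror map is Euclidean, so
# each mirror step is an ordinary gradient step; losses are synthetic.
import numpy as np

rng = np.random.default_rng(0)

# Toy federation: a ring of nodes, each holding private least-squares data (A_i, b_i).
n_nodes, dim, n_local = 8, 5, 20
neighbours = {i: [(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)}
data = [(rng.normal(size=(n_local, dim)), rng.normal(size=n_local)) for _ in range(n_nodes)]

def local_grad(x, node):
    """Stochastic gradient of the node's least-squares loss on one sampled row."""
    A, b = data[node]
    j = rng.integers(n_local)
    return (A[j] @ x - b[j]) * A[j]

# Random-walk mirror descent: after each update the model moves to a random neighbour.
x = np.zeros(dim)
node = 0
T = 2000
for t in range(1, T + 1):
    eta = 1.0 / np.sqrt(t)                 # decaying step size (illustrative choice)
    g = local_grad(x, node)
    # Euclidean mirror step: argmin_y <g, y> + (1 / (2 * eta)) * ||y - x||^2
    x = x - eta * g
    node = rng.choice(neighbours[node])    # Markov-chain transition to a neighbour
print("final iterate:", x)
```

Because the node sequence is a random walk on the graph, consecutive gradients are drawn from a Markov chain rather than i.i.d., which is precisely the setting the paper's analysis framework addresses.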