Federated Online and Bandit Convex Optimization (2311.17586v1)
Abstract: We study the problems of distributed online and bandit convex optimization against an adaptive adversary. We aim to minimize the average regret on $M$ machines working in parallel over $T$ rounds with $R$ intermittent communications. Assuming the underlying cost functions are convex and can be generated adaptively, our results show that collaboration is not beneficial when the machines have access to first-order (gradient) information at the queried points. This is in contrast to the stochastic setting, where each machine samples its cost functions from a fixed distribution. Furthermore, we delve into the more challenging setting of federated online optimization with bandit (zeroth-order) feedback, where the machines can only access the values of the cost functions at the queried points. The key finding here is the identification of a high-dimensional regime in which collaboration is beneficial and may even lead to a linear speedup in the number of machines. We further illustrate our findings through federated adversarial linear bandits by developing novel distributed single-point and two-point feedback algorithms. Our work is the first attempt towards a systematic understanding of federated online optimization with limited feedback, and it attains tight regret bounds in the intermittent communication setting for both first- and zeroth-order feedback. Our results thus bridge the gap between the stochastic and adaptive settings in federated online optimization.
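To make the setting concrete, below is a minimal sketch of the intermittent-communication template the abstract describes: $M$ machines run local online (sub)gradient descent over $T$ rounds and average their iterates at $R$ evenly spaced communication rounds; with bandit feedback, the exact gradient is replaced by the classical two-point zeroth-order estimate $\frac{d}{2\delta}\big(f(x+\delta u)-f(x-\delta u)\big)u$ with $u$ uniform on the unit sphere. All names here (`federated_online_gd`, `two_point_gradient_estimate`, `cost_fns`, `eta`, `delta`) are illustrative assumptions, not the paper's API, and the sketch shows only a generic local-update-plus-periodic-averaging baseline rather than the paper's specific algorithms.

```python
import numpy as np

def two_point_gradient_estimate(f, x, delta, rng):
    """Two-point zeroth-order gradient estimate:
    (d / (2*delta)) * (f(x + delta*u) - f(x - delta*u)) * u,
    with u drawn uniformly from the unit sphere."""
    d = x.shape[0]
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)
    return (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

def federated_online_gd(cost_fns, M, T, R, d, eta, delta=None, seed=0):
    """Local online (sub)gradient descent on M machines with R intermittent
    communications over T rounds (iterates averaged every T // R rounds).

    cost_fns[m][t] is the convex cost revealed to machine m at round t; it is
    assumed to expose .value(x) and, for first-order feedback, .grad(x).
    If delta is given, gradients are replaced by two-point bandit estimates.
    Returns the average per-machine, per-round loss incurred.
    """
    rng = np.random.default_rng(seed)
    xs = np.zeros((M, d))            # current iterate on each machine
    comm_every = max(T // R, 1)      # rounds between communications
    total_loss = 0.0

    for t in range(T):
        for m in range(M):
            f_t = cost_fns[m][t]
            total_loss += f_t.value(xs[m])
            if delta is None:
                g = f_t.grad(xs[m])  # first-order feedback
            else:
                g = two_point_gradient_estimate(f_t.value, xs[m], delta, rng)
            xs[m] = xs[m] - eta * g  # local online gradient step
        if (t + 1) % comm_every == 0:
            xs[:] = xs.mean(axis=0)  # intermittent communication: average iterates
    return total_loss / (M * T)
```

Under the paper's first-order result, such averaging does not improve the average regret against adaptively generated convex costs (unlike the stochastic case); the bandit variant is where collaboration can pay off in high dimension.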