Decentralized Federated Policy Gradient with Byzantine Fault-Tolerance and Provably Fast Convergence (2401.03489v1)
Abstract: In Federated Reinforcement Learning (FRL), agents collaboratively learn a common task while each agent acts in its own local environment, without exchanging raw trajectories. Existing approaches to FRL either (a) provide no fault-tolerance guarantees against misbehaving agents, or (b) rely on a trusted central agent, a single point of failure, for aggregating updates. We provide the first decentralized Byzantine fault-tolerant FRL method. To this end, we first propose a new centralized Byzantine fault-tolerant policy gradient (PG) algorithm that improves over existing methods by relying only on assumptions that are standard for non-fault-tolerant PG. Then, as our main contribution, we show how a combination of robust aggregation and Byzantine-resilient agreement methods can be leveraged to eliminate the need for a trusted central entity. Since our results constitute the first sample-complexity analysis for Byzantine fault-tolerant decentralized federated non-convex optimization, our technical contributions may be of independent interest. Finally, we corroborate our theoretical results experimentally on common RL environments, demonstrating the speed-up of decentralized federations with the number of participating agents and their resilience against various Byzantine attacks.
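To make the robust-aggregation ingredient concrete, below is a minimal sketch of one standard Byzantine-robust rule, the coordinate-wise trimmed mean, applied to agents' policy-gradient estimates. This illustrates the general idea only, not the paper's specific aggregator or agreement protocol; the function name `trimmed_mean`, the parameter `f`, and the toy numbers are illustrative assumptions.

```python
import numpy as np

def trimmed_mean(grads: np.ndarray, f: int) -> np.ndarray:
    """Coordinate-wise trimmed mean of n stacked gradient vectors (rows).

    In each coordinate, the f smallest and f largest reported values are
    discarded before averaging, so up to f arbitrarily corrupted (Byzantine)
    reports cannot drag the aggregate far from the honest values. Requires
    n > 2f so at least one value survives trimming in each coordinate.
    """
    n = grads.shape[0]
    assert n > 2 * f, "need more than 2f reports to trim f per side"
    sorted_grads = np.sort(grads, axis=0)  # sort each coordinate independently
    return sorted_grads[f : n - f].mean(axis=0)

# Toy usage: 7 honest agents report noisy estimates of a true policy gradient,
# while 2 Byzantine agents report large adversarial vectors.
rng = np.random.default_rng(0)
true_grad = rng.normal(size=4)
honest = true_grad + 0.1 * rng.normal(size=(7, 4))
byzantine = 100.0 * np.ones((2, 4))          # arbitrary attack vectors
reports = np.vstack([honest, byzantine])

print("true gradient :", np.round(true_grad, 3))
print("trimmed mean  :", np.round(trimmed_mean(reports, f=2), 3))  # near true
print("plain mean    :", np.round(reports.mean(axis=0), 3))        # hijacked
```

In the decentralized setting the paper targets, each agent would apply such a robust rule locally to the updates it receives from its peers, combined with a Byzantine-resilient agreement step so that honest agents converge to consistent iterates without a trusted central aggregator.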
Authors: Philip Jordan, Florian Grötschla, Flint Xiaofeng Fan, Roger Wattenhofer