
Global Convergence Guarantees for Federated Policy Gradient Methods with Adversaries (2403.09940v2)

Published 15 Mar 2024 in cs.LG, cs.AI, and math.OC

Abstract: Federated Reinforcement Learning (FRL) allows multiple agents to collaboratively build a decision-making policy without sharing raw trajectories. However, if a small fraction of these agents are adversarial, it can lead to catastrophic results. We propose a policy gradient based approach that is robust to adversarial agents which can send arbitrary values to the server. Under this setting, our results form the first global convergence guarantees with general parametrization. These results demonstrate resilience with adversaries, while achieving optimal sample complexity of order $\tilde{\mathcal{O}}\left( \frac{1}{N\epsilon^2} \left( 1+ \frac{f^2}{N}\right)\right)$, where $N$ is the total number of agents and $f<N/2$ is the number of adversarial agents.
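The abstract does not spell out the aggregation rule used at the server. The sketch below is a minimal illustration, not the paper's algorithm: it assumes a coordinate-wise trimmed-mean aggregator and a hypothetical per-agent gradient oracle `local_gradient_fn`, and shows how a server could robustly combine policy-gradient estimates from $N$ agents when up to $f < N/2$ of them may send arbitrary values.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact method):
# a server aggregates policy-gradient estimates from N agents, up to f of
# which may be adversarial, using a coordinate-wise trimmed mean.

import numpy as np

def trimmed_mean(grads: np.ndarray, f: int) -> np.ndarray:
    """Coordinate-wise trimmed mean over agents.

    grads: array of shape (N, d), one gradient estimate per agent.
    f:     assumed upper bound on adversarial agents (requires 2f < N).
    Drops the f largest and f smallest values in each coordinate,
    then averages the remaining N - 2f values.
    """
    sorted_grads = np.sort(grads, axis=0)        # sort each coordinate across agents
    kept = sorted_grads[f : grads.shape[0] - f]  # discard f extremes on each side
    return kept.mean(axis=0)

def federated_pg_step(theta, local_gradient_fn, num_agents, f, lr=1e-2):
    """One synchronous round: each agent returns a local policy-gradient
    estimate; the server robustly aggregates and takes an ascent step."""
    grads = np.stack([local_gradient_fn(theta, agent_id=i) for i in range(num_agents)])
    robust_grad = trimmed_mean(grads, f)
    return theta + lr * robust_grad              # gradient ascent on expected return
```

Any robust aggregator with a bounded-error guarantee (e.g., coordinate-wise median) could be substituted for `trimmed_mean`; the point of the sketch is only that the server never averages raw, untrusted gradients directly.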

Authors (4)
  1. Swetha Ganesh (9 papers)
  2. Jiayu Chen (51 papers)
  3. Gugan Thoppe (26 papers)
  4. Vaneet Aggarwal (222 papers)
Citations (1)

