Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis (2404.08003v4)
Abstract: To improve the efficiency of reinforcement learning (RL), we propose a novel asynchronous federated reinforcement learning (FedRL) framework termed AFedPG, which constructs a global model through collaboration among $N$ agents using policy gradient (PG) updates. To address the challenge of lagged policies in asynchronous settings, we design a delay-adaptive lookahead technique \textit{specifically for FedRL} that can effectively handle heterogeneous arrival times of policy gradients. We analyze the theoretical global convergence bound of AFedPG, and characterize the advantage of the proposed algorithm in terms of both sample complexity and time complexity. Specifically, our AFedPG method achieves $O(\frac{\epsilon^{-2.5}}{N})$ sample complexity for global convergence at each agent on average. Compared to the single-agent setting with $O(\epsilon^{-2.5})$ sample complexity, it enjoys a linear speedup with respect to the number of agents. Moreover, compared to synchronous FedPG, AFedPG improves the time complexity from $O(\frac{t_{\max}}{N})$ to $O\big((\sum_{i=1}^{N} \frac{1}{t_{i}})^{-1}\big)$, where $t_{i}$ denotes the time consumption in each iteration at agent $i$, and $t_{\max}$ is the largest one. The latter complexity is always smaller than the former, and the improvement becomes significant in large-scale federated settings with heterogeneous computing powers ($t_{\max}\gg t_{\min}$). Finally, we empirically verify the improved performance of AFedPG in four widely used MuJoCo environments with varying numbers of agents. We also demonstrate the advantages of AFedPG in various computing-heterogeneity scenarios.
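The time-complexity comparison in the abstract can be checked with a few lines of arithmetic. The sketch below (with hypothetical per-iteration times $t_i$, not figures from the paper) computes the per-update cost of synchronous FedPG, where every round waits for the slowest agent, against the asynchronous rate $\big(\sum_i 1/t_i\big)^{-1}$, where each agent contributes updates at its own pace:

```python
def sync_time_per_update(times):
    """Synchronous FedPG: each round waits for the slowest agent and
    yields one update per agent, i.e. t_max / N time per update."""
    return max(times) / len(times)

def async_time_per_update(times):
    """AFedPG: agent i contributes updates at rate 1/t_i, so updates
    arrive at aggregate rate sum_i 1/t_i, i.e. (sum_i 1/t_i)^(-1)
    time per update."""
    return 1.0 / sum(1.0 / t for t in times)

# Hypothetical heterogeneous computing powers (t_max >> t_min).
times = [1.0, 2.0, 10.0]
print(sync_time_per_update(times))   # 10/3, about 3.33
print(async_time_per_update(times))  # 1/(1 + 0.5 + 0.1), about 0.625
```

Since $\sum_i 1/t_i \ge N/t_{\max}$, the asynchronous cost never exceeds $t_{\max}/N$, and the gap widens exactly when the $t_i$ are highly heterogeneous, matching the claim in the abstract.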