Momentum-Based Federated Reinforcement Learning with Interaction and Communication Efficiency (2405.17471v2)
Abstract: Federated Reinforcement Learning (FRL) has garnered increasing attention recently. However, due to the intrinsic spatio-temporal non-stationarity of data distributions, current approaches typically suffer from high interaction and communication costs. In this paper, we introduce a new FRL algorithm, named $\texttt{MFPO}$, that utilizes momentum, importance sampling, and an additional server-side adjustment to control the shift of stochastic policy gradients and enhance the efficiency of data utilization. We prove that, with a proper selection of momentum parameters and interaction frequency, $\texttt{MFPO}$ achieves $\tilde{\mathcal{O}}(H N^{-1}\epsilon^{-3/2})$ interaction complexity and $\tilde{\mathcal{O}}(\epsilon^{-1})$ communication complexity ($N$ denotes the number of agents), where the interaction complexity enjoys a linear speedup in the number of agents and the communication complexity matches the best achievable by existing first-order FL algorithms. Extensive experiments corroborate the substantial performance gains of $\texttt{MFPO}$ over existing methods on a suite of complex and high-dimensional benchmarks.
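To make the update rule the abstract alludes to more concrete, the sketch below illustrates the general recipe of momentum-based variance reduction with importance sampling: a STORM-style momentum estimator of the policy gradient in which an importance weight re-weights the gradient computed at the previous policy, followed by a simple server-side averaging step across the $N$ agents. This is a minimal illustration under our own assumptions (the function names, the plain averaging rule, and the scalar weight are placeholders), not the paper's exact MFPO update.

```python
# Minimal sketch (not the authors' exact algorithm): a STORM-style momentum
# policy-gradient estimator with an importance-sampling correction, plus a
# naive server-side average standing in for the paper's server adjustment.
import numpy as np


def momentum_pg_update(u_prev, grad_curr, grad_prev, is_weight, beta):
    """Blend a fresh stochastic policy gradient with the previous momentum
    term; the importance weight re-weights the gradient taken at the *old*
    policy so it remains consistent with trajectories sampled from the
    *current* policy (this is what limits the gradient shift)."""
    correction = grad_curr - is_weight * grad_prev
    return beta * grad_curr + (1.0 - beta) * (u_prev + correction)


def server_aggregate(agent_momenta):
    """Illustrative server-side step: average the agents' momentum
    estimates into a single global update direction."""
    return np.mean(np.stack(agent_momenta), axis=0)


# Toy usage with random stand-ins for per-agent trajectory gradients.
rng = np.random.default_rng(0)
dim, n_agents = 8, 4            # policy-parameter dimension, number of agents
momenta = []
for _ in range(n_agents):
    u_prev = rng.normal(size=dim)                    # previous momentum estimate
    g_curr, g_prev = rng.normal(size=dim), rng.normal(size=dim)
    momenta.append(momentum_pg_update(u_prev, g_curr, g_prev,
                                      is_weight=0.9, beta=0.2))
print(server_aggregate(momenta))                     # aggregated direction
```

In practice the importance weight would be the likelihood ratio of the sampled trajectory under the old versus the current policy, and the momentum parameter `beta` and the interaction frequency are the knobs whose choice drives the stated complexity bounds.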
- Sheng Yue
- Xingyuan Hua
- Lili Chen
- Ju Ren