Momentum-Based Federated Reinforcement Learning with Interaction and Communication Efficiency (2405.17471v2)

Published 24 May 2024 in cs.LG and cs.AI

Abstract: Federated Reinforcement Learning (FRL) has garnered increasing attention recently. However, due to the intrinsic spatio-temporal non-stationarity of data distributions, current approaches typically suffer from high interaction and communication costs. In this paper, we introduce a new FRL algorithm, named $\texttt{MFPO}$, that utilizes momentum, importance sampling, and additional server-side adjustment to control the shift of stochastic policy gradients and enhance the efficiency of data utilization. We prove that, by proper selection of the momentum parameters and interaction frequency, $\texttt{MFPO}$ achieves $\tilde{\mathcal{O}}(H N^{-1}\epsilon^{-3/2})$ interaction complexity and $\tilde{\mathcal{O}}(\epsilon^{-1})$ communication complexity (where $N$ is the number of agents): the interaction complexity enjoys linear speedup in the number of agents, and the communication complexity matches the best achievable by existing first-order FL algorithms. Extensive experiments corroborate the substantial performance gains of $\texttt{MFPO}$ over existing methods on a suite of complex and high-dimensional benchmarks.
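To make the mechanism concrete, the following is a minimal, hypothetical sketch (Python/NumPy) of the momentum-based federated policy-optimization pattern the abstract describes: each agent runs STORM-style momentum-corrected local updates, with the old and new parameters evaluated on the same sample as a stand-in for the importance-sampling correction, and a server periodically averages the local iterates. All names, the toy objective, and the plain-averaging "server-side adjustment" are illustrative assumptions, not the authors' MFPO implementation.

```python
# Hypothetical sketch of momentum-based federated policy optimization.
# Not the paper's MFPO code; a toy quadratic objective replaces the true
# policy-gradient estimator so the script is self-contained and runnable.
import numpy as np

N_AGENTS = 4      # N in the abstract
DIM = 8           # policy parameter dimension
LOCAL_STEPS = 10  # local interactions per communication round
ROUNDS = 50       # communication rounds
ETA = 0.05        # local step size
BETA = 0.9        # momentum parameter

rng = np.random.default_rng(0)
TARGET = rng.normal(size=DIM)  # toy optimum standing in for the policy objective


def stochastic_gradient(theta, noise):
    """Noisy gradient of a toy quadratic surrogate (stand-in for a policy gradient)."""
    return (theta - TARGET) + 0.1 * noise


def local_update(theta, theta_prev, momentum):
    """One momentum-corrected local step (STORM-style variance reduction).

    Evaluating the old and new parameters on the *same* sample mimics the
    importance-sampling correction that keeps the estimator anchored to the
    current policy as it drifts between communication rounds.
    """
    noise = rng.normal(size=DIM)
    g_new = stochastic_gradient(theta, noise)
    g_old = stochastic_gradient(theta_prev, noise)
    momentum = g_new + (1.0 - BETA) * (momentum - g_old)
    return theta - ETA * momentum, momentum


theta_global = np.zeros(DIM)  # server-held global policy parameters

for _ in range(ROUNDS):
    local_thetas = []
    for _ in range(N_AGENTS):
        theta = theta_global.copy()
        theta_prev = theta_global.copy()
        momentum = stochastic_gradient(theta, rng.normal(size=DIM))  # warm start
        for _ in range(LOCAL_STEPS):
            theta_next, momentum = local_update(theta, theta_prev, momentum)
            theta_prev, theta = theta, theta_next
        local_thetas.append(theta)
    # "Server-side adjustment" reduced here to plain parameter averaging.
    theta_global = np.mean(local_thetas, axis=0)

print("distance to toy optimum:", np.linalg.norm(theta_global - TARGET))
```

In this sketch, BETA and LOCAL_STEPS stand in for the momentum parameter and interaction frequency whose joint selection the paper's interaction and communication bounds depend on.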

Authors (4)
  1. Sheng Yue (13 papers)
  2. Xingyuan Hua (4 papers)
  3. Lili Chen (34 papers)
  4. Ju Ren (33 papers)
