Finite-Time Analysis of On-Policy Heterogeneous Federated Reinforcement Learning (2401.15273v2)

Published 27 Jan 2024 in cs.LG, cs.SY, eess.SY, and math.OC

Abstract: Federated reinforcement learning (FRL) has emerged as a promising paradigm for reducing the sample complexity of reinforcement learning tasks by exploiting information from different agents. However, when each agent interacts with a potentially different environment, little to nothing is known theoretically about the non-asymptotic performance of FRL algorithms. The lack of such results can be attributed to various technical challenges and their intricate interplay: Markovian sampling, linear function approximation, multiple local updates to save communication, heterogeneity in the reward functions and transition kernels of the agents' MDPs, and continuous state-action spaces. Moreover, in the on-policy setting, the behavior policies vary with time, further complicating the analysis. In response, we introduce FedSARSA, a novel federated on-policy reinforcement learning scheme, equipped with linear function approximation, to address these challenges and provide a comprehensive finite-time error analysis. Notably, we establish that FedSARSA converges to a policy that is near-optimal for all agents, with the extent of near-optimality proportional to the level of heterogeneity. Furthermore, we prove that FedSARSA leverages agent collaboration to enable linear speedups as the number of agents increases, which holds for both fixed and adaptive step-size configurations.
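
The abstract describes FedSARSA's structure at a high level: each agent runs on-policy SARSA with linear function approximation on its own (heterogeneous) MDP for several local steps, and a server periodically aggregates the agents' parameters. Below is a minimal Python sketch of that scheme. The synthetic perturbed MDPs, the random feature map, the softmax behavior policy, the FedAvg-style averaging step, and all hyperparameters are assumptions made for illustration only; the paper's exact algorithm and analysis conditions may differ.

```python
# Illustrative sketch of federated on-policy SARSA(0) with linear function
# approximation and periodic parameter averaging. Environments, features,
# policy, and hyperparameters are assumptions, not the paper's specification.
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS      = 5      # number of federated agents
N_STATES      = 6      # small synthetic MDP, for illustration only
N_ACTIONS     = 3
FEAT_DIM      = 8      # dimension of the linear feature map
LOCAL_STEPS   = 20     # local SARSA updates between communication rounds
ROUNDS        = 50     # number of communication rounds
GAMMA         = 0.9
ALPHA         = 0.02   # fixed step size
HETEROGENEITY = 0.1    # perturbation level across agents' MDPs

# Shared random feature map phi(s, a) in R^FEAT_DIM.
PHI = rng.normal(size=(N_STATES, N_ACTIONS, FEAT_DIM))

# Common base MDP; each agent sees a slightly perturbed version of it.
BASE_P = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS))
BASE_R = rng.normal(size=(N_STATES, N_ACTIONS))

def make_agent_mdp():
    """Perturb the base transition kernel and reward to model heterogeneity."""
    P = BASE_P + HETEROGENEITY * rng.dirichlet(np.ones(N_STATES),
                                               size=(N_STATES, N_ACTIONS))
    P /= P.sum(axis=-1, keepdims=True)
    R = BASE_R + HETEROGENEITY * rng.normal(size=(N_STATES, N_ACTIONS))
    return P, R

def softmax_policy(theta, s, temp=1.0):
    """Behavior policy derived from the current value estimate (on-policy)."""
    q = (PHI[s] @ theta) / temp
    p = np.exp(q - q.max())
    return p / p.sum()

def local_sarsa(theta, P, R, s, a, steps):
    """Run `steps` SARSA(0) updates on this agent's own MDP."""
    for _ in range(steps):
        s_next = rng.choice(N_STATES, p=P[s, a])
        a_next = rng.choice(N_ACTIONS, p=softmax_policy(theta, s_next))
        td_error = R[s, a] + GAMMA * PHI[s_next, a_next] @ theta - PHI[s, a] @ theta
        theta = theta + ALPHA * td_error * PHI[s, a]
        s, a = s_next, a_next
    return theta, s, a

# Server loop: broadcast, local updates, average (FedAvg-style aggregation).
mdps    = [make_agent_mdp() for _ in range(N_AGENTS)]
states  = [rng.integers(N_STATES) for _ in range(N_AGENTS)]
actions = [rng.integers(N_ACTIONS) for _ in range(N_AGENTS)]
theta_global = np.zeros(FEAT_DIM)

for _ in range(ROUNDS):
    local_thetas = []
    for i, (P, R) in enumerate(mdps):
        th, states[i], actions[i] = local_sarsa(theta_global.copy(), P, R,
                                                states[i], actions[i], LOCAL_STEPS)
        local_thetas.append(th)
    theta_global = np.mean(local_thetas, axis=0)  # server averaging step

print("final parameter norm:", np.linalg.norm(theta_global))
```

The averaging step is where collaboration enters: averaging N local iterates reduces the variance of the shared parameter, which is the mechanism behind the linear speedup in the number of agents claimed in the abstract, while the heterogeneity perturbation controls how far the agents' individual fixed points can drift apart.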
