
The Sample-Communication Complexity Trade-off in Federated Q-Learning (2408.16981v2)

Published 30 Aug 2024 in cs.LG, math.OC, and stat.ML

Abstract: We consider federated Q-learning, where $M$ agents collaboratively learn the optimal Q-function of an unknown infinite-horizon Markov decision process with finite state and action spaces. We investigate the trade-off between sample and communication complexities for the widely used class of intermittent communication algorithms. We first establish a converse result: any federated Q-learning algorithm that offers a speedup with respect to the number of agents in the per-agent sample complexity must incur a communication cost of at least order $\frac{1}{1-\gamma}$, up to logarithmic factors, where $\gamma$ is the discount factor. We then propose a new algorithm, Fed-DVR-Q, which is the first federated Q-learning algorithm to simultaneously achieve order-optimal sample and communication complexities. Together, these results provide a complete characterization of the sample-communication complexity trade-off in federated Q-learning.
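To make the intermittent-communication setting concrete, below is a minimal illustrative sketch of federated synchronous Q-learning with periodic averaging on a small randomly generated MDP: each of the $M$ agents performs local Q-learning updates using samples from a generative model, and a central server averages the agents' Q-tables every R local steps. This is only a toy example of the algorithm class the paper analyzes, not an implementation of the proposed Fed-DVR-Q algorithm; the MDP, constant step size, and averaging period are arbitrary choices made for illustration.

# Minimal sketch (not the paper's Fed-DVR-Q): M agents run synchronous Q-learning
# on a small random MDP via a generative model, and a server averages their
# Q-tables every R local steps ("intermittent communication").
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP: |S| states, |A| actions, random transition kernel, rewards in [0, 1].
S, A = 5, 3
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a distribution over next states
r = rng.uniform(size=(S, A))                 # deterministic rewards for simplicity
gamma = 0.9                                  # discount factor

M = 4          # number of agents
T = 2000       # local iterations per agent
R = 50         # communicate (average) every R local steps
eta = 0.1      # constant step size (an arbitrary choice, not the paper's schedule)

Q = np.zeros((M, S, A))   # each agent keeps its own Q-table

for t in range(1, T + 1):
    for m in range(M):
        # Synchronous sampling: each agent draws one next-state sample per (s, a)
        # from the generative model and applies the standard Q-learning update.
        next_states = np.array([
            [rng.choice(S, p=P[s, a]) for a in range(A)] for s in range(S)
        ])
        target = r + gamma * Q[m][next_states].max(axis=-1)
        Q[m] += eta * (target - Q[m])
    if t % R == 0:
        # Intermittent communication: the server averages the agents' Q-tables
        # and broadcasts the average back to every agent.
        Q[:] = Q.mean(axis=0)

# Compare against the optimal Q-function computed by value iteration on the known MDP.
Q_star = np.zeros((S, A))
for _ in range(1000):
    Q_star = r + gamma * np.einsum("san,n->sa", P, Q_star.max(axis=-1))
print("max |Q_avg - Q*| =", np.abs(Q.mean(axis=0) - Q_star).max())

In this template, the per-agent sample cost is driven by the number of local updates T, while the communication cost is the number of averaging rounds T/R; the paper's results characterize how small the latter can be made while still retaining a speedup in the former as the number of agents grows.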
