Federated Control in Markov Decision Processes (2405.04026v1)
Abstract: We study problems of federated control in Markov Decision Processes (MDPs). To solve an MDP with a large state space, multiple learning agents are introduced to collaboratively learn its optimal policy without communicating locally collected experience. In our setting, these agents have limited capabilities: each is restricted to a different region of the overall state space during training. To handle the heterogeneity among restricted regions, we first introduce the concept of leakage probabilities to understand how such heterogeneity affects the learning process, and then propose a novel communication protocol, the Federated-Q protocol (FedQ), which periodically aggregates agents' knowledge of their restricted regions and accordingly modifies their learning problems for further training. On the theoretical side, we justify the correctness of FedQ as a communication protocol, give a general result on the sample complexity of the derived algorithms FedQ-X with an RL oracle, and finally conduct a thorough study of the sample complexity of FedQ-SynQ. In particular, FedQ-X is shown to enjoy a linear speedup in sample complexity when the workload is uniformly distributed among agents. Moreover, we carry out experiments in various environments to demonstrate the efficiency of our methods.
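The abstract describes FedQ only at a high level: agents train on their own restricted regions, and a server periodically aggregates their region-wise knowledge before the next round of local training. The sketch below is a minimal Python illustration of that aggregation step under assumptions of our own; the function name `fedq_aggregate`, the representation of regions as iterables of state indices, and the rule of averaging overlapping states across agents are illustrative choices, not the paper's actual protocol.

```python
import numpy as np

def fedq_aggregate(local_q_tables, local_regions, num_states, num_actions):
    """Merge per-agent Q-tables into a global Q-table (illustrative sketch).

    Each agent's Q-table is only trusted on its restricted region, so the
    server averages estimates over the agents whose regions cover a state.
    """
    global_q = np.zeros((num_states, num_actions))
    counts = np.zeros(num_states)
    for q_table, region in zip(local_q_tables, local_regions):
        for s in region:
            global_q[s] += q_table[s]   # accumulate this agent's estimate
            counts[s] += 1              # how many agents cover state s
    covered = counts > 0
    global_q[covered] /= counts[covered][:, None]  # average on overlaps
    return global_q

# Toy usage: two agents covering overlapping halves of a 6-state, 2-action MDP.
q_a = np.random.rand(6, 2)
q_b = np.random.rand(6, 2)
global_q = fedq_aggregate([q_a, q_b], [range(0, 4), range(2, 6)], 6, 2)
```

Per the abstract, the aggregated knowledge is then used to modify each agent's local learning problem before further training; how that modification is carried out is specified in the full paper, not in this sketch.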