Towards Fast Rates for Federated and Multi-Task Reinforcement Learning (2409.05291v1)
Abstract: We consider a setting involving $N$ agents, where each agent interacts with an environment modeled as a Markov Decision Process (MDP). The agents' MDPs differ in their reward functions, capturing heterogeneous objectives/tasks. The collective goal of the agents is to communicate intermittently via a central server to find a policy that maximizes the average of long-term cumulative rewards across environments. The limited existing work on this topic either provides only asymptotic rates, generates biased policies, or fails to establish any benefits of collaboration. In response, we propose Fast-FedPG, a novel federated policy gradient algorithm with a carefully designed bias-correction mechanism. Under a gradient-domination condition, we prove that our algorithm guarantees (i) fast linear convergence with exact gradients, and (ii) sub-linear rates that enjoy a linear speedup w.r.t. the number of agents with noisy, truncated policy gradients. Notably, in each case, the convergence is to a globally optimal policy with no heterogeneity-induced bias. In the absence of gradient-domination, we establish convergence to a first-order stationary point at a rate that continues to benefit from collaboration.
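For concreteness, the objective described above can be written as $\max_{\theta} J(\theta) := \frac{1}{N}\sum_{i=1}^{N} J_i(\theta)$, where $J_i(\theta)$ is agent $i$'s expected long-term cumulative reward under a shared parameterized policy $\pi_{\theta}$; since the MDPs differ only in their reward functions $r_i$, only the $J_i$ differ across agents. A gradient-domination condition is typically a Polyak–Łojasiewicz-type inequality such as $\|\nabla J(\theta)\|^{2} \ge 2\mu\,\bigl(J(\theta^{\star}) - J(\theta)\bigr)$ for some $\mu > 0$ (cf. the Mei et al. and Xiao entries below); the abstract does not fix notation, so the exact form assumed in the paper may differ.

The sketch below illustrates the generic idea of a bias-corrected federated policy-gradient loop, using SCAFFOLD-style control variates (see the corresponding entry below) around local ascent steps. It is an assumption-laden illustration, not the authors' Fast-FedPG recursion, whose exact updates the abstract does not specify; the per-agent gradient oracles are toy quadratic stand-ins so the sketch runs end to end, and all names (`grad_estimate`, `fed_pg_bias_corrected`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for per-agent noisy policy-gradient oracles. In the paper's
# setting these would be truncated Monte-Carlo estimates of grad J_i(theta);
# here each J_i is a heterogeneous quadratic so the sketch runs end to end.
N, d = 5, 3
optima = rng.normal(size=(N, d))  # heterogeneous per-agent maximizers (assumption)

def grad_estimate(i, theta, noise=0.1):
    """Noisy surrogate for agent i's policy gradient (hypothetical)."""
    return (optima[i] - theta) + noise * rng.normal(size=d)

def fed_pg_bias_corrected(theta, rounds=200, local_steps=10, lr=0.05):
    """Generic drift correction via control variates around local PG ascent.

    Mirrors the *idea* of cancelling heterogeneity-induced client drift,
    not the exact Fast-FedPG update rule.
    """
    c_local = np.zeros((N, d))   # per-agent control variates
    c_global = np.zeros(d)       # their average, held by the server
    for _ in range(rounds):
        new_thetas, new_cs = [], []
        for i in range(N):
            th = theta.copy()
            for _ in range(local_steps):
                # corrected ascent direction: local gradient, minus the local
                # control variate, plus the global one
                g = grad_estimate(i, th) - c_local[i] + c_global
                th = th + lr * g
            # refresh agent i's control variate from its net local movement
            new_cs.append(c_local[i] - c_global + (th - theta) / (lr * local_steps))
            new_thetas.append(th)
        c_local = np.stack(new_cs)
        c_global = c_local.mean(axis=0)
        theta = np.mean(new_thetas, axis=0)  # intermittent server averaging
    return theta

theta = fed_pg_bias_corrected(np.zeros(d))
print("distance to global maximizer:", np.linalg.norm(theta - optima.mean(axis=0)))
```

The control variates make each agent's local ascent direction track an estimate of the global gradient, which is the generic mechanism for cancelling heterogeneity-induced client drift during local steps; Fast-FedPG's own correction may differ in detail.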
- Federated reinforcement learning: Techniques, applications, and open challenges. arXiv preprint arXiv:2108.11887, 2021.
- Federated reinforcement learning: Linear speedup under Markovian sampling. In International Conference on Machine Learning, pages 10997–11057. PMLR, 2022.
- Federated TD learning over finite-rate erasure channels: Linear speedup under Markovian sampling. IEEE Control Systems Letters, 7:2461–2466, 2023.
- Distributed TD(0) with almost no communication. IEEE Control Systems Letters, 7:2892–2897, 2023.
- Improved communication efficiency in federated natural policy gradient via ADMM-based gradient updates. arXiv preprint arXiv:2310.19807, 2023.
- The blessing of heterogeneity in federated Q-learning: Linear speedup and beyond. In International Conference on Machine Learning, pages 37157–37216. PMLR, 2023.
- One-shot averaging for distributed TD($\lambda$) under Markov sampling. IEEE Control Systems Letters, 2024.
- Federated reinforcement learning with environment heterogeneity. In International Conference on Artificial Intelligence and Statistics, pages 18–37. PMLR, 2022.
- Federated temporal difference learning with linear function approximation under environmental heterogeneity. arXiv preprint arXiv:2302.02212, 2023.
- Finite-time analysis of on-policy heterogeneous federated reinforcement learning. arXiv preprint arXiv:2401.15273, 2024.
- Multi-task reinforcement learning with context-based representations. In International Conference on Machine Learning, pages 9767–9779. PMLR, 2021.
- Communication-efficient learning of deep networks from decentralized data. In International Conference on Artificial Intelligence and Statistics, pages 1273–1282. PMLR, 2017.
- FedKL: Tackling data heterogeneity in federated reinforcement learning by penalizing KL divergence. IEEE Journal on Selected Areas in Communications, 41(4):1227–1242, 2023.
- A decentralized policy gradient approach to multi-task reinforcement learning. In Uncertainty in Artificial Intelligence. PMLR, 2021.
- On the global convergence rates of softmax policy gradient methods. In International Conference on Machine Learning, pages 6820–6829. PMLR, 2020.
- A general sample complexity analysis of vanilla policy gradient. In International Conference on Artificial Intelligence and Statistics, pages 3332–3380. PMLR, 2022.
- Finite-time complexity of incremental policy gradient methods for solving multi-task reinforcement learning. In 6th Annual Learning for Dynamics & Control Conference, pages 1046–1057. PMLR, 2024.
- Momentum for the win: Collaborative federated reinforcement learning across heterogeneous environments. arXiv preprint arXiv:2405.19499, 2024.
- Martin L. Puterman. Markov decision processes. Handbooks in Operations Research and Management Science, 2:331–434, 1990.
- On the theory of policy gradient methods: Optimality, approximation, and distribution shift. Journal of Machine Learning Research, 22(98):1–76, 2021.
- Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems, 12, 1999.
- Linear convergence in federated learning: Tackling client heterogeneity and sparse gradients. Advances in Neural Information Processing Systems, 34:14606–14619, 2021.
- SCAFFOLD: Stochastic controlled averaging for federated learning. In International Conference on Machine Learning, pages 5132–5143. PMLR, 2020.
- Global convergence of policy gradient methods for the linear quadratic regulator. In International Conference on Machine Learning, pages 1467–1476. PMLR, 2018.
- Reinforcement learning: Theory and algorithms. CS Dept., UW Seattle, Seattle, WA, USA, Tech. Rep., 32, 2019.
- Lin Xiao. On the convergence rates of policy gradient methods. Journal of Machine Learning Research, 23(282):1–36, 2022.