Federated Offline Policy Optimization with Dual Regularization (2405.17474v2)
Abstract: Federated Reinforcement Learning (FRL) is regarded as a promising solution for intelligent decision-making in the era of the Artificial Intelligence of Things. However, existing FRL approaches often entail repeated interactions with the environment during local updating, which can be prohibitively expensive or even infeasible in many real-world domains. To overcome this challenge, this paper proposes a novel offline federated policy optimization algorithm, named $\texttt{DRPO}$, which enables distributed agents to collaboratively learn a decision policy solely from private, static data without further environmental interaction. $\texttt{DRPO}$ leverages dual regularization, incorporating both the local behavioral policy and the global aggregated policy, to judiciously cope with the intrinsic two-tier distributional shifts in offline FRL. Theoretical analysis characterizes the impact of the dual regularization on performance, showing that by striking the right balance between the two regularizers, $\texttt{DRPO}$ can effectively counteract distributional shifts and guarantee strict policy improvement in each federated learning round. Extensive experiments validate the significant performance gains of $\texttt{DRPO}$ over baseline methods.
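The abstract does not give $\texttt{DRPO}$'s exact objective, so the following is only a minimal, illustrative PyTorch sketch of what a dual-regularized actor loss of this kind might look like: a Q-maximizing improvement term plus one penalty toward the fitted local behavioral policy and one toward the global aggregated policy. All names and weights here (`dual_regularized_policy_loss`, `lambda_local`, `lambda_global`) are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of a dual-regularized actor loss; not the paper's exact objective.
import torch


def dual_regularized_policy_loss(policy, behavior_policy, global_policy, critic,
                                 states, lambda_local=1.0, lambda_global=1.0):
    """Dual-regularized actor loss on a batch of offline states.

    policy, behavior_policy, global_policy: callables mapping states to
        torch.distributions.Distribution objects (e.g., Gaussian policy heads).
    critic: callable (states, actions) -> Q-value estimates.
    """
    pi = policy(states)              # current local policy being optimized
    beta = behavior_policy(states)   # behavioral policy fitted to the local dataset (frozen)
    pi_bar = global_policy(states)   # aggregated global policy from the server (frozen)

    # Policy improvement term: maximize Q under actions sampled from the current policy.
    actions = pi.rsample()           # reparameterized sample so gradients flow to the actor
    improvement = -critic(states, actions).mean()

    # Regularizer 1: stay close to the local behavioral policy, counteracting
    # the offline distributional shift w.r.t. the static local dataset.
    kl_local = torch.distributions.kl_divergence(pi, beta).mean()

    # Regularizer 2: stay close to the global aggregated policy, counteracting
    # drift across clients with heterogeneous local data.
    kl_global = torch.distributions.kl_divergence(pi, pi_bar).mean()

    return improvement + lambda_local * kl_local + lambda_global * kl_global
```

In this reading, tuning `lambda_local` against `lambda_global` is the balance the abstract refers to: the first term keeps local updates within the support of the static local data, while the second limits how far each client's policy drifts from the shared global policy between federated rounds.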