Federated Offline Reinforcement Learning (2206.05581v3)

Published 11 Jun 2022 in stat.ML, cs.LG, and stat.ME

Abstract: Evidence-based or data-driven dynamic treatment regimes are essential for personalized medicine and can benefit from offline reinforcement learning (RL). Although massive healthcare data are available across medical institutions, privacy constraints prohibit sharing them, and the data are heterogeneous across sites. Federated offline RL algorithms are therefore necessary and promising for addressing these problems. In this paper, we propose a multi-site Markov decision process model that allows for both homogeneous and heterogeneous effects across sites and makes analysis of site-level features possible. We design the first federated policy optimization algorithm for offline RL with sample complexity guarantees. The proposed algorithm is communication-efficient, requiring only a single round of communication in which summary statistics are exchanged. We give a theoretical guarantee for the proposed algorithm: the suboptimality of the learned policies is comparable to the rate attained as if the data were not distributed. Extensive simulations demonstrate the effectiveness of the proposed algorithm. The method is applied to a multi-site sepsis dataset to illustrate its use in clinical settings.
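
The full algorithm is not reproduced on this page, but the sketch below is a minimal illustration of the single-round, summary-statistic communication pattern the abstract describes: each site computes sufficient statistics for a linear Q-function fit, the server pools them once, and a greedy policy is read off the pooled estimate. The linear model, the feature map, and all names (local_summary, fit_global_q, greedy_action) are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def local_summary(phi, rewards, phi_next, gamma=0.99):
    """Site-level summary statistics for the single communication round.

    phi:      (n, d) features of observed (state, action) pairs
    rewards:  (n,)   observed rewards
    phi_next: (n, d) features of the next state under a candidate policy

    Returns the Gram matrix and cross-moment of an LSTD-style linear
    Bellman fit; raw trajectories never leave the site.
    """
    A = phi.T @ (phi - gamma * phi_next)   # (d, d)
    b = phi.T @ rewards                    # (d,)
    return A, b, len(rewards)

def fit_global_q(summaries, ridge=1e-3):
    """Server side: pool the per-site statistics and solve once."""
    d = summaries[0][0].shape[0]
    A = sum(s[0] for s in summaries)
    b = sum(s[1] for s in summaries)
    return np.linalg.solve(A + ridge * np.eye(d), b)   # weights of linear Q

def greedy_action(w, featurize, state, actions):
    """Pick the action with the highest estimated Q-value."""
    return max(actions, key=lambda a: featurize(state, a) @ w)

# Toy usage with two sites and a 2-d feature map.
rng = np.random.default_rng(0)
featurize = lambda s, a: np.array([s, s * a], dtype=float)

summaries = []
for _ in range(2):                          # each site builds its own summary
    s = rng.normal(size=50)
    a = rng.integers(0, 2, size=50)
    r = s * a + rng.normal(scale=0.1, size=50)
    s_next = 0.9 * s + rng.normal(scale=0.1, size=50)
    phi = np.stack([featurize(si, ai) for si, ai in zip(s, a)])
    phi_next = np.stack([featurize(si, 1) for si in s_next])  # next action from a fixed policy
    summaries.append(local_summary(phi, r, phi_next))

w = fit_global_q(summaries)                 # single round: only (A, b, n) were shared
print(greedy_action(w, featurize, state=1.0, actions=[0, 1]))
```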
