Relative Sparsity for Medical Decision Problems (2211.16566v3)

Published 29 Nov 2022 in stat.ME and cs.LG

Abstract: Existing statistical methods can estimate a policy, or a mapping from covariates to decisions, which can then instruct decision makers (e.g., whether to administer hypotension treatment based on the covariates blood pressure and heart rate). There is great interest in using such data-driven policies in healthcare. However, it is often important to explain to the healthcare provider, and to the patient, how a new policy differs from the current standard of care. Such explanation is easier if one can pinpoint the aspects of the policy (i.e., the parameters for blood pressure and heart rate) that change when moving from the standard of care to the new, suggested policy. To this end, we adapt ideas from Trust Region Policy Optimization (TRPO). In our work, however, unlike in TRPO, the difference between the suggested policy and standard of care is required to be sparse, aiding interpretability. This yields "relative sparsity," where, as a function of a tuning parameter, $\lambda$, we can approximately control the number of parameters in our suggested policy that differ from their counterparts in the standard of care (e.g., heart rate only). We propose a criterion for selecting $\lambda$, perform simulations, and illustrate our method with a real, observational healthcare dataset, deriving a policy that is easy to explain in the context of the current standard of care. Our work promotes the adoption of data-driven decision aids, which have great potential to improve health outcomes.
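
The abstract describes an optimization that trades a candidate policy's estimated value against the number of its parameters that deviate from the standard of care. The following is a minimal, self-contained sketch of that idea, not the authors' implementation: it assumes a logistic behavioral policy, a simple inverse-probability-weighted value estimate, and an L1 penalty $\lambda \|\beta - \beta_b\|_1$ on the deviation from the behavioral coefficients $\beta_b$ as a convex surrogate for the count of changed parameters. The toy data and all variable names (`policy_prob`, `V_hat`, `lam`) are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy observational data: covariates X (think blood pressure, heart rate),
# binary treatment A (e.g., vasopressor given or not), and reward R.
n, p = 500, 2
X = rng.normal(size=(n, p))
A = rng.binomial(1, 0.5, size=n)
R = 0.5 * X[:, 1] * A + rng.normal(scale=0.1, size=n)  # only heart rate matters

def policy_prob(beta, X):
    """P(A = 1 | X) under a logistic policy with parameters beta."""
    return 1.0 / (1.0 + np.exp(-(X @ beta)))

# Step 1: estimate the standard of care (behavioral policy) by logistic
# regression of A on X; large C makes the fit effectively unpenalized.
beta_b = LogisticRegression(C=1e6).fit(X, A).coef_.ravel()

# Step 2: inverse-probability-weighted estimate of a candidate policy's value.
def V_hat(beta):
    pi = np.where(A == 1, policy_prob(beta, X), 1.0 - policy_prob(beta, X))
    pb = np.where(A == 1, policy_prob(beta_b, X), 1.0 - policy_prob(beta_b, X))
    return np.mean(R * pi / pb)

# Step 3: relative-sparsity objective -- penalize coordinates of beta that
# move away from their standard-of-care values, so lambda trades estimated
# value against the number of changed parameters.
def objective(beta, lam):
    return -V_hat(beta) + lam * np.sum(np.abs(beta - beta_b))

lam = 0.05  # tuning parameter lambda; larger values force beta toward beta_b
res = minimize(objective, x0=beta_b.copy(), args=(lam,), method="Powell")
print("standard-of-care beta:", np.round(beta_b, 3))
print("suggested beta:       ", np.round(res.x, 3))
print("coefficients changed: ", int(np.sum(~np.isclose(res.x, beta_b, atol=1e-3))))
```

Sweeping `lam` upward from zero shrinks more coordinates of the suggested policy back to their standard-of-care values, which is the quantity the paper's selection criterion for $\lambda$ tunes; the paper's actual value estimator, behavioral-policy model, and penalty may differ in detail from this sketch.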
