Inference for relative sparsity (2306.14297v1)
Abstract: In healthcare, there is much interest in estimating policies, or mappings from covariates to treatment decisions. Recently, there has also been interest in constraining these estimated policies to the standard of care, which generated the observed data. A relative sparsity penalty was proposed to derive policies that have sparse, explainable differences from the standard of care, facilitating justification of the new policy. However, the developers of this penalty only considered estimation, not inference. Here, we develop inference for the relative sparsity objective function, because characterizing uncertainty is crucial to applications in medicine. Further, in the relative sparsity work, the authors considered only the single-stage decision case; here, we consider the more general, multi-stage case. Inference is difficult because the relative sparsity objective depends on the unpenalized value function, which is unstable and has infinite estimands in the binary action case. Further, one must deal with a non-differentiable penalty. To tackle these issues, we nest a weighted Trust Region Policy Optimization function within a relative sparsity objective, implement an adaptive relative sparsity penalty, and propose a sample-splitting framework for post-selection inference. We study the asymptotic behavior of our proposed approaches, perform extensive simulations, and analyze a real electronic health record dataset.
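To fix ideas, here is a minimal sketch of the kind of penalized objective the abstract describes, in notation that is ours rather than necessarily the paper's: $\theta$ denotes the proposed policy parameters, $\hat{\beta}$ the estimated behavioral (standard-of-care) policy parameters, $\hat{V}(\theta;\hat{\beta})$ an off-policy, importance-weighted value surrogate of the TRPO type, and $\lambda,\gamma>0$ tuning constants:

$$
\hat{\theta}_{\lambda} \;=\; \arg\max_{\theta}\;\; \hat{V}(\theta;\hat{\beta}) \;-\; \lambda \sum_{j} \hat{w}_{j}\,\bigl|\theta_{j}-\hat{\beta}_{j}\bigr|,
\qquad
\hat{w}_{j} \;=\; \bigl|\tilde{\theta}_{j}-\hat{\beta}_{j}\bigr|^{-\gamma},
$$

where $\tilde{\theta}$ is a preliminary, unpenalized estimate. The adaptive weights $\hat{w}_{j}$, in the spirit of the adaptive lasso (Zou, 2006), shrink the new policy toward the behavioral policy so that only a few coordinates of $\theta$ differ from $\hat{\beta}$; sample splitting then separates the fold used to select those nonzero differences from the fold used for inference.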
- Constrained policy optimization. In International Conference on Machine Learning, pages 22–31. PMLR, 2017.
- Richard Bellman. A Markovian decision process. Journal of Mathematics and Mechanics, pages 679–684, 1957.
- Statistical inference. Cengage Learning, 2002.
- Bibhas Chakraborty and Erica E. M. Moodie. Statistical Reinforcement Learning, pages 31–52. Springer New York, New York, NY, 2013. ISBN 978-1-4614-7428-9. doi: 10.1007/978-1-4614-7428-9_3. URL https://doi.org/10.1007/978-1-4614-7428-9_3.
- David R Cox. A note on data-splitting for the evaluation of significance levels. Biometrika, 62(2):441–444, 1975.
- Using expectation-maximization for reinforcement learning. Neural Computation, 9(2):271–278, 1997.
- Narrative review of controversies involving vasopressin use in septic shock and practical considerations. Annals of Pharmacotherapy, 54(7):706–714, 2020.
- Techniques for interpretable machine learning. Communications of the ACM, 63(1):68–77, 2019.
- Benjamin Eltzner. Testing for uniqueness of estimators. arXiv preprint arXiv:2011.14762, 2020.
- Constructing dynamic treatment regimes over indefinite time horizons. Biometrika, 105(4):963–977, 2018.
- Value-aware loss function for model-based reinforcement learning. In Artificial Intelligence and Statistics, pages 1486–1494. PMLR, 2017.
- Off-policy deep reinforcement learning without exploration. In International Conference on Machine Learning, pages 2052–2062. PMLR, 2019.
- A comparison of models for predicting early hospital readmissions. Journal of Biomedical Informatics, 56:229–238, 2015.
- POPCORN: Partially observed prediction constrained reinforcement learning. arXiv preprint arXiv:2001.04032, 2020.
- A theory of regularized Markov decision processes. In International Conference on Machine Learning, pages 2160–2169. PMLR, 2019.
- Charles J Geyer. On the asymptotics of constrained M-estimation. The Annals of Statistics, pages 1993–2010, 1994.
- PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation, 101(23):e215–e220, 2000.
- Interpretable off-policy evaluation in reinforcement learning by highlighting influential transitions. In International Conference on Machine Learning, pages 3658–3667. PMLR, 2020.
- The physician’s experience of changing clinical practice: a struggle to unlearn. Implementation Science, 12:1–11, 2017.
- Reinforcement learning with deep energy-based policies. In International Conference on Machine Learning, pages 1352–1361. PMLR, 2017.
- Regularized least squares temporal difference learning with nested ℓ2 and ℓ1 penalization. In European Workshop on Reinforcement Learning, pages 102–114. Springer, 2011.
- A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260):663–685, 1952.
- MIMIC-III clinical database. PhysioNet, 10:C2XW26, 2016a.
- MIMIC-III, a freely accessible critical care database. Scientific Data, 3(1):1–9, 2016b.
- Efficient evaluation of natural stochastic policies in offline reinforcement learning. arXiv preprint arXiv:2006.03886, 2020.
- Edward H Kennedy. Nonparametric causal effects based on incremental propensity score interventions. Journal of the American Statistical Association, 114(526):645–656, 2019.
- Bayesian estimates of equation system parameters: an application of integration by Monte Carlo. Econometrica: Journal of the Econometric Society, pages 1–19, 1978.
- Asymptotics for lasso-type estimators. Annals of Statistics, pages 1356–1378, 2000.
- The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nature Medicine, 24(11):1716–1720, 2018.
- Post-selection inference. Annual Review of Statistics and Its Application, 9:505–527, 2022.
- On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79–86, 1951.
- Smooth imitation learning for online sequence prediction. In International Conference on Machine Learning, pages 680–688. PMLR, 2016.
- Batch policy learning under constraints. In International Conference on Machine Learning, pages 3703–3712. PMLR, 2019.
- Edward E Leamer. False models and post-data model construction. Journal of the American Statistical Association, 69(345):122–131, 1974.
- Interrogating a clinical database to study treatment of hypotension in the critically ill. BMJ Open, 2(3):e000916, 2012.
- An actor-critic contextual bandit algorithm for personalized mobile health interventions. arXiv preprint arXiv:1706.09090, 2017.
- Learning neural network policies with guided policy search under unknown dynamics. In NIPS, volume 27, pages 1071–1079. Citeseer, 2014.
- Zachary C Lipton. The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery. Queue, 16(3):31–57, 2018.
- Learning to diagnose with LSTM recurrent neural networks. arXiv preprint arXiv:1511.03677, 2015.
- Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima. Advances in Neural Information Processing Systems, 26, 2013.
- Estimating dynamic treatment regimes in mobile health using v-learning. Journal of the American Statistical Association, 2019.
- Tim Miller. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267:1–38, 2019.
- Iván Díaz Muñoz and Mark van der Laan. Population intervention causal effects based on stochastic interventions. Biometrics, 68(2):541–549, 2012.
- Predicting good probabilities with supervised learning. In Proceedings of the 22nd International Conference on Machine Learning, pages 625–632, 2005.
- Art B. Owen. Monte Carlo theory, methods and examples. 2013.
- Generalizing off-policy evaluation from a causal perspective for sequential decision-making. arXiv preprint arXiv:2201.08262, 2022.
- Judea Pearl. Causality. Cambridge University Press, 2009.
- Relative entropy policy search. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 24, 2010.
- On the distribution of the adaptive lasso estimator. Journal of Statistical Planning and Inference, 139(8):2775–2790, 2009.
- Doina Precup. Eligibility traces for off-policy policy evaluation. Computer Science Department Faculty Publication Series, page 80, 2000.
- Martin L Puterman. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014.
- Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89(427):846–866, 1994.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Vasopressor therapy in the intensive care unit. In Seminars in Respiratory and Critical Care Medicine, volume 42, pages 59–77. Thieme Medical Publishers, Inc., 2021.
- Trust region policy optimization. In International Conference on Machine Learning, pages 1889–1897. PMLR, 2015.
- Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
- Reinforcement learning: An introduction. 2018.
- Philip S Thomas. Safe reinforcement learning. PhD thesis, 2015.
- A review of off-policy evaluation in reinforcement learning. arXiv preprint arXiv:2212.06355, 2022.
- Weighted likelihood policy search with model selection. Advances in Neural Information Processing Systems, 25:2357–2365, 2012.
- A calibration hierarchy for risk models was defined: from utopia to empirical data. Journal of Clinical Epidemiology, 74:167–176, 2016.
- Mark J van der Laan and Sherri Rose. Targeted learning in data science: causal inference for complex longitudinal studies. Springer, 2018.
- Aad W van der Vaart. Asymptotic statistics, volume 3. Cambridge University Press, 2000.
- Predicting acute kidney injury at hospital re-entry using high-dimensional electronic health record data. PLoS ONE, 13(11):e0204920, 2018.
- Relative sparsity for medical decision problems. Statistics in Medicine, 2023.
- Jeffrey M Wooldridge. Econometric analysis of cross section and panel data. MIT press, 2010.
- Behavior regularized offline reinforcement learning. arXiv preprint arXiv:1911.11361, 2019.
- Maximum entropy inverse reinforcement learning. In AAAI, volume 8, pages 1433–1438, Chicago, IL, USA, 2008.
- Hui Zou. The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476):1418–1429, 2006.