AI in Pharma for Personalized Sequential Decision-Making: Methods, Applications and Opportunities (2311.18725v1)
Abstract: In the pharmaceutical industry, the use of AI has grown consistently over the past decade. This rise is attributable to major advances in statistical machine learning methodology, computational capability, and the increased availability of large datasets. AI techniques are applied throughout drug development, from drug discovery to post-marketing benefit-risk assessment. Kolluri et al. reviewed case studies spanning these stages, featuring key applications such as protein structure prediction, success probability estimation, subgroup identification, and AI-assisted clinical trial monitoring. From a regulatory standpoint, 2021 saw a notable uptick in submissions incorporating AI components. The therapeutic areas most frequently leveraging AI were oncology (27%), psychiatry (15%), gastroenterology (12%), and neurology (11%). The paradigm of personalized or precision medicine has gained significant traction in recent research, partly owing to advances in AI techniques (Hamburg and Collins, 2010). This shift has had a transformative impact on the pharmaceutical industry. Departing from the traditional "one-size-fits-all" model, personalized medicine incorporates individual factors, such as environmental conditions, lifestyle choices, and health histories, to formulate customized treatment plans. By utilizing sophisticated machine learning algorithms, clinicians and researchers can make better-informed decisions in areas such as disease prevention, diagnosis, and treatment selection, thereby optimizing health outcomes for each individual.
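Several of the trial-design references below build on Thompson sampling for multi-armed bandits. As a toy illustration only (not from this paper; the function name, arm success rates, and Beta(1, 1) priors are all assumptions for the sketch), a minimal Bernoulli-bandit simulation in Python:

```python
import random

def thompson_sampling(true_probs, n_rounds, seed=0):
    """Simulate Thompson sampling on a Bernoulli multi-armed bandit.

    true_probs: each arm's (unknown to the learner) success probability,
    used here only to simulate rewards. Returns per-arm success and
    failure counts after n_rounds pulls.
    """
    rng = random.Random(seed)
    # Beta(1, 1) prior on each arm; posterior is Beta(1 + s, 1 + f).
    successes = [0] * len(true_probs)
    failures = [0] * len(true_probs)
    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its posterior
        # and pull the arm whose draw is highest.
        draws = [rng.betavariate(1 + s, 1 + f)
                 for s, f in zip(successes, failures)]
        arm = draws.index(max(draws))
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

s, f = thompson_sampling([0.3, 0.6], n_rounds=2000)
```

Over many rounds the allocation concentrates on the better arm, which is the ethical appeal of bandit designs for trials noted in the references below: fewer patients are assigned to inferior arms as evidence accumulates.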
- “Machine learning and artificial intelligence in pharmaceutical research and development: a review” In The AAPS Journal 24 Springer, 2022, pp. 1–10
- “Landscape analysis of the application of artificial intelligence and machine learning in regulatory submissions for drug development from 2016 to 2021” In Clinical pharmacology and therapeutics 113.4, 2023, pp. 771–774
- Margaret A Hamburg and Francis S Collins “The path to personalized medicine” In New England Journal of Medicine 363.4 Mass Medical Soc, 2010, pp. 301–304
- Bibhas Chakraborty and Erica E Moodie “Statistical methods for dynamic treatment regimes” Springer, 2013
- Michael R Kosorok and Eric B Laber “Precision medicine” In Annual review of statistics and its application 6 Annual Reviews, 2019, pp. 263–286
- “Dynamic treatment regimes: Technical challenges and applications” In Electronic journal of statistics 8.1 NIH Public Access, 2014, pp. 1225
- “Q-learning: Theory and applications” In Annual Review of Statistics and Its Application 7 Annual Reviews, 2020, pp. 279–301
- SA Murphy “A Generalization Error for Q-Learning.” In Journal of Machine Learning Research: JMLR 6, 2005, pp. 1073–1097
- James M Robins “Optimal structural nested models for optimal sequential decisions” In Proceedings of the second seattle Symposium in Biostatistics, 2004, pp. 189–326 Springer
- Yufan Zhao, Michael R Kosorok and Donglin Zeng “Reinforcement learning design for cancer clinical trials” In Statistics in medicine 28.26 Wiley Online Library, 2009, pp. 3294–3315
- “Q-learning: A data analysis method for constructing adaptive interventions.” In Psychological methods 17.4 American Psychological Association, 2012, pp. 478
- Michael R Kosorok and Erica EM Moodie “Adaptive treatment strategies in practice: planning trials and analyzing data for personalized medicine” SIAM, 2015
- Susan A Murphy “Optimal dynamic treatment regimes” In Journal of the Royal Statistical Society Series B: Statistical Methodology 65.2 Oxford University Press, 2003, pp. 331–355
- “Q-and A-learning methods for estimating optimal dynamic treatment regimes” In Statistical science: a review journal of the Institute of Mathematical Statistics 29.4 NIH Public Access, 2014, pp. 640
- “Robust Q-learning” In Journal of the American Statistical Association 116.533 Taylor & Francis, 2021, pp. 368–381
- “Penalized Q-learning for dynamic treatment regimens” In Statistica Sinica 25.3 NIH Public Access, 2015, pp. 901
- Bibhas Chakraborty, Eric B Laber and Ying-Qi Zhao “Inference about the expected performance of a data-driven dynamic treatment regime” In Clinical Trials 11.4 SAGE Publications Sage UK: London, England, 2014, pp. 408–417
- Thomas A Murray, Ying Yuan and Peter F Thall “A Bayesian machine learning approach for optimizing dynamic treatment regimes” In Journal of the American Statistical Association 113.523 Taylor & Francis, 2018, pp. 1255–1267
- “Multi-stage optimal dynamic treatment regimes for survival outcomes with dependent censoring” In Biometrika 110.2 Oxford University Press, 2023, pp. 395–410
- Wensheng Zhu, Donglin Zeng and Rui Song “Proper inference for value function in high-dimensional Q-learning for dynamic treatment regimes” In Journal of the American Statistical Association 114.527 Taylor & Francis, 2019, pp. 1404–1417
- “Reinforcement learning strategies for clinical trials in nonsmall cell lung cancer” In Biometrics 67.4 Wiley Online Library, 2011, pp. 1422–1433
- “Reinforcement learning with action-derived rewards for chemotherapy and clinical trial dosing regimen selection” In Machine Learning for Healthcare Conference, 2018, pp. 161–226 PMLR
- “Machine learning for clinical trials in the era of COVID-19” In Statistics in biopharmaceutical research 12.4 Taylor & Francis, 2020, pp. 506–517
- “Reinforcement learning for intelligent healthcare applications: A survey” In Artificial Intelligence in Medicine 109 Elsevier, 2020, pp. 101964
- William H Press “Bandit solutions provide unified ethical models for randomized clinical trials and comparative effectiveness research” In Proceedings of the National Academy of Sciences 106.52 National Acad Sciences, 2009, pp. 22387–22392
- Sofía S Villar, Jack Bowden and James Wason “Multi-armed bandit models for the optimal design of clinical trials: benefits and challenges” In Statistical science: a review journal of the Institute of Mathematical Statistics 30.2 Europe PMC Funders, 2015, pp. 199
- Jean-Yves Audibert, Rémi Munos and Csaba Szepesvári “Exploration–exploitation tradeoff using variance estimates in multi-armed bandits” In Theoretical Computer Science 410.19 Elsevier, 2009, pp. 1876–1902
- “Bandit algorithms” Cambridge University Press, 2020
- “Analysis of Thompson sampling for the multi-armed bandit problem” In Conference on Learning Theory, 2012, pp. 39–1 JMLR Workshop and Conference Proceedings
- “On upper-confidence bound policies for switching bandit problems” In International Conference on Algorithmic Learning Theory, 2011, pp. 174–188 Springer
- John White “Bandit algorithms for website optimization” O’Reilly Media, Inc., 2013
- “Comparing Epsilon greedy and Thompson sampling model for multi-armed bandit algorithm on marketing dataset” In Journal of Applied Data Sciences 2.2, 2021
- Shie Mannor and John N Tsitsiklis “The sample complexity of exploration in the multi-armed bandit problem” In Journal of Machine Learning Research 5.Jun, 2004, pp. 623–648
- Sandeep Pandey, Deepayan Chakrabarti and Deepak Agarwal “Multi-armed bandit problems with dependent arms” In Proceedings of the 24th international conference on Machine learning, 2007, pp. 721–728
- “Thompson sampling for contextual bandit problems with auxiliary safety constraints” In arXiv preprint arXiv:1911.00638, 2019
- “Learning for dose allocation in adaptive clinical trials with safety constraints” In International Conference on Machine Learning, 2020, pp. 8730–8740 PMLR
- Saba Q Yahyaa and Bernard Manderick “Thompson Sampling for Multi-Objective Multi-Armed Bandits Problem.” In ESANN, 2015
- Ambuj Tewari and Susan A Murphy “From ads to interventions: Contextual bandits in mobile health” In Mobile health: sensors, analytic methods, and applications Springer, 2017, pp. 495–517
- “A contextual-bandit-based approach for informed decision-making in clinical trials” In Life 12.8 MDPI, 2022, pp. 1277
- John O’Quigley, Margaret Pepe and Lloyd Fisher “Continual reassessment method: a practical design for phase 1 clinical trials in cancer” In Biometrics 46.1 JSTOR, 1990, pp. 33–48
- Beat Neuenschwander, Michael Branson and Thomas Gsponer “Critical aspects of the Bayesian approach to phase I cancer trials” In Statistics in medicine 27.13 Wiley Online Library, 2008, pp. 2420–2439
- Hongtao Zhang, Alan Y Chiang and Jixian Wang “Improving the performance of Bayesian logistic regression model with overdose control in oncology dose-finding studies” In Statistics in Medicine 41.27 Wiley Online Library, 2022, pp. 5463–5483
- “A Bayesian industry approach to phase I combination trials in oncology” In Statistical methods in drug combination studies Chapman & Hall/CRC Press: Boca Raton, FL, 2015, pp. 95–135
- Maryam Aziz, Emilie Kaufmann and Marie-Karelle Riviere “On multi-armed bandit designs for dose-finding clinical trials” In The Journal of Machine Learning Research 22.1 JMLR.org, 2021, pp. 686–723
- Lan Jin, Guodong Pang and Demissie Alemayehu “Multiarmed Bandit Designs for Phase I Dose-Finding Clinical Trials With Multiple Toxicity Types” In Statistics in Biopharmaceutical Research 15.1 Taylor & Francis, 2023, pp. 164–177
- “Mobile-health: A review of current state in 2015” In Journal of biomedical informatics 56 Elsevier, 2015, pp. 265–272
- James M Rehg, Susan A Murphy and Santosh Kumar “Mobile health” Cham: Springer International Publishing, 2017
- Richard S Sutton and Andrew G Barto “Reinforcement learning: An introduction” MIT press, 2018
- Martin L Puterman “Markov decision processes: discrete stochastic dynamic programming” John Wiley & Sons, 2014
- Scott Fujimoto, David Meger and Doina Precup “Off-policy deep reinforcement learning without exploration” In International conference on machine learning, 2019, pp. 2052–2062 PMLR
- Ashkan Ertefaie and Robert L Strawderman “Constructing dynamic treatment regimes over indefinite time horizons” In Biometrika 105.4 Oxford University Press, 2018, pp. 963–977
- “Estimating dynamic treatment regimes in mobile health using v-learning” In Journal of the American Statistical Association Taylor & Francis, 2019
- Christoph Dann, Gerhard Neumann and Jan Peters “Policy evaluation with temporal differences: A survey and comparison” In Journal of Machine Learning Research 15 Massachusetts Institute of Technology Press (MIT Press)/Microtome Publishing, 2014, pp. 809–883
- Wenzhuo Zhou, Ruoqing Zhu and Annie Qu “Estimating optimal infinite horizon dynamic treatment regimes via pT-learning” In Journal of the American Statistical Association Taylor & Francis, 2022, pp. 1–14
- Yuhan Li, Wenzhuo Zhou and Ruoqing Zhu “Quasi-optimal Reinforcement Learning with Continuous Actions” In The Eleventh International Conference on Learning Representations, 2022
- “Mobile health technology in the prevention and management of type 2 diabetes” In Indian journal of endocrinology and metabolism 21.2 Wolters Kluwer–Medknow Publications, 2017, pp. 334
- “The OhioT1DM dataset for blood glucose level prediction: Update 2020” In CEUR workshop proceedings 2675, 2020, pp. 71 NIH Public Access
- “Improving the estimation of mealtime insulin dose in adults with type 1 diabetes: the Normal Insulin Demand for Dose Adjustment (NIDDA) study” In Diabetes Care 34.10 Am Diabetes Assoc, 2011, pp. 2146–2151
- David Rodbard “Interpretation of continuous glucose monitoring data: glycemic variability and quality of glycemic control” In Diabetes technology & therapeutics 11.S1 Mary Ann Liebert, 2009, pp. S–55
- Wang Miao, Xu Shi and Eric Tchetgen Tchetgen “A confounding bridge approach for double negative control inference on causal effects” In arXiv preprint arXiv:1808.04945, 2018
- “Semiparametric proximal causal inference” In Journal of the American Statistical Association Taylor & Francis, 2023, pp. 1–12
- “Confounding-robust policy improvement” In Advances in neural information processing systems 31, 2018
- Jiayi Wang, Zhengling Qi and Chengchun Shi “Blessing from experts: Super reinforcement learning in confounded environments” In arXiv preprint arXiv:2209.15448, 2022
- “A minimax learning approach to off-policy evaluation in confounded partially observable markov decision processes” In International Conference on Machine Learning, 2022, pp. 20057–20094 PMLR
- “Bellman-consistent pessimism for offline reinforcement learning” In Advances in neural information processing systems 34, 2021, pp. 6683–6694
- Ying Jin, Zhuoran Yang and Zhaoran Wang “Is pessimism provably efficient for offline RL?” In International Conference on Machine Learning, 2021, pp. 5084–5096 PMLR
- “Pessimistic model-based offline reinforcement learning under partial coverage” In arXiv preprint arXiv:2107.06226, 2021
- Kamyar Ghasemipour, Shixiang Shane Gu and Ofir Nachum “Why so pessimistic? estimating uncertainties for offline rl through ensembles, and why their independence matters” In Advances in Neural Information Processing Systems 35, 2022, pp. 18267–18281
- Masatoshi Uehara, Chengchun Shi and Nathan Kallus “A review of off-policy evaluation in reinforcement learning” In arXiv preprint arXiv:2212.06355, 2022
- Philip Thomas, Georgios Theocharous and Mohammad Ghavamzadeh “High-confidence off-policy evaluation” In Proceedings of the AAAI Conference on Artificial Intelligence 29.1, 2015
- “Non-asymptotic confidence intervals of off-policy evaluation: Primal and dual bounds” In arXiv preprint arXiv:2103.05741, 2021
- “Statistical inference of the value function for reinforcement learning in infinite-horizon settings” In Journal of the Royal Statistical Society Series B: Statistical Methodology 84.3 Oxford University Press, 2022, pp. 765–793
- “Does the Markov decision process fit the data: Testing for the Markov property in sequential decision making” In International Conference on Machine Learning, 2020, pp. 8807–8817 PMLR
- “Constructing dynamic treatment regimes with shared parameters for censored data” In Statistics in medicine 39.9 Wiley Online Library, 2020, pp. 1250–1263
- “Multicategory angle-based learning for estimating optimal dynamic treatment regimes with censored data” In Journal of the American Statistical Association 117.539 Taylor & Francis, 2022, pp. 1438–1451
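The Q-learning references above estimate optimal dynamic treatment regimes by backward induction: fit a stage-2 Q-function, take its maximum over actions as a pseudo-outcome, then fit the stage-1 Q-function. A minimal sketch on synthetic data (the data-generating model, variable names, and linear working models are all illustrative assumptions, not from any cited study):

```python
import numpy as np

def fit_q(X, y):
    # Ordinary least squares with an intercept; returns the coefficient vector.
    Z = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 5000
# Stage 1: baseline state S1, randomized binary action A1.
S1 = rng.normal(size=n)
A1 = rng.integers(0, 2, size=n)
# Intermediate state: A1 helps only when S1 > 0.
S2 = 0.5 * S1 + A1 * (S1 > 0) + rng.normal(scale=0.1, size=n)
A2 = rng.integers(0, 2, size=n)
# Final outcome: A2 = 1 is beneficial when S2 > 0.
Y = S2 + A2 * S2 + rng.normal(scale=0.1, size=n)

# Stage 2: regress Y on (S2, A2, S2*A2); the pseudo-outcome is the max over a2.
b2 = fit_q(np.column_stack([S2, A2, S2 * A2]), Y)
def q2(s2, a2):
    return b2[0] + b2[1] * s2 + b2[2] * a2 + b2[3] * s2 * a2
V2 = np.maximum(q2(S2, 0), q2(S2, 1))

# Stage 1: regress the pseudo-outcome on (S1, A1, S1*A1).
b1 = fit_q(np.column_stack([S1, A1, S1 * A1]), V2)
def pi1(s1):
    # Estimated stage-1 rule: treat when the fitted A1 effect is positive.
    return int(b1[2] + b1[3] * s1 > 0)
```

The estimated rule recommends treatment at stage 1 for patients with large positive S1 and withholds it for large negative S1, mirroring the treatment-covariate interaction built into the simulation; in practice the linear working models would be replaced by the more flexible or robust estimators surveyed in the references above.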