reBandit: Random Effects based Online RL algorithm for Reducing Cannabis Use
Abstract: The escalating prevalence of cannabis use, and of the associated cannabis use disorder (CUD), poses a significant public health challenge globally. Given the notably wide treatment gap, especially among emerging adults (EAs; ages 18-25), addressing cannabis use and CUD remains a pivotal objective within the 2030 United Nations Agenda for Sustainable Development Goals (SDG). In this work, we develop an online reinforcement learning (RL) algorithm called reBandit, which will be deployed in a mobile health study to deliver personalized interventions aimed at reducing cannabis use among EAs. reBandit uses random effects and informative Bayesian priors to learn quickly and efficiently in noisy mobile health environments, and it employs empirical Bayes and optimization techniques to update its hyper-parameters autonomously online. To evaluate the algorithm, we construct a simulation testbed from data collected in a prior study and compare reBandit against algorithms commonly used in mobile health studies. We show that reBandit performs as well as or better than all baseline algorithms, and that the performance gap widens as population heterogeneity in the simulation environment increases, demonstrating the algorithm's ability to adapt to a diverse population of study participants.
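To make the ideas in the abstract concrete, the sketch below shows a Gaussian Thompson-sampling bandit in which each user's arm means are treated as a population mean plus a user-specific random effect, and the population-level hyper-parameters are re-estimated with a simple method-of-moments empirical Bayes step. This is a minimal illustration of the general technique under stated distributional assumptions, not the authors' reBandit implementation; all class and variable names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

class RandomEffectsTS:
    """Illustrative Gaussian Thompson sampler with per-user random effects.

    Each user's arm mean is modeled as a population mean plus a
    user-specific random effect with variance tau2, so the posterior
    shrinks noisy per-user estimates toward the population mean.
    Hypothetical sketch; not the reBandit algorithm itself.
    """

    def __init__(self, n_arms, mu0=0.0, tau2=1.0, sigma2=1.0):
        self.n_arms = n_arms
        self.mu0 = np.full(n_arms, mu0)   # population-level arm means
        self.tau2 = tau2                  # between-user (random-effect) variance
        self.sigma2 = sigma2              # within-user reward noise variance
        self.sums = {}                    # per-user reward sums, per arm
        self.counts = {}                  # per-user pull counts, per arm

    def _posterior(self, user):
        # Conjugate normal posterior: precision-weighted combination of
        # the population prior and the user's own observations.
        n = self.counts.get(user, np.zeros(self.n_arms))
        s = self.sums.get(user, np.zeros(self.n_arms))
        prec = 1.0 / self.tau2 + n / self.sigma2
        var = 1.0 / prec
        mean = var * (self.mu0 / self.tau2 + s / self.sigma2)
        return mean, var

    def select(self, user):
        # Thompson sampling: draw from the posterior, pick the best draw.
        mean, var = self._posterior(user)
        return int(np.argmax(rng.normal(mean, np.sqrt(var))))

    def update(self, user, arm, reward):
        self.counts.setdefault(user, np.zeros(self.n_arms))[arm] += 1
        self.sums.setdefault(user, np.zeros(self.n_arms))[arm] += reward

    def empirical_bayes(self):
        """Method-of-moments update of mu0 and tau2 from per-user means."""
        means = []
        for user in self.counts:
            n, s = self.counts[user], self.sums[user]
            means.append(np.where(n > 0, s / np.maximum(n, 1), np.nan))
        m = np.array(means)
        if len(m) >= 2:
            self.mu0 = np.nanmean(m, axis=0)
            # Between-user variance: excess spread of user means over noise.
            self.tau2 = max(float(np.nanvar(m)) - self.sigma2, 1e-3)
```

The shrinkage in `_posterior` is what lets such an algorithm act sensibly for a new user from their very first decision point (falling back on the population prior), while the `empirical_bayes` step mirrors the paper's idea of updating hyper-parameters online as pooled data accumulates.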