
RoME: A Robust Mixed-Effects Bandit Algorithm for Optimizing Mobile Health Interventions

Published 11 Dec 2023 in stat.ML and cs.LG | (2312.06403v4)

Abstract: Mobile health leverages personalized and contextually tailored interventions optimized through bandit and reinforcement learning algorithms. In practice, however, challenges such as participant heterogeneity, nonstationarity, and nonlinear relationships hinder algorithm performance. We propose RoME, a Robust Mixed-Effects contextual bandit algorithm that simultaneously addresses these challenges via (1) modeling the differential reward with user- and time-specific random effects, (2) network cohesion penalties, and (3) debiased machine learning for flexible estimation of baseline rewards. We establish a high-probability regret bound that depends solely on the dimension of the differential-reward model, enabling us to achieve robust regret bounds even when the baseline reward is highly complex. We demonstrate the superior performance of the RoME algorithm in a simulation and two off-policy evaluation studies.
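As a rough illustration of what a network cohesion penalty (component 2 above) can look like, the sketch below fits one linear parameter vector per user and adds a graph-Laplacian term that pulls connected users' parameters toward each other. It is not the authors' implementation: the function names `graph_laplacian` and `cohesion_ridge`, the penalty weights `lam` and `gamma`, and the plain squared-error objective are all assumptions made for this example, and the random effects and debiased machine learning components of RoME are omitted.

```python
import numpy as np


def graph_laplacian(n_users, edges):
    """Unnormalized Laplacian L = D - A of an undirected user graph."""
    L = np.zeros((n_users, n_users))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return L


def cohesion_ridge(X_list, y_list, edges, lam=1.0, gamma=1e-3):
    """Jointly estimate per-user parameters theta_i by minimizing

        sum_i ||y_i - X_i theta_i||^2
        + lam * sum_{(i,j) in edges} ||theta_i - theta_j||^2
        + gamma * sum_i ||theta_i||^2

    (hypothetical objective chosen for illustration).
    Returns an array of shape (n_users, d).
    """
    n = len(X_list)
    d = X_list[0].shape[1]
    L = graph_laplacian(n, edges)

    # The cohesion term equals theta^T (L kron I_d) theta for stacked theta.
    A = lam * np.kron(L, np.eye(d)) + gamma * np.eye(n * d)
    b = np.zeros(n * d)
    for i, (X, y) in enumerate(zip(X_list, y_list)):
        block = slice(i * d, (i + 1) * d)
        A[block, block] += X.T @ X  # per-user Gram matrix
        b[block] = X.T @ y

    # Normal equations of the penalized least-squares problem.
    theta = np.linalg.solve(A, b)
    return theta.reshape(n, d)
```

With `lam=0` the problem decouples into independent per-user ridge regressions; as `lam` grows, the estimates for connected users are shrunk toward a common value, which is the pooling behavior a cohesion penalty is meant to provide.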

