ε-Neural Thompson Sampling of Deep Brain Stimulation for Parkinson Disease Treatment (2403.06814v1)
Abstract: Deep Brain Stimulation (DBS) stands as an effective intervention for alleviating the motor symptoms of Parkinson's disease (PD). Traditional commercial DBS devices can only deliver fixed-frequency periodic pulses to the basal ganglia (BG) regions of the brain, i.e., continuous DBS (cDBS). However, they generally suffer from energy inefficiency and side effects such as speech impairment. Recent research has focused on adaptive DBS (aDBS) to resolve the limitations of cDBS. Specifically, reinforcement learning (RL) based approaches have been developed to adapt the frequencies of the stimuli in order to achieve both energy efficiency and treatment efficacy. However, RL approaches generally require a significant amount of training data and computational resources, making it intractable to integrate RL policies into real-time embedded systems as needed in aDBS. In contrast, contextual multi-armed bandits (CMAB) generally achieve better sample efficiency than RL. In this study, we propose a CMAB solution for aDBS. Specifically, we define the context as the signals capturing irregular neuronal firing activities in the BG regions (i.e., beta-band power spectral density), while each arm signifies the (discretized) pulse frequency of the stimulation. Moreover, an ε-exploring strategy is introduced on top of the classic Thompson sampling method, leading to an algorithm called ε-Neural Thompson sampling (ε-NeuralTS), such that the learned CMAB policy can better balance exploration and exploitation of the BG environment. The ε-NeuralTS algorithm is evaluated using a computational BG model that captures the neuronal activities in PD patients' brains. The results show that our method outperforms both existing cDBS methods and CMAB baselines.
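The ε-exploring selection rule described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a Bayesian linear reward model per arm stands in for the paper's neural reward network, the context is an abstract feature vector (in the paper, beta-band PSD features), and all names (`EpsilonLinearTS`, `eps`, `select`, `update`) are hypothetical. The key idea carried over from the abstract is that with probability ε the agent samples reward parameters from the posterior (explore), and otherwise acts greedily on the posterior mean (exploit).

```python
import numpy as np


class EpsilonLinearTS:
    """Sketch of epsilon-exploring Thompson sampling for a contextual bandit.

    Each arm (here: a discretized stimulation frequency) keeps a Gaussian
    posterior over a linear reward model. With probability eps, parameters
    are sampled from the posterior (Thompson-style exploration); otherwise
    the posterior mean is used (exploitation).
    """

    def __init__(self, n_arms, dim, eps=0.1, lam=1.0, noise=1.0, seed=0):
        self.eps = eps
        self.noise = noise
        self.rng = np.random.default_rng(seed)
        # Per-arm precision matrix and response vector for Bayesian
        # linear regression with ridge prior lam * I.
        self.A = [lam * np.eye(dim) for _ in range(n_arms)]
        self.b = [np.zeros(dim) for _ in range(n_arms)]

    def select(self, context):
        """Pick the arm with the highest (sampled or mean) predicted reward."""
        explore = self.rng.random() < self.eps
        scores = []
        for A, b in zip(self.A, self.b):
            cov = np.linalg.inv(A)
            mean = cov @ b
            if explore:
                theta = self.rng.multivariate_normal(mean, self.noise * cov)
            else:
                theta = mean
            scores.append(float(context @ theta))
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        """Rank-one posterior update for the chosen arm."""
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context
```

In the paper's setting, `context` would be the beta-band PSD signal from the BG model and `reward` would trade off symptom suppression against stimulation energy; here both are left abstract. Setting `eps = 1` recovers ordinary (linear) Thompson sampling, while small `eps` reduces exploration, mirroring the ε-exploring modification.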