Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery (2404.07185v2)
Abstract: Automating robotic surgery via learning from demonstration (LfD) techniques is extremely challenging. This is because surgical tasks often involve sequential decision-making processes with complex interactions of physical objects and have low tolerance for mistakes. Prior works assume that all demonstrations are fully observable and optimal, which might not be practical in the real world. This paper introduces a sample-efficient method that learns a robust reward function from a limited amount of ranked suboptimal demonstrations consisting of partial-view point cloud observations. The method then learns a policy by optimizing the learned reward function using reinforcement learning (RL). We show that using a learned reward function to obtain a policy is more robust than pure imitation learning. We apply our approach to a physical surgical electrocautery task and demonstrate that our method can perform well even when the provided demonstrations are suboptimal and the observations are high-dimensional point clouds. Code and videos are available at: https://sites.google.com/view/lfdinelectrocautery
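The reward-learning step the abstract describes builds on learning from ranked demonstrations (in the spirit of T-REX by Brown et al., 2019): a reward network is trained so that higher-ranked trajectories receive higher predicted returns, and the learned reward is then optimized with RL (e.g., PPO). Below is a minimal sketch of that pairwise ranking objective. The `RewardNet` MLP, observation dimensions, and training loop are illustrative assumptions, not the authors' implementation; the paper's actual model operates on partial-view point cloud observations, which would require a point-cloud encoder in place of the MLP.

```python
# Minimal sketch of reward learning from ranked demonstrations, assuming a
# T-REX-style Bradley-Terry ranking loss. All names and shapes here are
# hypothetical placeholders for illustration.
import torch
import torch.nn as nn


class RewardNet(nn.Module):
    """Maps a single observation to a scalar reward.

    Placeholder MLP; the paper's setting would use a point-cloud
    encoder over partial-view observations instead.
    """

    def __init__(self, obs_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (T, obs_dim) -> per-step rewards of shape (T,)
        return self.net(obs).squeeze(-1)


def ranking_loss(reward_net: RewardNet,
                 traj_lo: torch.Tensor,
                 traj_hi: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: the higher-ranked trajectory traj_hi should
    receive a larger predicted return than the lower-ranked traj_lo."""
    ret_lo = reward_net(traj_lo).sum()
    ret_hi = reward_net(traj_hi).sum()
    logits = torch.stack([ret_lo, ret_hi]).unsqueeze(0)  # (1, 2)
    # Label 1 means "the second trajectory is preferred".
    return nn.functional.cross_entropy(logits, torch.tensor([1]))


# Usage on toy data: two ranked trajectories of 20 steps each.
obs_dim = 32
net = RewardNet(obs_dim)
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
traj_lo = torch.randn(20, obs_dim)  # lower-ranked (more suboptimal) demo
traj_hi = torch.randn(20, obs_dim)  # higher-ranked demo
for _ in range(100):
    opt.zero_grad()
    loss = ranking_loss(net, traj_lo, traj_hi)
    loss.backward()
    opt.step()
```

Because the loss only compares relative returns between ranked pairs, the learned reward can extrapolate beyond the best demonstration, which is what lets the RL policy outperform pure imitation of suboptimal demos.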
- Zohre Karimi
- Shing-Hei Ho
- Bao Thach
- Alan Kuntz
- Daniel S. Brown