General-purpose foundation models for increased autonomy in robot-assisted surgery (2401.00678v1)
Abstract: The dominant paradigm for end-to-end robot learning focuses on optimizing task-specific objectives that solve a single robotic problem such as picking up an object or reaching a target position. However, recent work on high-capacity models in robotics has shown the promise of training on large collections of diverse and task-agnostic datasets of video demonstrations. These models have shown impressive levels of generalization to unseen circumstances, especially as the amount of data and the model complexity scale. Surgical robot systems that learn from data have struggled to advance as quickly as other fields of robot learning for a few reasons: (1) there is a lack of existing large-scale open-source data to train models, (2) it is challenging to model the soft-body deformations that these robots encounter during surgery because simulation cannot match the physical and visual complexity of biological tissue, and (3) surgical robots risk harming patients when tested in clinical trials and require more extensive safety measures. This perspective article aims to provide a path toward increasing robot autonomy in robot-assisted surgery through the development of a multi-modal, multi-task, vision-language-action model for surgical robots. Ultimately, we argue that surgical robots are uniquely positioned to benefit from general-purpose models and provide three guiding actions toward increased autonomy in robot-assisted surgery.