Emergent Mind

General-purpose foundation models for increased autonomy in robot-assisted surgery

Published Jan 1, 2024 in cs.RO, cs.LG, and q-bio.TO


The dominant paradigm for end-to-end robot learning focuses on optimizing task-specific objectives that solve a single robotic problem such as picking up an object or reaching a target position. However, recent work on high-capacity models in robotics has shown promise toward being trained on large collections of diverse and task-agnostic datasets of video demonstrations. These models have shown impressive levels of generalization to unseen circumstances, especially as the amount of data and the model complexity scale. Surgical robot systems that learn from data have struggled to advance as quickly as other fields of robot learning for a few reasons: (1) there is a lack of existing large-scale open-source data to train models, (2) it is challenging to model the soft-body deformations that these robots work with during surgery because simulation cannot match the physical and visual complexity of biological tissue, and (3) surgical robots risk harming patients when tested in clinical trials and require more extensive safety measures. This perspective article aims to provide a path toward increasing robot autonomy in robot-assisted surgery through the development of a multi-modal, multi-task, vision-language-action model for surgical robots. Ultimately, we argue that surgical robots are uniquely positioned to benefit from general-purpose models and provide three guiding actions toward increased autonomy in robot-assisted surgery.


  • Robot learning for RAS faces challenges like soft-body modeling and increased risk; general-purpose models may offer solutions.

  • General-purpose models can learn a broad range of skills using self-supervised learning from diverse datasets, beneficial for RAS.

  • The robot transformer (RT) architecture, which combines language, visual, and sensor inputs, shows promising generalization on robotics tasks.

  • Surgical robots are ideal for RTs due to stationary operation, high-performance computing access, and a wealth of training data.

  • Advancements may include conservative Q-learning for risk avoidance and conformal prediction for assessing action confidence.

Overview of Robot-Assisted Surgery Learning

Robot learning typically concentrates on optimizing robots to complete specialized tasks using techniques like deep reinforcement learning (DRL). However, robot-assisted surgery (RAS) presents a unique set of challenges, such as soft-body modeling and a higher risk of causing harm, that have hindered the application of these approaches. Recent research indicates that general-purpose models could be key to meeting these challenges, offering a broader range of skills and better generalization across varied tasks.

General-Purpose Models in Robotics

The paper discusses how large-scale, high-capacity models, analogous to foundation models in NLP, could benefit RAS. These models train on extensive, diverse datasets via self-supervised learning, building broad knowledge and skills without the need for human-labeled data. In robotics, this approach has given rise to the robot transformer (RT) architecture, which combines language instructions, visual cues from cameras, and sensor data to learn from offline task demonstrations. RTs have demonstrated a promising ability to generalize to tasks and conditions not covered during training.
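To make the RT idea concrete, here is a minimal sketch, not any specific RT implementation: hypothetical per-modality encoders (random placeholders below standing in for real language, vision, and proprioception encoders) map each input into a shared token space, a single self-attention layer fuses the token sequence, and a readout head maps the fused representation to a discretized action bin (RT-1, for instance, discretizes each action dimension into 256 bins). All weights and token counts are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention over one token sequence
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
d = 32  # shared token dimension (hypothetical)

# stand-ins for encoded modalities: an instruction, endoscope-frame
# patches, and kinematic/force sensor readings
lang_tokens = rng.normal(size=(4, d))    # e.g. embedded "retract the tissue"
image_tokens = rng.normal(size=(16, d))  # e.g. ViT-style patch embeddings
sensor_tokens = rng.normal(size=(2, d))  # e.g. joint positions, forces

# fuse all modalities as one token sequence
tokens = np.concatenate([lang_tokens, image_tokens, sensor_tokens], axis=0)
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
fused = attention(tokens @ W_q, tokens @ W_k, tokens @ W_v)

# readout head: pool the sequence and pick a discretized action bin
W_out = rng.normal(size=(d, 256))
action_logits = fused.mean(axis=0) @ W_out
action_bin = int(action_logits.argmax())
```

A real RT stacks many such attention layers and trains end to end on demonstration data; the point of the sketch is only the data flow from heterogeneous inputs to a single discretized action output.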

The Unique Opportunity for Surgical Robots

Surgical robots are well-suited for integrating RTs due to their stationary operation, which alleviates the concerns about computation time and energy efficiency faced by mobile robots. Because surgical robots don't rely on battery power and can interface with high-performance computing, they can run much more computationally demanding models. Moreover, the abundance of surgical procedures recorded daily provides a rich, untapped source of training data. However, three major challenges remain: developing risk-avoidant behaviors, unifying medical data across institutions, and achieving safety beyond the quality of the human demonstrations the models learn from.

Path Forward and Implications

To tackle these challenges, the paper suggests a combination of conservative Q-learning to predict and avoid high-risk situations, and conformal prediction to gauge the robot's confidence in its actions, potentially handing off control to a human surgeon in uncertain scenarios. Merging medical data across institutions and adding layers of safety assessment tailored to surgical quality could allow surgical robots to surpass human performance standards.
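The conformal-prediction handoff can be sketched with the standard split-conformal recipe over a discretized action space. Everything below is a hypothetical stand-in: the calibration "policy outputs" are random Dirichlet draws rather than a real model, and the 10-way action space is arbitrary. The mechanics are the point: calibrate a score threshold so that prediction sets contain the demonstrated action at a target rate, then defer to the surgeon whenever the set is not a singleton.

```python
import numpy as np

rng = np.random.default_rng(1)

# --- calibration: nonconformity score = 1 - probability of the true action ---
n_cal, n_actions = 500, 10
cal_probs = rng.dirichlet(np.ones(n_actions), size=n_cal)  # stand-in policy outputs
cal_labels = rng.integers(n_actions, size=n_cal)           # stand-in demonstrated actions
scores = 1.0 - cal_probs[np.arange(n_cal), cal_labels]

alpha = 0.1  # target miscoverage: sets contain the true action >= 90% of the time
q_level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
qhat = np.quantile(scores, q_level, method="higher")

def prediction_set(probs):
    """All actions whose nonconformity score clears the calibrated threshold."""
    return np.flatnonzero(1.0 - probs <= qhat)

def act_or_handoff(probs):
    s = prediction_set(probs)
    # a singleton set means the policy is confident; otherwise defer to the surgeon
    return ("execute", int(s[0])) if len(s) == 1 else ("handoff", None)
```

In this sketch a near-one-hot output executes autonomously while a diffuse one triggers a handoff; a conservative Q-function (e.g. one trained with CQL) could additionally veto even confident actions whose estimated value signals high risk.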

The authors envision an RT-RAS system (an RT model for robot-assisted surgery) that uses real-world data for ongoing improvement and enhanced autonomous capabilities, which could lead to more consistent surgeries and reduced costs. Such autonomous systems could also transform surgical training by giving trainees immediate expert feedback and built-in safety measures. Realizing these benefits of general-purpose models in robot-assisted surgery will require a collaborative effort among academia, healthcare institutions, and industry.

