Navigating the Synthetic Realm: Harnessing Diffusion-based Models for Laparoscopic Text-to-Image Generation (2312.03043v1)
Abstract: Recent advances in synthetic imaging open up opportunities for obtaining additional data in the field of surgical imaging. This data can provide reliable supplements that support surgical applications and decision-making through computer vision. In particular, the field of image-guided surgery, such as laparoscopic and robotic-assisted surgery, benefits strongly from synthetic image datasets and virtual surgical training methods. Our study presents an intuitive approach for generating synthetic laparoscopic images from short text prompts using diffusion-based generative models. We demonstrate the use of state-of-the-art text-to-image architectures for laparoscopic imaging, taking the surgical removal of the gallbladder as an example. Results on fidelity and diversity demonstrate that diffusion-based models can acquire knowledge about the style and semantics of image-guided surgery. A validation study based on a human assessment survey underlines the realistic nature of our synthetic data: when medical personnel were asked to detect actual images in a pool that also contained generated images, the false-positive rate was 66%. In addition, an investigation of a state-of-the-art machine learning model for recognizing surgical actions indicates improvements of up to 5.20% when the model is trained with additional generated images. Overall, the achieved image quality supports the use of computer-generated images in surgical applications and advances their path to maturity.
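To make the survey metric concrete, the following minimal sketch shows one way a false-positive rate could be computed for a real-versus-generated survey of this kind. It assumes that "actual image" is the positive class, so a false positive is a generated image that a participant marked as actual. The function name, the data layout, and the toy votes are hypothetical and are not taken from the study.

```python
def false_positive_rate(is_actual, marked_actual):
    """FPR = FP / (FP + TN), with 'actual image' as the positive class (assumption)."""
    if len(is_actual) != len(marked_actual):
        raise ValueError("inputs must have the same length")
    # Keep only the votes cast on generated images (the ground-truth negatives).
    votes_on_generated = [m for a, m in zip(is_actual, marked_actual) if not a]
    if not votes_on_generated:
        raise ValueError("no generated images in the pool")
    # A generated image that was marked as actual counts as a false positive.
    false_positives = sum(votes_on_generated)
    return false_positives / len(votes_on_generated)


# Toy data, not from the paper: three actual and three generated images.
is_actual = [True, True, True, False, False, False]
marked_actual = [True, False, True, True, True, False]
fpr = false_positive_rate(is_actual, marked_actual)
print(f"false-positive rate: {fpr:.2f}")  # prints "false-positive rate: 0.67"
```

Under this convention, a higher rate means participants more often mistook generated images for actual ones, which is the sense in which the reported 66% speaks to the realism of the synthetic data.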
- Simeon Allmendinger
- Patrick Hemmer
- Moritz Queisner
- Igor Sauer
- Leopold Müller
- Johannes Jakubik
- Michael Vössing
- Niklas Kühl