
Transformers in Medical Imaging: A Survey (2201.09873v1)

Published 24 Jan 2022 in eess.IV and cs.CV

Abstract: Following unprecedented success on the natural language tasks, Transformers have been successfully applied to several computer vision problems, achieving state-of-the-art results and prompting researchers to reconsider the supremacy of convolutional neural networks (CNNs) as *de facto* operators. Capitalizing on these advances in computer vision, the medical imaging field has also witnessed growing interest for Transformers that can capture global context compared to CNNs with local receptive fields. Inspired from this transition, in this survey, we attempt to provide a comprehensive review of the applications of Transformers in medical imaging covering various aspects, ranging from recently proposed architectural designs to unsolved issues. Specifically, we survey the use of Transformers in medical image segmentation, detection, classification, reconstruction, synthesis, registration, clinical report generation, and other tasks. In particular, for each of these applications, we develop taxonomy, identify application-specific challenges as well as provide insights to solve them, and highlight recent trends. Further, we provide a critical discussion of the field's current state as a whole, including the identification of key challenges, open problems, and outlining promising future directions. We hope this survey will ignite further interest in the community and provide researchers with an up-to-date reference regarding applications of Transformer models in medical imaging. Finally, to cope with the rapid development in this field, we intend to regularly update the relevant latest papers and their open-source implementations at https://github.com/fahadshamshad/awesome-transformers-in-medical-imaging.

Survey of Transformers in Medical Imaging

The paper "Transformers in Medical Imaging: A Survey" provides a comprehensive review of the application of Transformer models across various medical imaging tasks. Leveraging the global context capabilities of Transformers, the authors explore their use in segmentation, detection, classification, reconstruction, synthesis, and registration, among others. Here, we highlight the core contributions, challenges addressed, and future directions proposed in the paper.

Core Contributions

Transformers have emerged as a powerful tool in machine learning, originally gaining traction in natural language processing. Their ability to model long-range dependencies is particularly advantageous in medical imaging, where capturing the global context can enhance the interpretation of complex anatomical structures.
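
The long-range dependency modeling described above comes from scaled dot-product self-attention, where every position attends to every other position. A minimal numpy sketch (with random, untrained projection matrices purely for illustration):

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of feature vectors.

    Every output position is a weighted mix of *all* input positions,
    which is what gives Transformers their global receptive field,
    in contrast to the local receptive fields of CNN kernels.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v             # (n, d) projections
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (n, n) pairwise affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ v                               # (n, d) context-mixed output

rng = np.random.default_rng(0)
n, d = 6, 4                                          # e.g. 6 image patches, 4-dim features
x = rng.standard_normal((n, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (6, 4)
```

In a Vision Transformer the rows of `x` are embedded image patches, so each patch's representation is refreshed with information from the entire image.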

  1. Segmentation: Transformers are employed for various segmentation tasks, including organ-specific and multi-organ segmentation, effectively capturing global spatial dependencies. Hybrid models combining CNNs and Transformers exhibit strong performance in capturing both local and global information, as demonstrated by approaches like TransUNet.
  2. Classification: In medical image classification, Transformers contribute to tasks such as COVID-19 diagnosis and tumor identification, offering improved performance by utilizing global feature representations.
  3. Detection: Detection tasks, essential for identifying anomalies like tumors or lesions, benefit from Transformers' attention mechanisms, which emphasize relevant image regions.
  4. Reconstruction: Transformers have shown promise in reconstructing high-quality medical images from under-sampled data, particularly in MRI applications. Techniques such as SLATER demonstrate the potential of Transformers in generating high-fidelity images.
  5. Synthesis and Registration: For image synthesis and registration, Transformers assist in generating realistic images and enhancing alignment between different modalities and time points.
  6. Clinical Report Generation: The paper also covers Transformer-based language models for clinical report generation, translating medical images into diagnostic text.
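
The hybrid CNN-Transformer designs mentioned above (TransUNet-style) typically extract local features first, then tokenize them for global mixing. The toy sketch below substitutes simple patch flattening for the CNN stage and uniform attention for learned attention, purely to illustrate the local-then-global data flow; the function names are illustrative, not from the surveyed models:

```python
import numpy as np

def patchify(image, patch=4):
    """Cut an image into non-overlapping patches and flatten each into a token,
    a stand-in for the CNN feature extractor in hybrid designs."""
    h, w = image.shape
    return (image.reshape(h // patch, patch, w // patch, patch)
                 .transpose(0, 2, 1, 3)
                 .reshape(-1, patch * patch))       # (num_patches, patch*patch)

def mix_globally(tokens):
    """Uniform attention: every token is updated with context from all others.
    Real models learn these weights; this only illustrates the global
    information flow a Transformer adds on top of local features."""
    attn = np.full((len(tokens), len(tokens)), 1.0 / len(tokens))
    return attn @ tokens

image = np.arange(64.0).reshape(8, 8)     # toy 8x8 "scan"
tokens = patchify(image)                  # (4, 16) patch tokens
context = mix_globally(tokens)            # each token now sees the whole image
print(tokens.shape, context.shape)
```

In TransUNet the CNN stage also supplies skip connections to the decoder, so the segmentation output retains fine local detail alongside the Transformer's global context.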

Challenges and Open Problems

Despite the demonstrated efficacy of Transformers, several challenges remain:

  • Data Limitations: The requirement for large datasets for pre-training poses a challenge in the medical domain where labeled data is scarce. Exploring self-supervised learning and domain-specific pre-training could address these issues.
  • Model Interpretability: Like other deep networks, Transformers largely operate as black boxes, yet interpretability remains crucial in medical applications. Developing methods to elucidate the decision-making process of Transformers is essential for clinical adoption.
  • Efficiency and Scalability: The computational demands of self-attention, which scales quadratically with the number of input tokens, hinder real-time applicability in resource-constrained settings. Designing lightweight and efficient Transformer variants is imperative for deployment in clinical environments.
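
The scalability concern is easy to quantify: with a fixed patch size, the number of pairwise attention scores grows with the fourth power of image side length. A quick back-of-the-envelope calculation:

```python
def attention_cost(image_size, patch_size):
    """Number of pairwise attention scores per head for a square image.

    Token count grows quadratically with image side length, and
    self-attention cost grows quadratically with token count, which is
    why high-resolution medical scans strain standard Transformers.
    """
    tokens = (image_size // patch_size) ** 2
    return tokens, tokens * tokens

for size in (224, 512, 1024):
    tokens, scores = attention_cost(size, 16)
    print(f"{size}x{size} image -> {tokens} tokens -> {scores:,} attention scores")
```

Going from a 224x224 crop to a 1024x1024 scan multiplies the attention cost by over 400x, which motivates the lightweight attention variants the survey discusses.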

Future Directions

The paper suggests several avenues for future research:

  • Pre-training Techniques: Domain-specific pre-training strategies and leveraging self-supervised techniques to reduce dependence on labeled data.
  • Multimodal Learning: Enhancing integration across various imaging modalities to fully exploit complementarity and improve diagnostic accuracy.
  • Explainable AI: Advancing interpretability tools tailored for Transformers to facilitate trust and understanding among healthcare professionals.
  • Federated Learning: Enabling collaborative learning across institutions while preserving data privacy through federated approaches can enhance model generalization and robustness.
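
One concrete self-supervised pre-training recipe relevant to the first bullet is masked image modeling, where most patch tokens are hidden and the model learns by reconstructing them without any manual labels. A minimal masking step in the style of masked autoencoders (the function name is illustrative, not from the survey):

```python
import numpy as np

def mask_tokens(tokens, mask_ratio=0.75, seed=0):
    """Randomly hide most patch tokens; a model pre-trained to reconstruct
    the hidden ones learns useful representations with no labels at all,
    easing the scarcity of annotated medical data."""
    rng = np.random.default_rng(seed)
    n = len(tokens)
    n_keep = max(1, int(round(n * (1 - mask_ratio))))
    perm = rng.permutation(n)
    keep, hidden = np.sort(perm[:n_keep]), np.sort(perm[n_keep:])
    return tokens[keep], keep, hidden

tokens = np.arange(16.0).reshape(16, 1)            # 16 toy patch tokens
visible, keep_idx, hidden_idx = mask_tokens(tokens)
print(len(visible), len(hidden_idx))               # 4 visible, 12 to reconstruct
```

After pre-training on unlabeled scans this way, the encoder can be fine-tuned on a small labeled set for segmentation or classification.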

Conclusion

This survey underscores the significant advances and potential of Transformer models in medical imaging analysis. While substantial challenges remain, ongoing research and innovative approaches promise to harness the full power of Transformers, offering solutions to complex medical imaging problems and ultimately contributing to improved patient care and outcomes.

Authors (7)
  1. Fahad Shamshad (21 papers)
  2. Salman Khan (244 papers)
  3. Syed Waqas Zamir (20 papers)
  4. Muhammad Haris Khan (68 papers)
  5. Munawar Hayat (73 papers)
  6. Fahad Shahbaz Khan (225 papers)
  7. Huazhu Fu (185 papers)
Citations (548)