Focused Decoding Enables 3D Anatomical Detection by Transformers (2207.10774v4)
Abstract: Detection Transformers represent end-to-end object detection approaches based on a Transformer encoder-decoder architecture, exploiting the attention mechanism for global relation modeling. Although Detection Transformers deliver results on par with or even superior to their highly optimized CNN-based counterparts operating on 2D natural images, their success is closely coupled to access to a vast amount of training data. This, however, restricts the feasibility of employing Detection Transformers in the medical domain, as access to annotated data is typically limited. To tackle this issue and facilitate the advent of medical Detection Transformers, we propose a novel Detection Transformer for 3D anatomical structure detection, dubbed Focused Decoder. Focused Decoder leverages information from an anatomical region atlas to simultaneously deploy query anchors and restrict the cross-attention's field of view to regions of interest, which allows for a precise focus on relevant anatomical structures. We evaluate our proposed approach on two publicly available CT datasets and demonstrate that Focused Decoder not only provides strong detection results and thus alleviates the need for a vast amount of annotated data but also exhibits exceptional and highly intuitive explainability of results via attention weights. Our code is available at https://github.com/bwittmann/transoar.
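The idea of restricting the cross-attention's field of view to regions of interest can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that). It is a toy NumPy example that assumes each query comes with one axis-aligned atlas box over a flattened 3D feature grid. The names `roi_mask` and `focused_cross_attention`, the box format, and all sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

def roi_mask(grid_shape, box):
    """Return a flat boolean mask over a D x H x W grid, True inside the box.

    box = (z0, y0, x0, z1, y1, x1) with half-open voxel indices
    (an assumed format, not taken from the paper).
    """
    z0, y0, x0, z1, y1, x1 = box
    mask = np.zeros(grid_shape, dtype=bool)
    mask[z0:z1, y0:y1, x0:x1] = True
    return mask.reshape(-1)

def focused_cross_attention(queries, features, masks):
    """Single-head cross-attention in which each query only sees its own region.

    queries:  (Q, C) query embeddings
    features: (N, C) flattened feature map, N = D * H * W
    masks:    (Q, N) boolean, True where a query is allowed to attend
    """
    assert masks.any(axis=1).all(), "every query needs a non-empty region"
    scores = queries @ features.T / np.sqrt(queries.shape[1])
    # Positions outside the region get -inf, so their softmax weight is exactly 0.
    scores = np.where(masks, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ features, weights

# Toy usage: a 4 x 4 x 4 feature grid with 8 channels and two queries,
# each tied to one hypothetical atlas region.
rng = np.random.default_rng(0)
grid = (4, 4, 4)
features = rng.normal(size=(64, 8))
queries = rng.normal(size=(2, 8))
boxes = [(0, 0, 0, 2, 2, 2), (2, 2, 2, 4, 4, 4)]
masks = np.stack([roi_mask(grid, b) for b in boxes])
out, weights = focused_cross_attention(queries, features, masks)
```

Because the weights are zero outside each region by construction, reshaping a row of `weights` back to the grid shape gives an attention map confined to that region, which is the kind of attention-based inspection the abstract refers to.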