YOLO-MED: Multi-Task Interaction Network for Biomedical Images (2403.00245v1)
Abstract: Object detection and semantic segmentation are pivotal components of biomedical image analysis. Current single-task networks achieve promising results in both detection and segmentation. Multi-task networks have gained prominence because they can tackle segmentation and detection simultaneously while also accelerating segmentation inference. Nevertheless, recent multi-task networks face distinct limitations, such as the difficulty of balancing accuracy and inference speed. They also often overlook the integration of cross-scale features, which is especially important for biomedical image analysis. In this study, we propose YOLO-Med, an efficient end-to-end multi-task network that performs object detection and semantic segmentation concurrently. The model uses a backbone and a neck for multi-scale feature extraction, followed by two task-specific decoders. A cross-scale task-interaction module fuses information between the two tasks. Evaluated on the Kvasir-SEG dataset and a private biomedical image dataset, the model shows promising results in balancing accuracy and speed.
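The abstract describes the architecture only at the level of data flow: shared multi-scale features from a backbone and neck, two task-specific decoders, and a cross-scale task-interaction module between them. The NumPy sketch below traces that flow under loudly stated assumptions. The neck is assumed to emit square maps at strides 8, 16 and 32; `resize_to`, `cross_scale_fuse`, `task_interaction`, `conv1x1` and `forward` are hypothetical helpers, not the authors' code; the interaction step is a simple additive exchange with an assumed fixed weight `alpha`, not the paper's module; and random weights stand in for trained ones.

```python
import numpy as np

rng = np.random.default_rng(0)


def resize_to(x, size):
    """Resize a square (C, H, H) map to (C, size, size) for power-of-two ratios.

    Upscaling repeats pixels (nearest neighbour); downscaling averages k x k blocks.
    """
    c, h, _ = x.shape
    if h == size:
        return x
    if h < size:
        k = size // h
        return x.repeat(k, axis=1).repeat(k, axis=2)
    k = h // size
    return x.reshape(c, size, k, size, k).mean(axis=(2, 4))


def cross_scale_fuse(feats):
    """Give every scale a view of all scales: resize each map to the target scale and average."""
    return [np.mean([resize_to(g, f.shape[1]) for g in feats], axis=0) for f in feats]


def task_interaction(det_feats, seg_feats, alpha=0.5):
    """Hypothetical interaction: each branch receives the other branch's cross-scale context.

    alpha is an assumed fixed mixing weight; a trainable module would learn this exchange.
    """
    det_ctx = cross_scale_fuse(det_feats)
    seg_ctx = cross_scale_fuse(seg_feats)
    det_out = [d + alpha * s for d, s in zip(det_feats, seg_ctx)]
    seg_out = [s + alpha * d for s, d in zip(seg_feats, det_ctx)]
    return det_out, seg_out


def conv1x1(x, w):
    """1x1 convolution as a channel projection: (C_in, H, W) with (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", w, x)


def forward(neck_feats, num_classes=1, channels=16):
    """Toy forward pass: shared neck features -> task branches -> interaction -> two decoders."""
    # Task-specific projections of the shared neck features (random weights, untrained).
    det_feats = [conv1x1(f, rng.standard_normal((channels, f.shape[0]))) for f in neck_feats]
    seg_feats = [conv1x1(f, rng.standard_normal((channels, f.shape[0]))) for f in neck_feats]
    det_feats, seg_feats = task_interaction(det_feats, seg_feats)
    # Segmentation decoder: fuse all scales at the finest resolution, then per-pixel class logits.
    finest = seg_feats[0].shape[1]
    seg_fused = np.mean([resize_to(s, finest) for s in seg_feats], axis=0)
    seg_logits = conv1x1(seg_fused, rng.standard_normal((num_classes, channels)))
    # Detection decoder: dense YOLO-style head, per cell 4 box offsets + objectness + classes.
    w_det = rng.standard_normal((5 + num_classes, channels))
    det_maps = [conv1x1(d, w_det) for d in det_feats]
    return seg_logits, det_maps


# Assumed neck outputs for a 256x256 image at strides 8, 16 and 32, with 32 channels each.
neck = [rng.standard_normal((32, s, s)) for s in (32, 16, 8)]
seg_logits, det_maps = forward(neck)
print(seg_logits.shape)             # (1, 32, 32)
print([m.shape for m in det_maps])  # [(6, 32, 32), (6, 16, 16), (6, 8, 8)]
```

The additive exchange is used here only because it makes "sharing cross-scale context between task branches" easy to trace by hand; a trainable version would replace the fixed `alpha` and random projections with learned layers.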