- The paper introduces USCL, a framework using video contrastive representation learning on a new US-4 dataset to pretrain deep learning models for ultrasound image diagnosis.
- The USCL method employs a novel contrastive learning framework with sample pair generation to effectively capture semantic features from ultrasound videos for pretraining.
- Empirical results show USCL-pretrained models achieve significant performance gains, including a remarkable 10% improvement in fine-tuning accuracy on downstream diagnostic tasks.
Pretraining Deep Ultrasound Image Diagnosis Models via Contrastive Learning: A Quantitative Approach
The paper "USCL: Pretraining Deep Ultrasound Image Diagnosis Model through Video Contrastive Representation Learning" addresses the domain gap that arises when deep neural networks (DNNs) are applied to ultrasound (US) imaging data. Traditionally, DNN models for US image analysis have been pretrained on natural-image datasets such as ImageNet, a domain mismatch that often limits performance on medical images. The authors propose a US-specific pretraining strategy built on contrastive learning, designed to improve medical imaging applications by pretraining on more relevant, in-domain data.
Development of US-4 Ultrasound Dataset
The cornerstone of this paper is the construction of the US-4 dataset, built to mitigate the data scarcity that hampers ultrasound imaging analysis. The dataset comprises over 23,000 images organized into four sub-datasets, spanning diverse organ targets, imaging depths, and resolutions that reflect the practical variation encountered in clinical settings. By sampling images from videos at consistent intervals, US-4 aims to capture comprehensive semantic features suitable for robust model training, and the resulting semantic clusters support a more cohesive learning process when building diagnostic models.
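The even-interval sampling described above can be sketched as follows. This is a minimal illustration of the general idea, not the paper's exact procedure; the function name and parameters are hypothetical.

```python
import numpy as np

def sample_frames(num_frames: int, samples_per_video: int = 3) -> list:
    """Pick frame indices from a video at consistent intervals.

    Illustrative sketch: split the clip into equal segments and take the
    midpoint of each, so the selected frames span the whole video at a
    roughly uniform spacing.
    """
    # Segment boundaries: samples_per_video equal slices of [0, num_frames)
    edges = np.linspace(0, num_frames, samples_per_video + 1)
    # Midpoint of each slice becomes one sampled frame index
    return [int((edges[i] + edges[i + 1]) // 2)
            for i in range(samples_per_video)]
```

For a 100-frame clip with four samples, this yields indices `[12, 37, 62, 87]`, one per quarter of the video.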
Methodology and Contrastive Learning Framework
Central to the authors' proposal is the Ultrasound Contrastive Representation Learning (USCL) framework, which pretrains models directly on ultrasound images. USCL trains in a semi-supervised fashion, supplementing supervised signals with a contrastive loss, and relies on a sample pair generation (SPG) scheme to avoid a conflict standard contrastive methods face on ultrasound data: frames from the same video are semantically near-identical, so treating them as negatives is misleading. Instead, positive pairs are drawn from clusters within the same video and negative pairs across distinct videos, achieving denser feature aggregation. This design maximizes intraclass feature fidelity while maintaining robust intercluster discrimination, an advance over prevailing contrastive paradigms.
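The video-level pairing scheme can be made concrete with a SimCLR-style NT-Xent loss, where the two views of each video form the positive pair and all samples from other videos serve as negatives. This is a minimal NumPy sketch of the pairing logic USCL builds on, not the paper's exact loss; names and the temperature value are illustrative.

```python
import numpy as np

def video_contrastive_loss(z1, z2, temperature=0.5):
    """NT-Xent-style loss over a batch of per-video embedding pairs.

    z1[i] and z2[i] embed two samples drawn from the SAME video i
    (a positive pair); embeddings of all other videos in the batch
    act as negatives. Sketch only, not the paper's implementation.
    """
    n = len(z1)
    z = np.concatenate([z1, z2], axis=0).astype(float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)     # unit vectors
    sim = z @ z.T / temperature                          # scaled cosine sims
    np.fill_diagonal(sim, -np.inf)                       # exclude self-pairs
    # Row i's positive is the other view of the same video: index n + i
    # for the first n rows, index i - n for the last n rows.
    targets = np.concatenate([np.arange(n) + n, np.arange(n)])
    # Cross entropy = -log softmax probability at the positive's index
    logits = sim - sim.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(2 * n), targets].mean()
```

When each pair is well aligned and distinct videos are dissimilar, the loss is low; shuffling the pairing (so positives no longer match) drives it up, which is exactly the gradient signal that pulls same-video features together and pushes different videos apart.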
Results and Implications
The empirical results, drawn from extensive experiments on the POCUS and UDIAT-B datasets, underline the efficacy of USCL models, which significantly outperformed traditional ImageNet-pretrained baselines. More importantly, on downstream applications such as disease classification, USCL-pretrained backbones achieved a remarkable 10% improvement in fine-tuning accuracy on the POCUS dataset over ImageNet-pretrained counterparts. This demonstrates the potential of domain-specific pretraining to enhance diagnostic protocols.
Future Directions
Although the results are promising, US-4's scope still needs to be extended to more varied and extensive clinical scenarios, which would broaden the applicability of fine-tuned models in real-world practice. Future research might refine the contrastive learning approach by integrating generative adversarial training or leveraging multi-modal imaging datasets to further boost clinical diagnostic accuracy. Moreover, scalability and computational efficiency remain pivotal challenges as such approaches transition from research prototypes to everyday medical utility.
In conclusion, the paper presents a well-grounded framework for improving the precision of deep learning applications in ultrasound, with implications not only for clinical diagnosis but also for a broader shift toward application-specific pretraining strategies in medical imaging.