US-4 Ultrasound Dataset Overview
- US-4 Ultrasound Dataset is a large-scale, video-based repository that enables robust pretraining for ultrasound image analysis by addressing the domain gap from natural images.
- It comprises 23,231 images from 1,051 video studies across diverse sub-datasets (Butterfly, CLUST, Liver Fibrosis, COVID19-LUSMS) with standardized disease and severity annotations.
- Empirical results demonstrate that models pretrained with US-4 and the USCL framework, which employs contrastive learning and mixup augmentation, significantly outperform traditional ImageNet pretrained models.
The US-4 ultrasound dataset is a large-scale video-derived collection specifically designed for domain-specific pretraining of deep neural networks in ultrasound (US) image analysis tasks. Developed and first described in the context of the USCL (Ultrasound Contrastive Learning) framework, US-4 targets the performance bottlenecks induced by the domain gap between natural image pretraining (e.g., ImageNet) and the specialized requirements of US imaging. US-4 supplies a robust resource for feature learning in both supervised and semi-supervised paradigms, emphasizing effective contrastive representation learning with video-informed sampling and annotation.
1. Dataset Composition and Construction
US-4 is composed of 1,051 US video studies, yielding a total of 23,231 images sampled uniformly from these videos. The dataset integrates four sub-datasets, two of which (Butterfly and CLUST) are sourced from public US imaging archives, while the other two (Liver Fibrosis and COVID19-LUSMS) are locally collected. The data emphasizes two primary anatomical regions: the lung and the liver.
| Sub-dataset | Organ | Image Size | Frame Rate | Classes | Videos | Images |
|---|---|---|---|---|---|---|
| Butterfly | Lung | 658×738 | 23 Hz | 2 | 22 | 1533 |
| CLUST | Liver | 434×530 | 19 Hz | 5 | 63 | 3150 |
| Liver Fibrosis | Liver | 600×807 | 28 Hz | 5 | 296 | 11714 |
| COVID19-LUSMS | Lung | 747×747 | 17 Hz | 4 | 670 | 6834 |
| Total | — | — | — | — | 1,051 | 23,231 |
Images are sampled at a rate of roughly 3 frames per second from each video, with the sampling interval defined as $I = \lfloor f / 3 \rfloor$ frames for a given video with frame rate $f$ (Hz). This strategy balances semantic consistency within a video with the need to avoid redundancy, yielding clusters that capture temporally adjacent yet sufficiently variable anatomical and pathological content.
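The sampling rule above can be sketched in a few lines. The helper names below are illustrative, not from the USCL codebase; the rule assumed is interval $= \lfloor f / 3 \rfloor$ for a video with frame rate $f$:

```python
import math

def sampling_interval(frame_rate_hz, target_fps=3):
    """Interval (in frames) yielding ~target_fps sampled frames per second.

    Illustrative helper: interval = floor(f / target_fps), clamped to >= 1.
    """
    return max(1, math.floor(frame_rate_hz / target_fps))

def sample_frame_indices(num_frames, frame_rate_hz, target_fps=3):
    """Uniformly spaced frame indices for one video."""
    step = sampling_interval(frame_rate_hz, target_fps)
    return list(range(0, num_frames, step))
```

For the frame rates in the table, this gives intervals of 7 (Butterfly, 23 Hz), 6 (CLUST, 19 Hz), 9 (Liver Fibrosis, 28 Hz), and 5 (COVID19-LUSMS, 17 Hz) frames.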
2. Annotation Protocol and Content
All images in US-4 are annotated with classification labels appropriate for clinical ultrasound tasks. These include categories for disease status (e.g., COVID-19, pneumonia, healthy) in lung images and fibrosis severity grades in liver images. The annotation protocol is standardized across the sub-datasets to enable multi-class and multi-instance classification; however, pixel-level segmentation masks are not provided.
Images are acquired mainly using convex probes; for downstream evaluation, linear probe images from the UDIAT-B dataset are also referenced. Video capture leverages system-specific configurations (e.g., Resona 7T US with frequency FH 5.0, pixel size 0.101–0.127mm) for Liver Fibrosis and COVID19-LUSMS.
3. Pretraining and the USCL Framework
US-4 is architected for use in pretraining convolutional and transformer-based neural networks with the USCL semi-supervised contrastive learning method. The primary goal is to learn representations that are robust to imaging artifacts, speckle noise, and anatomical variability ubiquitous in ultrasound. The pretraining process leverages semantic clustering at the video level, using spatially and temporally correlated frames as positives in the contrastive objective.
Core sampling and augmentation strategy:
- For each video $v_i$, frames are extracted at the uniform interval described above, and a small number of images per video is randomly subsampled for batch contrastive computation.
- Positive Pair Generation (PPG) employs a mixup operator with Beta-distributed coefficients $\lambda \sim \mathrm{Beta}(\alpha, \alpha)$, interpolating frames from the same video as $\tilde{x} = \lambda x_a + (1 - \lambda) x_b$.
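A minimal sketch of the PPG step, assuming frames are flat lists of pixel values and using $\alpha = 1.0$ as an illustrative (not necessarily the paper's) Beta parameter:

```python
import random

def mixup(frame_a, frame_b, alpha=1.0):
    """Mix two frames with a Beta(alpha, alpha) coefficient.

    Returns the interpolated frame lam*a + (1-lam)*b and the coefficient.
    Frames are plain lists of floats here; real code would use tensors.
    """
    lam = random.betavariate(alpha, alpha)
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(frame_a, frame_b)]
    return mixed, lam

def make_positive_pair(frames, alpha=1.0):
    """Build two mixed views from one video's frames (hypothetical helper).

    Each view interpolates two distinct frames, so the pair shares
    video-level semantics while differing in appearance.
    """
    a, b = random.sample(frames, 2)
    c, d = random.sample(frames, 2)
    view1, _ = mixup(a, b, alpha)
    view2, _ = mixup(c, d, alpha)
    return view1, view2
```

Because both views come from the same video, they serve as the positive pair in the contrastive objective, while views from other videos act as negatives.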
Loss formulation:
- The USCL framework optimizes a compound loss
$$\mathcal{L} = \mathcal{L}_{\mathrm{con}} + \eta\,\mathcal{L}_{\mathrm{cls}},$$
where $\mathcal{L}_{\mathrm{con}}$ is the standard NT-Xent contrastive loss, $\mathcal{L}_{\mathrm{cls}}$ is the cross-entropy for supervision, and $\eta$ is a scaling factor ($0.2$ in reported experiments).
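The compound loss can be sketched in plain Python, combining a SimCLR-style NT-Xent term over paired embeddings with a softmax cross-entropy term; the function names and the temperature value (0.5) are illustrative assumptions, not taken from the USCL implementation:

```python
import math

def _cos(u, v):
    """Cosine similarity between two vectors (plain lists)."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent contrastive loss over paired embeddings (SimCLR-style sketch).

    z1[i] and z2[i] are embeddings of a positive pair; all other samples
    in the batch serve as negatives.
    """
    z = z1 + z2
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner
        denom = sum(math.exp(_cos(z[i], z[k]) / tau)
                    for k in range(2 * n) if k != i)
        pos = math.exp(_cos(z[i], z[j]) / tau)
        total += -math.log(pos / denom)
    return total / (2 * n)

def compound_loss(z1, z2, logits, labels, eta=0.2):
    """L = L_con + eta * L_cls, with eta = 0.2 as reported in the text."""
    ce = 0.0
    for lg, y in zip(logits, labels):
        m = max(lg)  # subtract max for numerical stability
        logsum = m + math.log(sum(math.exp(v - m) for v in lg))
        ce += logsum - lg[y]
    ce /= len(labels)
    return nt_xent(z1, z2) + eta * ce
```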
Augmentation employs random cropping, horizontal flip, rotation, and color jittering, contributing to improved invariance and generalization.
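A toy version of this augmentation pipeline, operating on images represented as 2-D lists (rotation and color jitter omitted for brevity; real pipelines would use tensor-based transforms):

```python
import random

def random_horizontal_flip(img, p=0.5):
    """Flip each pixel row left-to-right with probability p."""
    return [row[::-1] for row in img] if random.random() < p else img

def random_crop(img, size):
    """Crop a size x size window at a uniformly random position."""
    h, w = len(img), len(img[0])
    top = random.randint(0, h - size)
    left = random.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]

def augment(img, crop_size):
    """Toy pipeline mirroring the listed augmentations (crop + flip)."""
    return random_horizontal_flip(random_crop(img, crop_size))
```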
4. Downstream Performance and Benchmarking
Pretrained backbones (ResNet18) utilizing US-4 with the USCL method demonstrate a clear advantage over both ImageNet and popular self-supervised methods (MoCo v2, SimCLR) across multiple downstream clinical tasks:
| Method | POCUS Accuracy (%) | UDIAT-B Det. AP | UDIAT-B Segm. AP |
|---|---|---|---|
| ImageNet | 84.2 | 40.6 | 48.2 |
| US-4 supervised | 85.0 | 38.3 | 42.6 |
| MoCo v2 (self-sup) | 84.8 | 38.7 | 47.1 |
| SimCLR (self-sup) | 86.4 | 43.8 | 51.3 |
| USCL (US-4) | 94.2 | 45.4 | 52.8 |
On the POCUS lung dataset, USCL-pretrained models exceed ImageNet pretraining by more than 10% absolute accuracy (94.2% vs. 84.2%). F1 score improvements are similarly substantial (94.0% vs. 81.8%). Results on breast US detection/segmentation (UDIAT-B) also favor US-4 pretraining (+4.8% detection AP, +4.6% segmentation AP versus ImageNet).
Ablation studies highlight the necessity of both pair assignment and mixup in USCL; addition of the semi-supervised classification branch further enhances representational quality.
5. Access, Licensing, and Availability
US-4's codebase and framework implementation are available via https://github.com/983632847/USCL. There is no explicit claim of full public release of the assembled video data; only the codebase is confirmed as open. Two sub-datasets, Butterfly and CLUST, are publicly accessible, while Liver Fibrosis and COVID19-LUSMS are locally collected and may require direct communication for access. This composite structure implies that, while the code and methodology are transferable, assembling the full dataset as described is non-trivial without additional data-sharing agreements.
6. Technical and Comparative Features
The US-4 dataset is distinguished by:
- Moderate scale and video-based origin: 23k images from >1k videos, versus patch-based or static-image datasets.
- Focus on classification-ready annotation (i.e., class labels, no segmentation masks).
- Selection and sampling protocol: Uniform, low-redundancy extraction for optimal semantic clustering.
- Rich anatomical diversity: Both lung and liver, with varied disease labels.
- Domain specificity: Designed specifically for US representation learning, directly targeting the limitations of transfer from natural-image datasets.
This construction enables superior pretraining, especially for contrastive and semi-supervised methodologies, and is validated by empirical gains in accuracy and detection/segmentation quality on various public and local downstream benchmarks.
7. Implications, Limitations, and Future Directions
US-4 represents a foundational resource for pretraining ultrasound-specific deep learning models, directly demonstrating reduced domain gap compared to natural-image-based pretraining. Its design is optimized for methods that exploit the nearby temporal-spatial structure of US video, such as contrastive and mixup-based representation learning.
This suggests that domain-specific pretraining using datasets such as US-4 is likely essential for robust transfer learning in the ultrasound domain. However, the lack of pixel-level segmentation masks and the partial restriction of certain sub-datasets may limit its immediate adoption for segmentation-heavy research or by institutions without access to the proprietary components. A plausible implication is that future expansions, possibly incorporating segmentation labels and broader anatomical coverage, could further enhance the impact and application horizon of US-4 and similar efforts.
The methodology introduced with US-4 lays important groundwork for systematic, domain-appropriate pretraining in medical imaging, establishing a new reference point for performance evaluation in ultrasound analysis research (Chen et al., 2020).