US-4 Ultrasound Dataset Overview
- US-4 Ultrasound Dataset is a large-scale, video-based repository that enables robust pretraining for ultrasound image analysis by addressing the domain gap from natural images.
- It comprises 23,231 images from 1,051 video studies across diverse sub-datasets (Butterfly, CLUST, Liver Fibrosis, COVID19-LUSMS) with standardized disease and severity annotations.
- Empirical results demonstrate that models pretrained with US-4 and the USCL framework, which employs contrastive learning and mixup augmentation, significantly outperform traditional ImageNet pretrained models.
The US-4 ultrasound dataset is a large-scale video-derived collection specifically designed for domain-specific pretraining of deep neural networks in ultrasound (US) image analysis tasks. Developed and first described in the context of the USCL (Ultrasound Contrastive Learning) framework, US-4 targets the performance bottlenecks induced by the domain gap between natural image pretraining (e.g., ImageNet) and the specialized requirements of US imaging. US-4 supplies a robust resource for feature learning in both supervised and semi-supervised paradigms, emphasizing effective contrastive representation learning with video-informed sampling and annotation.
1. Dataset Composition and Construction
US-4 is composed of 1,051 US video studies, yielding a total of 23,231 images sampled uniformly from these videos. The dataset integrates four sub-datasets, two of which (Butterfly and CLUST) are sourced from public US imaging archives, while the other two (Liver Fibrosis and COVID19-LUSMS) are locally collected. The data emphasizes two primary anatomical regions: the lung and the liver.
| Sub-dataset | Organ | Image Size | Frame Rate | Classes | Videos | Images |
|---|---|---|---|---|---|---|
| Butterfly | Lung | 658×738 | 23 Hz | 2 | 22 | 1533 |
| CLUST | Liver | 434×530 | 19 Hz | 5 | 63 | 3150 |
| Liver Fibrosis | Liver | 600×807 | 28 Hz | 5 | 296 | 11714 |
| COVID19-LUSMS | Lung | 747×747 | 17 Hz | 4 | 670 | 6834 |
| Total | — | — | — | — | 1,051 | 23,231 |
Images are sampled at a rate of roughly 3 frames per second from each video, with the sampling interval defined as $I = \lfloor f / 3 \rfloor$ frames for a given video with frame rate $f$ (Hz). This strategy balances semantic consistency within a video with the need to avoid redundancy, yielding clusters that capture temporally adjacent yet sufficiently variable anatomical and pathological content.
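The sampling rule above can be sketched in a few lines. The helper names below are illustrative, not from the USCL codebase; the rule assumed is interval $= \lfloor f / 3 \rfloor$ for a video with frame rate $f$:

```python
import math

def sampling_interval(frame_rate_hz, target_fps=3):
    """Interval (in frames) yielding ~target_fps sampled frames per second.

    Illustrative helper: interval = floor(f / target_fps), clamped to >= 1.
    """
    return max(1, math.floor(frame_rate_hz / target_fps))

def sample_frame_indices(num_frames, frame_rate_hz, target_fps=3):
    """Uniformly spaced frame indices for one video."""
    step = sampling_interval(frame_rate_hz, target_fps)
    return list(range(0, num_frames, step))
```

For the frame rates in the table, this gives intervals of 7 (Butterfly, 23 Hz), 6 (CLUST, 19 Hz), 9 (Liver Fibrosis, 28 Hz), and 5 (COVID19-LUSMS, 17 Hz) frames.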
2. Annotation Protocol and Content
All images in US-4 are annotated with classification labels appropriate for clinical ultrasound tasks. These include categories for disease status (e.g., COVID-19, pneumonia, healthy) in lung images and fibrosis severity grades in liver images. The annotation protocol is standardized across the sub-datasets to enable multi-class and multi-instance classification; however, pixel-level segmentation masks are not provided.
Images are acquired mainly using convex probes; for downstream evaluation, linear probe images from the UDIAT-B dataset are also referenced. Video capture leverages system-specific configurations (e.g., Resona 7T US with frequency FH 5.0, pixel size 0.101–0.127mm) for Liver Fibrosis and COVID19-LUSMS.
3. Pretraining and the USCL Framework
US-4 is architected for use in pretraining convolutional and transformer-based neural networks with the USCL semi-supervised contrastive learning method. The primary goal is to learn representations that are robust to imaging artifacts, speckle noise, and anatomical variability ubiquitous in ultrasound. The pretraining process leverages semantic clustering at the video level, using spatially and temporally correlated frames as positives in the contrastive objective.
Core sampling and augmentation strategy:
- For each video $v_i$, frames are extracted at the uniform interval described above, and a small number of images per video is randomly subsampled for batch contrastive computation.
- Positive Pair Generation (PPG) employs a mixup operator with Beta-distributed coefficients $\lambda \sim \mathrm{Beta}(\alpha, \alpha)$, interpolating frames from the same video as $\tilde{x} = \lambda x_a + (1 - \lambda) x_b$.
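A minimal sketch of the PPG step, assuming frames are flat lists of pixel values and using $\alpha = 1.0$ as an illustrative (not necessarily the paper's) Beta parameter:

```python
import random

def mixup(frame_a, frame_b, alpha=1.0):
    """Mix two frames with a Beta(alpha, alpha) coefficient.

    Returns the interpolated frame lam*a + (1-lam)*b and the coefficient.
    Frames are plain lists of floats here; real code would use tensors.
    """
    lam = random.betavariate(alpha, alpha)
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(frame_a, frame_b)]
    return mixed, lam

def make_positive_pair(frames, alpha=1.0):
    """Build two mixed views from one video's frames (hypothetical helper).

    Each view interpolates two distinct frames, so the pair shares
    video-level semantics while differing in appearance.
    """
    a, b = random.sample(frames, 2)
    c, d = random.sample(frames, 2)
    view1, _ = mixup(a, b, alpha)
    view2, _ = mixup(c, d, alpha)
    return view1, view2
```

Because both views come from the same video, they serve as the positive pair in the contrastive objective, while views from other videos act as negatives.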
Loss formulation:
- The USCL framework optimizes a compound loss
$$\mathcal{L} = \mathcal{L}_{\mathrm{con}} + \eta\,\mathcal{L}_{\mathrm{cls}},$$
where $\mathcal{L}_{\mathrm{con}}$ is the standard NT-Xent contrastive loss, $\mathcal{L}_{\mathrm{cls}}$ is the cross-entropy for supervision, and $\eta$ is a scaling factor ($0.2$ in reported experiments).
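The compound loss can be sketched in plain Python, combining a SimCLR-style NT-Xent term over paired embeddings with a softmax cross-entropy term; the function names and the temperature value (0.5) are illustrative assumptions, not taken from the USCL implementation:

```python
import math

def _cos(u, v):
    """Cosine similarity between two vectors (plain lists)."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent contrastive loss over paired embeddings (SimCLR-style sketch).

    z1[i] and z2[i] are embeddings of a positive pair; all other samples
    in the batch serve as negatives.
    """
    z = z1 + z2
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner
        denom = sum(math.exp(_cos(z[i], z[k]) / tau)
                    for k in range(2 * n) if k != i)
        pos = math.exp(_cos(z[i], z[j]) / tau)
        total += -math.log(pos / denom)
    return total / (2 * n)

def compound_loss(z1, z2, logits, labels, eta=0.2):
    """L = L_con + eta * L_cls, with eta = 0.2 as reported in the text."""
    ce = 0.0
    for lg, y in zip(logits, labels):
        m = max(lg)  # subtract max for numerical stability
        logsum = m + math.log(sum(math.exp(v - m) for v in lg))
        ce += logsum - lg[y]
    ce /= len(labels)
    return nt_xent(z1, z2) + eta * ce
```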
Augmentation employs random cropping, horizontal flip, rotation, and color jittering, contributing to improved invariance and generalization.
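A toy version of this augmentation pipeline, operating on images represented as 2-D lists (rotation and color jitter omitted for brevity; real pipelines would use tensor-based transforms):

```python
import random

def random_horizontal_flip(img, p=0.5):
    """Flip each pixel row left-to-right with probability p."""
    return [row[::-1] for row in img] if random.random() < p else img

def random_crop(img, size):
    """Crop a size x size window at a uniformly random position."""
    h, w = len(img), len(img[0])
    top = random.randint(0, h - size)
    left = random.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]

def augment(img, crop_size):
    """Toy pipeline mirroring the listed augmentations (crop + flip)."""
    return random_horizontal_flip(random_crop(img, crop_size))
```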
4. Downstream Performance and Benchmarking
Pretrained backbones (ResNet18) utilizing US-4 with the USCL method demonstrate a clear advantage over both ImageNet and popular self-supervised methods (MoCo v2, SimCLR) across multiple downstream clinical tasks:
| Method | POCUS Accuracy (%) | UDIAT-B Det. AP | UDIAT-B Segm. AP |
|---|---|---|---|
| ImageNet | 84.2 | 40.6 | 48.2 |
| US-4 supervised | 85.0 | 38.3 | 42.6 |
| MoCo v2 (self-sup) | 84.8 | 38.7 | 47.1 |
| SimCLR (self-sup) | 86.4 | 43.8 | 51.3 |
| USCL (US-4) | 94.2 | 45.4 | 52.8 |
On the POCUS lung dataset, USCL-pretrained models exceed ImageNet pretraining by more than 10% absolute accuracy (94.2% vs. 84.2%). F1 score improvements are similarly substantial (94.0% vs. 81.8%). Results on breast US detection/segmentation (UDIAT-B) also favor US-4 pretraining (+4.8% detection AP, +4.6% segmentation AP versus ImageNet).
Ablation studies highlight the necessity of both pair assignment and mixup in USCL; addition of the semi-supervised classification branch further enhances representational quality.
5. Access, Licensing, and Availability
US-4's codebase and framework implementation are available via https://github.com/983632847/USCL. There is no explicit claim of full public release of the assembled video data; only the codebase is confirmed as open. Two sub-datasets, Butterfly and CLUST, are publicly accessible, while Liver Fibrosis and COVID19-LUSMS are locally collected and may require direct communication for access. This composite structure implies that, while the code and methodology are transferable, assembling the full dataset as described is non-trivial without additional data-sharing agreements.
6. Technical and Comparative Features
The US-4 dataset is distinguished by:
- Moderate scale and video-based origin: 23k images from >1k videos, versus patch-based or static-image datasets.
- Focus on classification-ready annotation (i.e., class labels, no segmentation masks).
- Selection and sampling protocol: Uniform, low-redundancy extraction for optimal semantic clustering.
- Rich anatomical diversity: Both lung and liver, with varied disease labels.
- Domain specificity: Designed specifically for US representation learning, directly targeting the limitations of transfer from natural-image datasets.
This construction enables superior pretraining, especially for contrastive and semi-supervised methodologies, and is validated by empirical gains in accuracy and detection/segmentation quality on various public and local downstream benchmarks.
7. Implications, Limitations, and Future Directions
US-4 represents a foundational resource for pretraining ultrasound-specific deep learning models, directly demonstrating reduced domain gap compared to natural-image-based pretraining. Its design is optimized for methods that exploit the nearby temporal-spatial structure of US video, such as contrastive and mixup-based representation learning.
This suggests that domain-specific pretraining using datasets such as US-4 is likely essential for robust transfer learning in the ultrasound domain. However, the lack of pixel-level segmentation masks and the partial restriction of certain sub-datasets may limit its immediate adoption for segmentation-heavy research or by institutions without access to the proprietary components. A plausible implication is that future expansions, possibly incorporating segmentation labels and broader anatomical coverage, could further enhance the impact and application horizon of US-4 and similar efforts.
The methodology introduced with US-4 lays important groundwork for systematic, domain-appropriate pretraining in medical imaging, establishing a new reference point for performance evaluation in ultrasound analysis research (Chen et al., 2020).