TUS-REC2024 Challenge: Trackerless 3D Ultrasound
- TUS-REC2024 Challenge is an open benchmark that evaluates trackerless 3D ultrasound reconstruction using freehand scans without external tracking devices.
- It addresses key challenges like inter-frame motion estimation, drift minimization, and ensuring algorithm generalizability across diverse scanning protocols.
- The challenge promotes reproducible research and clinical innovation through its public dataset, evaluation code, and an evolving leaderboard for performance comparison.
The TUS-REC2024 Challenge is a structured, open scientific benchmark for trackerless 3D freehand ultrasound reconstruction. It was established to accelerate progress in inferring the 3D spatial arrangement of freehand ultrasound images—without external tracking—using a large-scale, rigorously standardized public dataset, evaluation codebase, and a defined set of metrics. The challenge provides an ongoing reference point for the development and assessment of algorithms in ultrasound volumetric imaging where portability, cost, and accessibility are paramount concerns.
1. Challenge Foundations and Objectives
TUS-REC2024 aims to benchmark algorithms that produce accurate 3D reconstructions of anatomical structures from sequential 2D freehand ultrasound images in the absence of external tracking devices. The challenge addresses three core technical barriers:
- Inter-frame motion estimation: Inferring relative probe poses solely from image content and internal cues.
- Minimization of drift accumulation: Reducing cumulative errors in pose over long scan sequences.
- Generalizability: Ensuring consistent performance across diverse probe trajectories, scan protocols, and subjects.
By hosting the first public large-scale trackerless dataset, TUS-REC2024 advances reproducibility, transparency, and objective comparison, providing a foundation for research towards deployable, low-cost 3D ultrasound solutions suitable for point-of-care and low-resource scenarios.
2. Dataset Composition and Acquisition Standards
The challenge dataset comprises:
- Subjects and Scans: 85 healthy adults (left and right forearms), 2,040 scans, totaling 1,025,448 2D frames acquired at 20 Hz (480×640 px).
- Protocols: Straight, C-shaped, and S-shaped probe trajectories; both orientation directions (distal-proximal, proximal-distal); parallel and perpendicular to limb axis.
- Ground Truth: Each frame is registered to 3D space via an NDI Polaris Vicra optical tracker, yielding a 6-DoF transformation per image.
- Calibration and Accuracy: Systematic pre-scan calibration with a pinhead phantom achieves sub-millimeter accuracy (3D RMS error < 0.25 mm).
- Split: 50 subjects (1,200 scans) for training, 3 subjects (72 scans) for validation, and 32 subjects (768 scans) for testing; each subject appears in exactly one set.
- Availability: The complete dataset, along with code for evaluation and a reference baseline, is public (Zenodo and GitHub).
The dataset’s scale and diversity are intended to capture real-world variability in scanning—and to promote algorithmic generalizability.
3. Baseline and Participant Algorithms
The task in TUS-REC2024 is to predict, for each sequence, frame-to-frame and global spatial transformations, output as dense displacement fields (DDFs) for both per-pixel and per-landmark correspondence.
Baseline Model
- Architecture: EfficientNet-B1 CNN processes pairs of successive frames.
- Output: Predicts the 6-DoF transformation (rotation + translation) for each image pair.
- Supervision: Mean squared error over the transformed 3D positions of the ultrasound image corners.
- Inference: Sequential composition of local transforms between adjacent frames to estimate each frame's global pose.
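The baseline's supervision and inference steps above can be sketched as follows. This is a minimal NumPy illustration, not the challenge code: the function names are hypothetical, and it assumes poses are expressed as 4×4 homogeneous rigid transforms with corner coordinates in mm.

```python
import numpy as np

def corner_mse_loss(T_pred, T_true, corners_mm):
    """MSE between the four image corners mapped through the predicted
    and ground-truth 4x4 rigid transforms (the baseline's supervision)."""
    homog = np.hstack([corners_mm, np.ones((len(corners_mm), 1))])  # (4, 4)
    p_pred = (T_pred @ homog.T).T[:, :3]
    p_true = (T_true @ homog.T).T[:, :3]
    return float(np.mean((p_pred - p_true) ** 2))

def compose_global_poses(local_transforms):
    """Chain predicted frame-to-frame transforms into frame-to-first
    global poses (the baseline's inference)."""
    poses = [np.eye(4)]  # frame 0 is the reference
    for T in local_transforms:
        poses.append(poses[-1] @ T)
    return poses
```

Because global poses are products of local estimates, any small per-pair error propagates through the chain, which is precisely the drift-accumulation problem the challenge highlights.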
Team Approaches
A total of 6 teams submitted 21 dockerized models, employing a range of strategies:
Rank | Model | Highlights |
---|---|---|
1 | FiMoNet | ResNet18+Mamba (State Space Model) ensemble; L1+Pearson loss; long-range temporal modeling |
2 | RecuVol | EfficientNet+LSTM; ensemble learning; TrivialAugment; robust to global drift |
3 | FlowNet | EfficientNet-B6; 10-frame input; dense flow field; best global accuracy |
4 | MoGLo-Net | ResNet encoder; motion correlation volume, global-local attention, Conv-GRU |
5 | PLPPI | Dual-stream spatial/temporal CNN; physics-informed by speckle decorrelation; CLIP feature loss |
6 | Baseline | EfficientNet-B1; basic pairwise supervision; no temporal module |
Common innovations include sequence modeling (LSTM, State Space Models, Conv-GRU), advanced loss functions (L1, triplet, correlation, embedding consistency), and pretraining (ImageNet, Biomedical CLIP). The winning method, FiMoNet, leverages an ensemble of Mamba-enhanced ResNets for long-range temporal consistency, providing notable improvements in local accuracy.
4. Evaluation Protocols and Results
Each submission is required to predict the following DDFs:
- Global-Pixel DDF: Displacement for every image pixel to the first frame.
- Local-Pixel DDF: Displacement between each frame and its previous frame.
- Global-Landmark DDF: SIFT-based keypoint displacements to first frame.
- Local-Landmark DDF: As above, but frame-to-frame.
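All four required DDFs reduce to applying a rigid transform to a point set and subtracting the original coordinates; only the transform (composed frame-to-first vs. single frame-to-previous) and the points (all pixels vs. SIFT keypoints) differ. A sketch under the assumption that points are (N, 3) coordinates in mm:

```python
import numpy as np

def displacement_field(T, points_mm):
    """DDF under a 4x4 rigid transform T: where each point moves to,
    minus where it started. points_mm has shape (N, 3)."""
    homog = np.hstack([points_mm, np.ones((len(points_mm), 1))])
    moved = (T @ homog.T).T[:, :3]
    return moved - points_mm

# Global-Pixel / Global-Landmark: T is the composed frame-to-first pose,
# applied to all pixel coordinates or to the SIFT keypoints.
# Local-Pixel / Local-Landmark: T is the single frame-to-previous
# transform, applied to the same point sets.
```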
Metrics:
- Global Pixel Error (GPE), Global Landmark Error (GLE), Local Pixel Error (LPE), Local Landmark Error (LLE): All measured in mm as mean Euclidean distances.
- Final Score (FS): Mean of the scan-wise normalized versions of the four metrics above.
- Runtime: Mean prediction time per scan.
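The error metrics and Final Score can be sketched as below. The exact normalization is defined in the challenge's public evaluation code; this illustration assumes a per-scan min-max scheme in which the lowest error across submissions maps to a score of 1 and the highest to 0.

```python
import numpy as np

def mean_euclidean_error(ddf_pred, ddf_true):
    """GPE/GLE/LPE/LLE: mean Euclidean distance (mm) between predicted
    and ground-truth displacement vectors, each of shape (N, 3)."""
    return float(np.mean(np.linalg.norm(ddf_pred - ddf_true, axis=-1)))

def final_score(errors, mins, maxs):
    """Average of the four metrics after per-scan min-max normalization.
    `errors`, `mins`, `maxs` are length-4 sequences (GPE, GLE, LPE, LLE);
    `mins`/`maxs` are the best/worst errors over all submissions."""
    scores = [(mx - e) / (mx - mn) for e, mn, mx in zip(errors, mins, maxs)]
    return sum(scores) / len(scores)
```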
Main Findings:
- FiMoNet achieved the best final score (0.852), excelling in local (frame-to-frame) accuracy, while FlowNet outperformed in global error.
- Global drift remains a key challenge, as longer scans and complex (S-shaped, parallel) probe motions increase reconstruction error (correlations of approximately 0.78).
- Statistical robustness: Bootstrap resampling confirmed ranking stability, and team differences were statistically significant for most pairs.
- Trade-offs: No single architecture dominated all metrics; local accuracy did not always correlate with low global error, highlighting the complexity of drift control.
- Efficiency: Most models achieved run times compatible with clinical deployment scenarios.
5. Limitations, Insights, and Future Directions
The challenge elucidated the following limitations and developmental focal points:
- Drift Accumulation: All methods experienced error growth with sequence length, underscoring the need for improved temporal modeling and drift correction. This motivates investigation into hybrid methods that can combine local consistency with explicit global regularization.
- Generalization: Performance deteriorated on more complex scan shapes and different probe orientations, revealing the need for architectures resilient to trajectory diversity and extended scanning sessions.
- Benchmark Growth: There is a clear directive to expand dataset diversity (anatomical coverage, clinical settings), refine evaluation (alternative normalizations; undisclosed test sets), and provide richer baseline resources.
A plausible implication is that future leading methods will need to incorporate explicit mechanisms for global topology consistency, possibly via attention modules, physics-inspired priors, or cross-scan feature aggregation.
6. Community Impact and Ongoing Development
TUS-REC2024 has established a live, evolving leaderboard and an accessible, reference-grade public codebase. Its ongoing organization as part of MICCAI, coupled with the scale and availability of its data and evaluation infrastructure, has facilitated cross-institutional participation and reproducibility. The challenge serves as a catalyst and baseline not only for the image-guided intervention community but for broader volumetric imaging research, informing both incremental advances and the development of future clinical workflows deploying portable, trackerless ultrasound systems.
7. Technical Summary Table: Evaluation Metrics (Per Scan Mean)
Metric | Definition | Points Evaluated |
---|---|---|
GPE | Mean global pixel error (mm) | All pixels |
GLE | Mean global landmark error (mm) | SIFT landmarks |
LPE | Mean local (adjacent frame) pixel error (mm) | All pixels |
LLE | Mean local (adjacent frame) landmark error (mm) | SIFT landmarks |
Each method is scored by a normalized combination of these four values, and all code and data for metric validation are provided in the public challenge repository.
The TUS-REC2024 Challenge represents the current benchmark for trackerless 3D ultrasound reconstruction research, providing a comprehensive platform for evaluation, fair comparison, and continued innovation in point-of-care volumetric imaging technologies.