3DTeethLand Challenge
- The 3DTeethLand Challenge is a benchmark initiative that refines automated 3D dental landmark detection using a high-quality, annotated dataset and standardized evaluation metrics.
- It employs 340 high-resolution intraoral scans, with 240 annotated for training, ensuring expert-driven precision and negligible inter-annotator variability.
- The challenge drives innovation in algorithmic solutions, contrasting segmentation-based and direct landmark detection methods to enhance clinical diagnostics and workflow efficiency.
The 3DTeethLand Challenge is a benchmark initiative in computational dentistry, established to drive advancements in 3D dental landmark detection from intraoral scans. Conducted in conjunction with MICCAI 2024, it leverages an expertly curated and publicly available dataset, standardized rigorous evaluation metrics, and a competitive environment to assess and compare state-of-the-art algorithmic solutions. The challenge targets the automated, clinically reliable localization of critical anatomical landmarks on individual teeth—tasks integral to diagnostics, personalized orthodontic planning, and treatment monitoring in dental practice (Ben-Hamadou et al., 9 Dec 2025).
1. Dataset Development and Annotation Standards
The 3DTeethLand dataset comprises 340 high-resolution intraoral 3D scans represented as triangular surface meshes (.OBJ format), each with point densities of 30–80 pts/mm² and mesh accuracy finer than 90 µm. Out of these, 240 scans are fully annotated for training and 100 unannotated scans serve as the held-out private test set. Every tooth in each scan is annotated with six anatomical landmark classes: mesial, distal, cusp (1–5 points, tooth-type dependent), inner gingival, outer gingival, and facial axis. Annotation was performed by a trained clinical technician and refined through consensus by a board of three senior dentists (>10 years of experience per expert). Resulting inter-annotator variability is negligible, with mean landmark deviation < 0.2 mm (Ben-Hamadou et al., 9 Dec 2025).
2. Task Formulation and Evaluation Metrics
The core objective is, for each 3D scan S, to predict a set of landmark points P̂ = {p̂_1, …, p̂_N} matching the ground-truth coordinates P = {p_1, …, p_N}. The challenge employs several rigorous evaluation measures:
- Mean Radial Error (MRE): the mean Euclidean distance between matched predicted and ground-truth landmarks, MRE = (1/N) Σ_i ‖p̂_i − p_i‖₂.
- Success Detection Rate (SDR) at threshold τ: the fraction of landmarks localized within τ of ground truth, SDR(τ) = |{i : ‖p̂_i − p_i‖₂ < τ}| / N.
- Object detection metrics for variable-count classes (e.g., cusps):
  - Mean Average Precision (mAP): area under the precision–recall curve at multiple distance thresholds (0–3 mm, 0.1 mm steps)
  - Mean Average Recall (mAR): area under the recall–threshold curve over the same range
- Leaderboard Robustness:
Final team ranking uses Wilcoxon signed-rank tests (p < 0.001) and bootstrap resampling (100 iterations; 10% holdout) to derive a robust "Rank Score" (Ben-Hamadou et al., 9 Dec 2025).
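The pointwise metrics above can be sketched in a few lines of NumPy. This is a minimal illustration only: the variable names are hypothetical, and landmark correspondence between prediction and ground truth is assumed to be already established (the challenge's matching step for variable-count classes is omitted).

```python
import numpy as np

def mean_radial_error(pred, gt):
    """MRE: mean Euclidean distance (mm) between matched predicted
    and ground-truth landmarks, both arrays of shape (N, 3)."""
    return float(np.linalg.norm(pred - gt, axis=1).mean())

def success_detection_rate(pred, gt, tau):
    """SDR(tau): fraction of landmarks whose radial error is below tau mm."""
    errors = np.linalg.norm(pred - gt, axis=1)
    return float((errors < tau).mean())

# Toy example: three landmarks, one displaced by 2 mm.
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
pred = gt + np.array([[0.1, 0.0, 0.0], [0.0, 0.2, 0.0], [2.0, 0.0, 0.0]])
print(mean_radial_error(pred, gt))            # ≈ 0.767 mm
print(success_detection_rate(pred, gt, 1.0))  # ≈ 0.667
```

Note how a single gross outlier inflates MRE while SDR degrades gracefully, which is why the challenge reports both.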
3. Challenge Protocol, Baselines, and Preprocessing
Data is split into 240 training and 100 test cases. Input meshes may be decimated (to ~10,000 faces) or sampled to obtain point clouds (8,000–16,000 points/scan). Input features include geometry-based descriptors (normals, curvature), and a standard normalization step aligns all scans to a unit sphere centered at the jaw centroid. Dynamic augmentations—random rotations (±15°), scale (±5%), point jitter (σ = 0.01–0.05 mm)—are applied during training.
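The normalization and augmentation steps described above can be sketched as follows. The rotation axis and the synthetic point cloud are illustrative assumptions (the protocol states the parameter ranges but not an axis), so treat this as a sketch rather than the challenge's exact pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_to_unit_sphere(points):
    """Center the scan at the jaw centroid and scale it into a unit sphere."""
    centered = points - points.mean(axis=0)
    return centered / np.linalg.norm(centered, axis=1).max()

def augment(points, max_rot_deg=15.0, max_scale=0.05, jitter_sigma=0.01):
    """Random rotation (here about the z-axis, an assumption), isotropic
    scale, and per-point Gaussian jitter, matching the stated ranges."""
    theta = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    jitter = rng.normal(0.0, jitter_sigma, size=points.shape)
    return (points @ rot.T) * scale + jitter

cloud = rng.uniform(-40.0, 40.0, size=(8000, 3))  # synthetic 8k-point "scan"
norm = normalize_to_unit_sphere(cloud)
aug = augment(norm)
print(np.linalg.norm(norm, axis=1).max())  # ~1.0 after normalization
```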
Participants encapsulate their solutions in Docker containers; evaluation is conducted on the Synapse platform. A public reference baseline—a nearest-neighbor landmark regressor on pre-segmented tooth patches—is provided for comparison (Ben-Hamadou et al., 9 Dec 2025).
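A minimal sketch in the spirit of the nearest-neighbor reference baseline, assuming pre-segmented tooth patches and a per-class template position in a canonical tooth frame (the patch, template, and function names are hypothetical, not the organizers' actual implementation):

```python
import numpy as np

def nn_landmark_regressor(patch_vertices, template_landmarks):
    """For each template landmark (e.g., the mean annotated position of
    that class in a canonical tooth frame), return the nearest vertex of
    the pre-segmented tooth patch as the predicted landmark."""
    preds = []
    for t in template_landmarks:
        dists = np.linalg.norm(patch_vertices - t, axis=1)
        preds.append(patch_vertices[np.argmin(dists)])
    return np.array(preds)

# Toy patch: three crown vertices; one hypothetical cusp template point.
patch = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.5], [0.5, 0.5, 1.0]])
template = np.array([[0.6, 0.4, 0.9]])
pred = nn_landmark_regressor(patch, template)  # snaps to vertex [0.5, 0.5, 1.0]
```

Snapping predictions onto actual mesh vertices guarantees they lie on the scanned surface, which is the main appeal of such a baseline despite its limited accuracy.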
4. Algorithmic Approaches and Performance
Two dominant solution paradigms emerged:
- Segmentation-based pipelines:
First, tooth (or instance) segmentation, followed by per-tooth landmark regression/classification.
- Direct landmark detection:
Landmark localization directly from entire point clouds without explicit segmentation.
Top entries employed advanced geometric deep learning modules:
| Team | Methodology Summary | Key Architectures | Rank Score | mAP | mAR |
|---|---|---|---|---|---|
| Radboud | Two-stage: instance segmentation + per-class regressors | PointNet-style w/ seed/offset/tooth decoders, DBSCAN | 0.9172 | 0.785 | 0.656 |
| ChohoTech | DGCNN, dual head, post-processing NMS | DGCNN, heatmap+offset regression | 0.8325 | 0.775 | 0.634 |
| YY-LAB | Segmentation + graph-cut + heatmap, bipartite cusp matching | TeethGNN, TL-DETR | 0.6224 | 0.719 | 0.579 |
| YN-LAB | Curriculum learning, coarse/fine segmentation, clustering | PointMLP, HDBSCAN | 0.3171 | 0.643 | 0.527 |
| IGIP-LAB | Transformer encoder, confidence regression, clustering | PointTransformer V3 | 0.1358 | 0.590 | 0.466 |
| 3DIMLAND | Transformer-based, 6-channel distance decoder, NMS | Custom encoder, CTD-NMS | 0.0325 | 0.578 | 0.438 |
Radboud’s approach—PointNet-based instance segmentation with class-specific decoders and weighted DBSCAN—achieved the highest overall mAP and mAR across all landmark categories (e.g., mAP: cusp 0.772, facial 0.768, inner/outer 0.793, mesial/distal 0.792) (Ben-Hamadou et al., 9 Dec 2025). ChohoTech (2nd) utilized a DGCNN backbone and dual heads, excelling in robustness and runtime.
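The offset-vote clustering step common to the top entries can be illustrated schematically. The sketch below assumes each surface point regresses an offset toward its nearest landmark; the shifted points ("votes") are clustered with DBSCAN and cluster means become landmark predictions. This is a simplification of the weighted variant described for Radboud, with hypothetical parameter values:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def votes_to_landmarks(points, offsets, eps=0.5, min_samples=10):
    """Shift each surface point by its predicted offset to cast a vote,
    cluster the votes with DBSCAN, and return one landmark per cluster
    (votes labeled -1 are treated as noise and discarded)."""
    votes = points + offsets
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit(votes).labels_
    return np.array([votes[labels == k].mean(axis=0)
                     for k in sorted(set(labels)) if k != -1])

# Synthetic check: noisy offsets that all point toward two "cusp tips".
rng = np.random.default_rng(1)
tips = np.array([[0.0, 0.0, 5.0], [3.0, 0.0, 5.0]])
pts = rng.normal(size=(400, 3))
targets = tips[rng.integers(0, 2, size=400)]
offs = targets - pts + rng.normal(0.0, 0.05, size=(400, 3))
found = votes_to_landmarks(pts, offs, eps=0.3, min_samples=20)
print(len(found))  # 2 recovered landmarks
```

Because the landmark count per tooth varies (e.g., 1–5 cusps), density-based clustering avoids having to fix the number of outputs in advance, unlike k-means-style alternatives.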
5. Error Analysis and Clinical Implications
Landmark detection accuracy varies by class:
- Mesial/distal landmarks are most challenging (AP ≈ 0.575–0.792), influenced by narrow inter-tooth spacing.
- Cusps exhibit moderate difficulty (AP ≈ 0.59–0.77), especially for multi-cusp molars.
- Inner/outer gingival margins and facial axis points are most reliable (AP ≈ 0.61–0.79).
Top models achieve clinically meaningful performance: aggregate SDR(1 mm) for Radboud exceeds 0.82, aligning with sub-millimetric accuracy requirements in orthodontics. Automated 3D landmarking demonstrates potential to reduce annotation time from ~10 min/case to <1 min/case (Ben-Hamadou et al., 9 Dec 2025).
6. Limitations and Future Directions
The current challenge data emphasize adult dentition with sparse pathological variation. Further improvements are envisioned in several directions:
- Data diversity:
Including mixed dentition, severe malocclusions, and implant-restored arches; expanded demographic/ethnic coverage.
- Algorithmic robustness:
Generalization to noise, occlusions, partial scans, and cross-device inputs, motivating further work on domain adaptation and uncertainty quantification.
- End-to-end CAD/CAM integration:
Full orthodontic workflows require pipeline architectures combining tooth segmentation, landmark detection, and occlusal analysis in a seamless system (Ben-Hamadou et al., 9 Dec 2025).
7. Relationship to Prior Tasks and Benchmarks
3DTeethLand builds upon the teeth segmentation and labeling pipelines from preceding challenges, notably 3DTeethSeg'22 (Ben-Hamadou et al., 2023) and datasets such as Teeth3DS+ (Ben-Hamadou et al., 2022). These prior initiatives focused on tooth localization, segmentation, and FDI labeling; landmark detection emerges as the next stage, with evaluation shifting from segmentation Dice and labeling accuracy to strict pointwise geometric proximity (MRE, SDR, mAP/mAR). Baseline architectures (e.g., MeshSNet, two-stream GCNs, Point Transformer) demonstrated strong segmentation performance but required adaptation for the high spatial precision demanded by landmark localization.
A plausible implication is that techniques refined in 3DTeethLand—such as geometric transformers, distance-map regression, and topology-driven NMS—will propagate to broader applications such as 3D cephalometry, surgical planning, and multi-modal dental data fusion (Kubík et al., 15 Apr 2025, Ben-Hamadou et al., 2022).
References
- Ben-Hamadou et al., 9 Dec 2025
- Kubík et al., 15 Apr 2025
- Ben-Hamadou et al., 2023
- Ben-Hamadou et al., 2022