3DTeethSeg 2022 Challenge: Dental Segmentation

Updated 9 February 2026
  • 3DTeethSeg 2022 Challenge is a benchmark evaluating advanced CAD methods in tooth localization, segmentation, and FDI labeling of 3D intraoral scans.
  • It leverages the expansive Teeth3DS+ dataset with 1,800 intraoral 3D meshes and 23,999 annotated teeth to ensure diverse, robust testing.
  • Challenge results highlight multi-stage pipelines and novel network architectures that achieve high precision in dental scan analysis.

Dental intraoral scan analysis is a critical pillar of computer-aided dentistry (CAD), underpinning applications such as orthodontic treatment planning, prosthetic design, and automated diagnostics. The 3DTeethSeg 2022 Challenge, held in conjunction with MICCAI 2022, catalyzed algorithmic advances in three primary domains: tooth localization, 3D mesh segmentation, and instance labeling under the FDI World Dental Federation’s two-digit scheme. This challenge leveraged Teeth3DS+, the largest rigorously annotated public dataset of its kind, to provide a standardized, reproducible benchmark for evaluating accuracy, robustness, and generalization of automated CAD tools across diverse dental anatomies and acquisition protocols (Ben-Hamadou et al., 2022, Ben-Hamadou et al., 2023).

1. Challenge Scope, Tasks, and Clinical Motivation

The 3DTeethSeg 2022 Challenge addressed three core tasks on intraoral 3D scans:

  • Tooth localization: Determine the presence and precise 3D position (centroid) of each tooth in a mesh.
  • Tooth segmentation: Assign every vertex (or face) to one of the tooth instances or to the background gingiva.
  • Tooth labeling (FDI identification): Assign the correct FDI number to each tooth segment, supporting universal clinical documentation.

Automating these labor-intensive steps is essential for scaling CAD workflows, reducing manual intervention, and enabling robust population-level studies. Challenges such as anatomical heterogeneity, dental crowding, missing or misaligned teeth, and scanner variability necessitate algorithmic solutions that can generalize across real-world patient populations (Ben-Hamadou et al., 2022, Ben-Hamadou et al., 2023).
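As a minimal illustration of the localization target, each tooth's centroid can be computed as the mean position of its labeled vertices. This is a simplified sketch (the array names and toy data are hypothetical, not from the challenge code):

```python
import numpy as np

def tooth_centroids(vertices, labels):
    """Return {fdi_label: centroid} for every labeled tooth.

    vertices: (N, 3) float array of mesh vertex positions.
    labels:   (N,) int array; 0 = background gingiva, otherwise an FDI number.
    """
    centroids = {}
    for fdi in np.unique(labels):
        if fdi == 0:  # skip background gingiva
            continue
        centroids[int(fdi)] = vertices[labels == fdi].mean(axis=0)
    return centroids

# Toy mesh: two vertices belonging to tooth 11, one gingiva vertex
verts = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
labs = np.array([11, 11, 0])
print(tooth_centroids(verts, labs)[11])  # [1. 0. 0.]
```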

2. Dataset Design, Acquisition, and Annotation Workflow

Teeth3DS+ undergirds the challenge as the first high-quality benchmark for intraoral 3D scan analysis. The dataset encompasses:

  • Composition: 1,800 meshes (900 anonymized patients, each with upper and lower jaws), totaling 23,999 annotated teeth.
  • Demographics: Balanced gender (50% male/female); 70% under 16 years; ~27% aged 16–59; ~3% over 60. Orthodontic and prosthetic cases are evenly represented.
  • Acquisition protocols: Data collected via Dentsply Primescan, 3Shape Trios3, and iTero Element 2 Plus with accuracies of 10–90 μm and 30–80 points/mm² density, maximizing anatomical and device diversity (Ben-Hamadou et al., 2022, Ben-Hamadou et al., 2023).

Annotation is executed through a multi-stage, expert-in-the-loop protocol:

  1. Mesh cleanup: Removal of degenerate faces/vertices.
  2. Pose normalization: PCA alignment to the occlusal plane.
  3. Tooth cropping: Spherical region extraction and harmonic UV parameterization.
  4. Manual boundary annotation: Polygonal boundary drawing in 2D UV space.
  5. Back-projection: Mapping annotated boundaries onto the 3D mesh.
  6. Instance assembly: Aggregation of tooth crowns.
  7. Expert labeling: Assignment of FDI numbers by seasoned clinicians.
  8. Multi-rater validation: Iterative re-annotation loop until consensus on delineation, labeling, and completeness (Ben-Hamadou et al., 2022, Ben-Hamadou et al., 2023).
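The PCA alignment in step 2 can be sketched as follows; this is a generic principal-axis normalization, a simplification of the challenge's actual occlusal-plane alignment:

```python
import numpy as np

def pca_align(vertices):
    """Center a mesh and rotate its principal axes onto x/y/z.

    After alignment, the axis of smallest variance (which approximates
    the occlusal-plane normal for a jaw scan) lies along z.
    """
    centered = vertices - vertices.mean(axis=0)
    # Eigen-decompose the 3x3 covariance; eigh returns ascending eigenvalues
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))
    order = np.argsort(eigvals)[::-1]        # sort axes by descending variance
    rotation = eigvecs[:, order]             # columns = principal directions
    return centered @ rotation               # coordinates in the principal basis
```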

Resulting datasets are disseminated as OBJ mesh files with vertex-level JSON label arrays and instance IDs.
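A minimal loader for this layout might look as follows. The JSON keys `"labels"` and `"instances"` are assumptions about the annotation schema, and only vertex lines of the OBJ are parsed:

```python
import json
import numpy as np

def load_scan(obj_path, json_path):
    """Load vertex positions from an OBJ file and per-vertex labels
    from the accompanying JSON annotation file."""
    vertices = []
    with open(obj_path) as f:
        for line in f:
            if line.startswith("v "):  # vertex position line: "v x y z"
                vertices.append([float(x) for x in line.split()[1:4]])
    with open(json_path) as f:
        ann = json.load(f)
    # Assumed schema: per-vertex FDI labels and per-vertex instance IDs
    labels = np.asarray(ann["labels"])
    instances = np.asarray(ann["instances"])
    return np.asarray(vertices), labels, instances
```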

3. Evaluation Metrics and Challenge Protocol

Quantitative comparison between challenge submissions is grounded in established metrics:

  • Dice Similarity Coefficient (DSC): $\mathrm{DSC}(X,Y) = \frac{2|X\cap Y|}{|X|+|Y|}$, measuring segmentation overlap.
  • Intersection over Union (IoU): $\mathrm{IoU}(X,Y) = \frac{|X\cap Y|}{|X\cup Y|}$.
  • Precision/Recall: Computed per-tooth and per-vertex for segmentation.
  • Localization accuracy: Proportion of predicted centroids within a prespecified threshold of true centers.
  • Identification rate (TIR): Fraction of segmented teeth with correct FDI labels (Ben-Hamadou et al., 2022, Ben-Hamadou et al., 2023).
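The overlap and identification metrics above can be sketched directly on per-vertex boolean masks and per-tooth label arrays (a minimal illustration, not the official evaluation code):

```python
import numpy as np

def dice(pred, gt):
    """Dice similarity coefficient between two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def iou(pred, gt):
    """Intersection over union between two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    return inter / np.logical_or(pred, gt).sum()

def identification_rate(pred_fdi, gt_fdi):
    """Fraction of teeth whose predicted FDI label matches ground truth."""
    pred_fdi, gt_fdi = np.asarray(pred_fdi), np.asarray(gt_fdi)
    return (pred_fdi == gt_fdi).mean()

pred = np.array([1, 1, 0, 0], bool)
gt   = np.array([1, 0, 0, 0], bool)
print(dice(pred, gt))  # 0.666...
print(iou(pred, gt))   # 0.5
```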

Submissions are evaluated on a held-out test set of 600 scans, with ground-truth annotations withheld until final scoring. Overall scores combine localization ($\exp(-\mathrm{TLA})$), segmentation (mean F1 score, or TSA), and labeling (TIR), with equal weighting for ranking (Ben-Hamadou et al., 2023).
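Under this protocol, a per-submission ranking score can be sketched as the unweighted mean of the three components; this is a simplified reading of the scoring rule, not the official implementation:

```python
import math

def overall_score(tla, tsa, tir):
    """Combine the three challenge scores with equal weighting.

    tla: mean tooth localization error (lower is better; mapped via exp(-TLA))
    tsa: teeth segmentation accuracy (mean F1, in [0, 1])
    tir: teeth identification rate (in [0, 1])
    """
    return (math.exp(-tla) + tsa + tir) / 3.0

print(overall_score(0.0, 1.0, 1.0))  # 1.0 for a perfect submission
```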

4. Methodological Approaches and Baseline Results

Participating teams adopted multi-stage pipelines integrating point-based, mesh-based, and multi-view learning strategies. Key components:

| Team     | Backbone / Innovation              | Segmentation / Labeling Approach                                            |
|----------|------------------------------------|-----------------------------------------------------------------------------|
| CGIP     | Point Transformer                  | Boundary-aware sampling, contrastive loss                                   |
| FiboSeg  | 2D Residual U-Net (multi-view)     | 2D renders (normals + depth), majority voting                               |
| IGIP     | PointNet++ (centroids), patch CNN  | Binary separation, curvature ranking, FDI classification with arch ordering |
| TeethSeg | Coarse-to-fine 3D U-Net            | Volumetric grid, Random Walker refinement                                   |
| OS       | Top-view HRNet + 3D CNN            | Centroid heatmap, graph-constrained refinement                              |
| Champers | Swin-style Stratified Transformer  | Sequential centroid and mask Transformer cascades                           |

(Ben-Hamadou et al., 2023)

Notable training strategies include random geometric augmentations, joint Dice and cross-entropy losses, and multi-task model heads. All methods employ post-processing steps such as graph-cuts, clustering, and majority-vote fusion to resolve boundaries and correct topology.
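For the multi-view approaches, majority-vote fusion of per-view vertex predictions can be sketched as follows (a minimal illustration with hypothetical label data):

```python
import numpy as np

def majority_vote(view_labels):
    """Fuse per-view vertex label predictions by majority vote.

    view_labels: (V, N) int array, one row of N vertex labels per view.
    Returns the most frequent label for each vertex.
    """
    view_labels = np.asarray(view_labels)
    fused = np.empty(view_labels.shape[1], dtype=view_labels.dtype)
    for i in range(view_labels.shape[1]):
        values, counts = np.unique(view_labels[:, i], return_counts=True)
        fused[i] = values[np.argmax(counts)]
    return fused

# Three views voting on two vertices (0 = gingiva, 11/12/21 = FDI labels)
views = np.array([[11, 0], [11, 21], [12, 21]])
print(majority_vote(views))  # [11 21]
```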

Quantitative performance (test set):

CGIP achieved the highest mean segmentation accuracy (TSA = 0.9859), FiboSeg led in localization (exp(−TLA) = 0.9924), and IGIP in labeling (TIR = 0.9289). The top three algorithms scored within one percentage point of each other, underscoring the task's difficulty (Ben-Hamadou et al., 2023).

5. Insights, Failure Modes, and Design Recommendations

Analysis of challenge outcomes yielded several insights:

  • Boundary precision is critical: Segmentation accuracy is dominated by the model’s ability to localize tooth-gingiva and inter-tooth boundaries, particularly under crowding and in the presence of artifacts (braces, reflectivity).
  • Multi-stage pipelines outperform monolithic models: Successful submissions first localize centroids, then perform per-tooth refinement (mask extraction and labeling).
  • Transformer, graph and multi-view architectures provide complementary benefits: Transformer-based (e.g., PointTransformer, Stratified Transformer) and graph-conv backbones produced superior boundary adherence and generalization, while multi-view 2D approaches efficiently leverage 2D CNN toolkits.
  • Failure Modes: Crooked, rotated, and missing teeth, as well as scan artifacts, remain challenging; clustering and patch cropping can miss small or crowded teeth, and boundary smoothness is imperfectly measured by F1 alone (Ben-Hamadou et al., 2022, Ben-Hamadou et al., 2023).

Best practices identified include pre-alignment to a canonical frame, explicit curvature modeling (e.g., via UV mapping), class-balanced sampling or focal loss for rare tooth classes, and post-processing with CRFs or morphological smoothing for mask refinement.
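The focal loss mentioned for rare tooth classes can be sketched in its standard binary form; this is the generic formulation (Lin et al.'s focal loss), not a specific challenge implementation:

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss: down-weights easy examples via (1 - p_t)^gamma.

    probs:   predicted foreground probabilities in (0, 1)
    targets: 0/1 ground-truth labels
    """
    probs = np.clip(probs, 1e-7, 1 - 1e-7)
    p_t = np.where(targets == 1, probs, 1 - probs)        # prob of true class
    alpha_t = np.where(targets == 1, alpha, 1 - alpha)    # class balancing
    return float(np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t)))
```

Confidently correct predictions contribute almost nothing, so gradients concentrate on hard or rare examples such as under-represented tooth classes.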

6. Extensions, Novel Methods, and Future Outlook

Subsequent work built upon the 3DTeethSeg 2022 framework, including the introduction of 2D foundation model fusion, order-aware assignment, and advanced prompt engineering.

  • SOFTooth (Li et al., 29 Dec 2025) integrated frozen 2D occlusal-view SAM embeddings via a point-wise residual gating module into a dual-stream 3D encoder, enabling boundary-aware semantics without 2D mask supervision. Center-guided mask refinement and order-aware Hungarian matching yielded significant improvements in mean IoU (88.99%) and FDI labeling stability, particularly for minority third molar classes.
  • 3DTeethSAM (Lu et al., 12 Dec 2025) adapted Segment Anything Model 2 (SAM2) to 3D mesh segmentation through multi-view rendering, a transformer-based prompt generator, mask refiner, classifier modules, and deformable attention. Achieved state-of-the-art T-mIoU (91.90%) and Dice (94.33%), demonstrating the strong potential of transferring 2D foundational representations to 3D instance segmentation.
  • Future Directions: Recommendations include extending benchmarks to mixed dentition, edentulous and pathological scans; developing metrics penalizing boundary errors (e.g., Hausdorff); leveraging statistical tooth/jaw shape priors; exploring domain-adaptive and self-supervised learning to bridge scanner or clinical variation (Ben-Hamadou et al., 2022, Ben-Hamadou et al., 2023, Li et al., 29 Dec 2025, Lu et al., 12 Dec 2025).
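The order-aware Hungarian matching used by SOFTooth can be approximated with SciPy's assignment solver. The sketch below uses plain centroid distance as the cost, which is a simplification of the paper's actual cost design:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_teeth(pred_centroids, slot_centroids):
    """Assign each predicted tooth to one canonical FDI slot by
    minimizing total centroid distance (Hungarian algorithm)."""
    cost = np.linalg.norm(
        pred_centroids[:, None, :] - slot_centroids[None, :, :], axis=-1
    )
    rows, cols = linear_sum_assignment(cost)
    return dict(zip(rows.tolist(), cols.tolist()))

preds = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
slots = np.array([[3.1, 0.0, 0.0], [0.1, 0.0, 0.0]])
print(match_teeth(preds, slots))  # {0: 1, 1: 0}
```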

These methodological advances position the 3DTeethSeg challenge as a reference point for CAD research and a driver of automated, robust dental analytics at scale.
