
3DTeethSeg Benchmark Overview

Updated 19 December 2025
  • The 3DTeethSeg Benchmark introduces a large-scale, high-resolution dataset comprising 1,800 intra-oral scans with detailed per-vertex annotations for automated teeth analysis.
  • It covers scans from heterogeneous acquisition devices and benchmarks state-of-the-art deep learning techniques, including point-cloud networks and mesh-based GCNs, for precise dental segmentation and labeling.
  • It establishes standardized evaluation metrics and rigorous clinical validation, paving the way for future advances in computer-aided dentistry.

The 3DTeethSeg Benchmark, also known in its extended form as Teeth3DS+, is the first publicly released, large-scale benchmark for intra-oral 3D scan analysis, focused on automated teeth localization, segmentation, and labeling. Developed under the auspices of the MICCAI 3DTeethSeg’22 and 3DTeethLand’24 challenges, it defines the field standard for evaluating computer vision algorithms on high-resolution, clinically relevant intra-oral meshes, and it underpins ongoing work in automated, robust computer-aided dentistry, orthodontics, and prosthetic planning (Ben-Hamadou et al., 2022; Ben-Hamadou et al., 2023; Lu et al., 12 Dec 2025).

1. Dataset Scope, Structure, and Acquisition

The 3DTeethSeg benchmark comprises 1,800 high-resolution intra-oral scans from 900 anonymized patients, each contributing separately acquired upper- and lower-jaw scans, with 23,999 teeth annotated at the vertex level. Patient demographics are balanced: an even male/female split; 70% under 16 years, 27% aged 16–59, and 3% over 60; and 50% orthodontic versus 50% prosthetic cases.

Scans were produced with commercial hardware—3Shape Trios3, Dentsply Primescan, and iTero Element 2 Plus—yielding point resolutions of 30–80 pts/mm² and reported spatial accuracy of 10–90 μm. Mesh preprocessing involves degenerate face removal, PCA alignment (to the occlusal plane), and coordinated tooth cropping. Each tooth with adjacent gingiva is UV-mapped via harmonic parameterization to permit high-precision 2D annotation, which is then projected back onto the 3D mesh. Teeth are labeled according to FDI two-digit codes. Every annotation underwent clinical validation or correction by orthodontists or dental surgeons with ≥5 years of experience (Ben-Hamadou et al., 2022).
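
As a rough illustration of the alignment step, the following numpy sketch centers a scan and rotates its principal axes onto the coordinate axes, assuming the occlusal plane spans the two largest-variance directions; this is a sketch of the generic operation, not the authors' released preprocessing code:

```python
import numpy as np

def pca_align(vertices: np.ndarray) -> np.ndarray:
    """Center a (V, 3) vertex array and rotate its principal axes onto x/y/z.

    Assumption: the occlusal plane spans the two largest-variance axes, so the
    smallest-variance eigenvector is mapped to z (the occlusal normal).
    """
    centered = vertices - vertices.mean(axis=0)
    # Covariance of the point cloud; its eigenvectors are the principal axes.
    cov = np.cov(centered.T)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    # Order axes as (largest, middle, smallest) variance -> (x, y, z).
    order = np.argsort(eigvals)[::-1]
    rot = eigvecs[:, order]
    # Keep the rotation proper (det = +1) to avoid mirroring the jaw.
    if np.linalg.det(rot) < 0:
        rot[:, -1] *= -1
    return centered @ rot
```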

2. Tasks and Annotations

The benchmark covers three core tasks:

  • Tooth localization: Predicts 3D centers or bounding spheres for each visible tooth; ground truth is the centroid of each tooth’s spherical crop.
  • Segmentation: Assigns each vertex a class label—gingiva (0) or a unique tooth instance index (1…N) within the scan.
  • FDI labeling: Maps each detected or segmented tooth to the correct two-digit FDI code (e.g., 11–18, 21–28, 31–38, 41–48).

All annotations derive from the standardized, multi-step protocol outlined in Section 1, ensuring consistent per-vertex labeling with stringent clinical review at each step. Planned future extensions include per-crown anatomical landmark annotation (e.g., incisal edges, cusp tips), supporting downstream orthodontic measurement tasks (Ben-Hamadou et al., 2022).
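
For concreteness, a minimal loader for one annotated scan is sketched below, assuming the public release pairs each OBJ mesh with a JSON sidecar carrying per-vertex "labels" (FDI codes, 0 for gingiva) and "instances" arrays; the field names and file layout here are illustrative assumptions rather than a guaranteed specification:

```python
import json
import numpy as np

def load_scan(obj_path: str, json_path: str):
    """Load vertices plus per-vertex FDI and instance labels for one jaw scan."""
    # Parse only vertex lines from the OBJ; faces are ignored in this sketch.
    vertices = np.array([
        [float(x) for x in line.split()[1:4]]
        for line in open(obj_path)
        if line.startswith("v ")
    ])
    with open(json_path) as f:
        ann = json.load(f)
    fdi = np.asarray(ann["labels"])            # per-vertex FDI code, 0 = gingiva
    instances = np.asarray(ann["instances"])   # per-vertex tooth instance index
    assert len(fdi) == len(vertices) == len(instances)
    return vertices, fdi, instances
```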

3. Evaluation Metrics

Performance is assessed using classical overlap and classification metrics, computed per-tooth, per-jaw, and globally, where $P$ denotes the predicted vertex set and $G$ the ground-truth set:

  • Intersection over Union (IoU): $\mathrm{IoU}(P, G) = \frac{|P \cap G|}{|P \cup G|}$
  • Dice coefficient: $\mathrm{Dice}(P, G) = \frac{2|P \cap G|}{|P| + |G|}$
  • Precision/Recall: $\mathrm{Precision} = \frac{|P \cap G|}{|P|}$, $\mathrm{Recall} = \frac{|P \cap G|}{|G|}$
  • F1-score: $F_1 = \frac{2\,\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
  • Localization accuracy: Percentage of ground-truth tooth centers with a predicted center within a fixed radius threshold.
  • FDI identification rate: Proportion of correctly assigned FDI labels among detected/segmented teeth.

In the MICCAI 2022 challenge, composite scores aggregated localization, segmentation, and labeling metrics for overall system ranking (Ben-Hamadou et al., 2023).
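
For concreteness, the per-tooth overlap metrics above reduce to a few operations over boolean per-vertex masks; the following numpy sketch mirrors the formulas but is illustrative, not the official evaluation code:

```python
import numpy as np

def overlap_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """IoU, Dice, precision, recall, and F1 for two boolean per-vertex masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    p = inter / pred.sum() if pred.sum() else 0.0  # precision = |P ∩ G| / |P|
    r = inter / gt.sum() if gt.sum() else 0.0      # recall    = |P ∩ G| / |G|
    total = pred.sum() + gt.sum()
    return {
        "iou": inter / union if union else 0.0,
        "dice": 2 * inter / total if total else 0.0,
        "precision": p,
        "recall": r,
        "f1": 2 * p * r / (p + r) if (p + r) else 0.0,
    }
```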

4. Baseline Methods and State of the Art

Multiple deep learning strategies have been benchmarked:

  • Point-cloud networks: PointNet++, DGCNN, PT (Point Transformer), ISBNet—typically operate on downsampled point clouds derived from the meshes.
  • Mesh-based GCNs: MeshSegNet, iMeshSegNet, TeethGNN, TSGCNet, DilatedSegNet, CBAnet—directly process mesh connectivity.
  • Hybrid models: TSRNet (dual-stream mesh + CNN), ToothGroupNet, dual-stream GCNs with boundary-aware losses.
  • 2D UV-mapped CNNs: HarmonicNet-style methods operate on curvature-flattened crown projections.
  • Multi-stage architectures: Patch-wise or view-based decomposition (e.g., FiboSeg), with 2D-3D mapping and post-processing via morphological operators or graph cut.

Recent approaches such as ToothGroupNet (mIoU = 90.16%) and dual-stream GCNs (mIoU = 98.6%, localization = 96.6%, FDI labeling = 91.0%) defined the state of the art prior to the introduction of vision foundation models (Lu et al., 12 Dec 2025). The 3DTeethSAM method adapts the 2D Segment Anything Model 2 (SAM2) to 3D by rendering multi-view images and lifting the 2D predictions back to the mesh via post-hoc 3D voting, coupled with learnable modules for prompt embedding, mask refinement, mask classification, and morphology-aware attention (DGAP); it reports a new benchmark T-mIoU of 91.90%, B-IoU of 70.05%, and Dice of 94.33% (Lu et al., 12 Dec 2025).
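
The 2D-to-3D lifting at the core of such multi-view pipelines reduces to accumulating per-pixel class votes on the vertices visible in each rendered view. Below is a minimal sketch of that voting step, assuming a renderer provides a per-view pixel-to-vertex index map; it illustrates the general technique, not the 3DTeethSAM implementation (which adds prompt embedding, mask refinement, and DGAP):

```python
import numpy as np

def vote_labels(pixel_to_vertex: list[np.ndarray],
                view_labels: list[np.ndarray],
                num_vertices: int,
                num_classes: int) -> np.ndarray:
    """Fuse per-view 2D segmentations into per-vertex 3D labels by majority vote.

    pixel_to_vertex[v]: (H, W) int array mapping pixels to vertex ids (-1 = background).
    view_labels[v]:     (H, W) int array of predicted class per pixel in view v.
    """
    votes = np.zeros((num_vertices, num_classes), dtype=np.int64)
    for vert_map, labels in zip(pixel_to_vertex, view_labels):
        valid = vert_map >= 0
        # Accumulate one vote per visible pixel onto its source vertex.
        np.add.at(votes, (vert_map[valid], labels[valid]), 1)
    return votes.argmax(axis=1)  # per-vertex label; unseen vertices default to class 0
```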

5. Challenge Protocol and Results

The official challenge splits the 1,800 scans into 1,200 for training and 600 for testing. The 2022 challenge saw six finalist teams, all employing multi-stage approaches that combined transformers, 2D–3D view voting, clustering, and multi-task learning (Ben-Hamadou et al., 2023). The top entries (CGIP, FiboSeg, IGIP) achieved the following best scores:

  • Tooth localization (Exp(-TLA)): up to 0.9924
  • Tooth segmentation accuracy (TSA): up to 0.9859
  • Teeth identification rate (TIR): up to 0.9289

Comparative analysis identified performance variations by task focus (segmentation, localization, labeling), with methods often trading boundary accuracy for centroid detection precision. Persistent challenges include segmentation in mixed/erupting dentition, missing teeth, and cases of extreme crowding or malocclusion (Ben-Hamadou et al., 2023).
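
For reference, the Exp(-TLA) score maps a tooth localization error (TLA) into (0, 1]. A plausible minimal sketch is below, assuming TLA is the mean centroid distance normalized per tooth by a size term such as the bounding-box diagonal; the challenge paper's exact normalization may differ:

```python
import numpy as np

def exp_neg_tla(pred_centers: np.ndarray,
                gt_centers: np.ndarray,
                gt_sizes: np.ndarray) -> float:
    """exp(-TLA), with TLA the mean size-normalized centroid error.

    pred_centers, gt_centers: (T, 3) matched predicted / ground-truth centroids.
    gt_sizes: (T,) per-tooth scale (e.g., bounding-box diagonal) used to normalize.
    """
    dists = np.linalg.norm(pred_centers - gt_centers, axis=1) / gt_sizes
    return float(np.exp(-dists.mean()))
```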

6. Advances Over Previous 3D Dental Benchmarks

3DTeethSeg/Teeth3DS+ supersedes all prior public datasets, which were limited by small sample size (<100 scans) or scope (single-crown or coarse bounding boxes), and rarely included per-vertex, full-jaw, FDI-labeled annotations. Key advances include:

  • Scale: 1,800 full-jaw scans and ≈24,000 annotated teeth
  • Acquisition heterogeneity: Multi-device, multi-operator, broad demographics, and treatment types
  • Annotation granularity: Per-vertex instance and semantic labels, plus planned landmarks
  • Standardized splits, reference code, and metrics enabling reproducibility and fair comparison
  • Clinical validation: All labels double-checked and corrected by experts (Ben-Hamadou et al., 2022).

7. Limitations and Future Directions

Current limitations include under-representation of severe anomalies (braces, implants, extensive restorations) and focus on crowns/gingiva rather than roots or periodontal support. Planned benchmark extensions will include challenging cases, anatomical landmark annotation, root segmentation, and evaluation of real-time, hardware-constrained inference.

There is also a need for standardized metrics for boundary smoothness (Hausdorff, boundary-F1), robust missing-tooth detection, efficiency benchmarking, and weakly/semi-supervised learning protocols. The Teeth3DS+/3DTeethSeg benchmark is expected to catalyze the next generation of geometric deep learning and vision foundation model research in dental informatics (Ben-Hamadou et al., 2022, Lu et al., 12 Dec 2025).
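
As one possible formulation of such a boundary metric, the sketch below computes a symmetric boundary-F1 over boundary-vertex coordinates, counting a point as matched when it lies within a tolerance tau of the other set; the tolerance value and the boundary-extraction step are assumptions, not a benchmark specification:

```python
import numpy as np
from scipy.spatial import cKDTree

def boundary_f1(pred_bnd: np.ndarray, gt_bnd: np.ndarray, tau: float = 0.5) -> float:
    """Boundary-F1 between two (N, 3) arrays of boundary-vertex coordinates.

    A predicted boundary point counts as correct if it lies within tau
    (same units as the mesh, e.g., mm) of some ground-truth boundary
    point, and vice versa for recall.
    """
    if len(pred_bnd) == 0 or len(gt_bnd) == 0:
        return 0.0
    precision = (cKDTree(gt_bnd).query(pred_bnd)[0] <= tau).mean()
    recall = (cKDTree(pred_bnd).query(gt_bnd)[0] <= tau).mean()
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
```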
