Teeth3DS+: Benchmark for 3D Dental Scans

Updated 9 February 2026

Teeth3DS+ is a comprehensive benchmark comprising 1,800 intraoral 3D scans and 23,999 annotated teeth, designed for robust dental imaging analysis.
It standardizes tasks like segmentation, labeling, landmark detection, and 3D reconstruction with clinically validated protocols and granular performance metrics.
Baseline methods including Two-Stream GCN and advanced generative models (e.g., ToothForge, Mem4Teeth) demonstrate high accuracy and efficiency in dental scan analysis.

Teeth3DS+ refers to a rigorously constructed, clinically validated public benchmark and accompanying methodological ecosystem for intraoral 3D dental scan analysis. It is designed to support and evaluate advanced machine learning and geometric modeling approaches across a spectrum of dental imaging tasks, with a focus on clinical accuracy, reproducibility, and extensibility to new algorithmic paradigms. Teeth3DS+ provides standardized scan data, annotation protocols, performance metrics, and baseline results. It serves as both a foundational dataset and a reference framework for segmentation, identification, labeling, landmark detection, 3D modeling, and shape synthesis in computer-aided dentistry and related biomedical fields (Ben-Hamadou et al., 2022).

1. Dataset Composition and Clinical Validation

Teeth3DS+ consists of 1,800 intraoral 3D scans acquired from 900 anonymized patients in controlled clinical settings, yielding 23,999 annotated teeth (11,898 upper, 12,101 lower). Scans were acquired with leading intraoral scanners (Primescan, Trios 3, iTero Element 2 Plus). Each mesh is provided in OBJ format, with per-vertex instance and class annotations in an aligned JSON file. The patient cohort is gender-balanced (50% male, 50% female) and clinically heterogeneous: 50% orthodontic and 50% prosthetic cases, with 70% of patients under 16 years and 3% over 60 years of age.
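A minimal loading sketch for one scan follows. It assumes the JSON file stores one FDI label per vertex under a key named `labels` and one instance id per vertex under `instances`, with 0 marking gingiva. These key names and that convention are assumptions, and the official loading utilities remain authoritative. It uses only the Python standard library and reads just the vertex and triangle-face records of the OBJ file.

```python
import json
from pathlib import Path


def load_obj(path):
    """Read vertex positions and triangle faces from a Wavefront OBJ file.

    Only 'v' and 'f' records are parsed; normals, texture coordinates and
    other records are skipped. Faces are assumed to be triangles with
    positive (1-based) indices, which are converted to 0-based.
    """
    vertices, faces = [], []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            parts = line.split()
            if not parts:
                continue
            if parts[0] == "v":
                vertices.append(tuple(float(x) for x in parts[1:4]))
            elif parts[0] == "f":
                # "f 1/1/1 2/2/2 3/3/3" -> keep only the vertex index before '/'
                faces.append(tuple(int(p.split("/")[0]) - 1 for p in parts[1:4]))
    return vertices, faces


def load_scan(obj_path, json_path):
    """Load one scan together with its per-vertex annotations.

    ASSUMPTION: the JSON file holds two lists with one entry per vertex,
    'labels' (FDI code, 0 = gingiva) and 'instances' (tooth instance id).
    """
    vertices, faces = load_obj(obj_path)
    meta = json.loads(Path(json_path).read_text(encoding="utf-8"))
    labels, instances = meta["labels"], meta["instances"]
    if not len(labels) == len(instances) == len(vertices):
        raise ValueError("expected one label and one instance id per vertex")
    return {"vertices": vertices, "faces": faces,
            "labels": labels, "instances": instances}
```

The length check catches the most common pairing error, a JSON file that belongs to a different mesh.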

The annotation pipeline follows a validated, expert-driven multistep procedure: 3D mesh cleaning, pose normalization, spherical crown cropping, harmonic mapping, 2D/3D boundary projection, crown assembly, FDI-based tooth labeling, and iterative manual validation by clinicians. The “plus” variant adds expert landmark annotations (occlusal and gingival points) on each crown, placed by clinicians with over ten years of experience. The mean inter-annotator boundary deviation is below 0.2 mm, FDI labeling agreement on a double-blind subset is 100%, and landmark placement repeatability has a standard deviation of about 0.3 mm (Ben-Hamadou et al., 2022).
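FDI labels are two-digit codes. The first digit gives the quadrant (1 to 4 for permanent teeth, 5 to 8 for primary teeth), and the second gives the tooth position counted from the midline. The sketch below decodes a code and runs a simple consistency check of the kind a validation pass might use: it flags labels that belong to the wrong jaw. The function names are hypothetical, and treating label 0 as gingiva is an assumption.

```python
# Quadrant digit -> (jaw, patient side, dentition) in the FDI two-digit system.
FDI_QUADRANTS = {
    1: ("upper", "right", "permanent"),
    2: ("upper", "left", "permanent"),
    3: ("lower", "left", "permanent"),
    4: ("lower", "right", "permanent"),
    5: ("upper", "right", "primary"),
    6: ("upper", "left", "primary"),
    7: ("lower", "left", "primary"),
    8: ("lower", "right", "primary"),
}


def decode_fdi(code):
    """Split an FDI code such as 36 into quadrant information and tooth position."""
    quadrant, position = divmod(int(code), 10)
    if quadrant not in FDI_QUADRANTS:
        raise ValueError(f"invalid FDI quadrant in {code}")
    jaw, side, dentition = FDI_QUADRANTS[quadrant]
    # Permanent quadrants hold 8 teeth, primary quadrants 5.
    max_position = 8 if dentition == "permanent" else 5
    if not 1 <= position <= max_position:
        raise ValueError(f"invalid FDI tooth position in {code}")
    return {"jaw": jaw, "side": side, "dentition": dentition, "position": position}


def labels_match_jaw(labels, jaw, background=0):
    """True if every non-background label on a scan belongs to the given jaw."""
    return all(decode_fdi(c)["jaw"] == jaw for c in set(labels) if c != background)
```

For example, `decode_fdi(36)` identifies the lower-left permanent first molar. A label such as 36 on an upper-jaw scan makes `labels_match_jaw(..., "upper")` return False.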

2. Supported Tasks and Performance Metrics

Teeth3DS+ is structured to facilitate rigorous benchmarking of the following tasks, each linked to granular metrics:

3D Teeth Localization: Instance detection of teeth in full-jaw scans.
3D Instance Segmentation: Per-tooth mask prediction, underpinning both geometric and semantic analyses.
Tooth Labeling: Instance-wise assignment of FDI codes.
Landmark Detection: Regression of clinically relevant crown keypoints.
3D Surface Reconstruction (Extended): Recovery of tooth surfaces from partial or alternate modalities (e.g., single intraoral photographs).

Standard evaluation metrics include the Dice coefficient, intersection over union (IoU), precision, recall, average precision (AP), labeling accuracy, and Euclidean landmark error, each computed per scan and then averaged. A shared metric set keeps methods comparable and makes it possible to localize errors to individual teeth and landmarks (Ben-Hamadou et al., 2022).
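The following sketch shows per-tooth overlap and landmark error for one scan. It assumes predictions and ground truth are given as per-vertex label arrays and as paired landmark coordinates. It counts vertices without area weighting, and the official scripts may weight, match instances, or aggregate differently.

```python
import numpy as np


def per_tooth_overlap(pred, gt, background=0):
    """Dice and IoU (vertex counts) for every tooth label present in the ground truth."""
    pred, gt = np.asarray(pred), np.asarray(gt)
    scores = {}
    for label in np.unique(gt):
        if label == background:
            continue
        p, g = pred == label, gt == label
        inter = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()          # > 0 because g is non-empty
        scores[int(label)] = {"dice": float(2.0 * inter / (p.sum() + g.sum())),
                              "iou": float(inter / union)}
    return scores


def scan_mean(scores, key):
    """Average one metric over all teeth of a scan."""
    return float(np.mean([s[key] for s in scores.values()]))


def landmark_errors(pred_xyz, gt_xyz):
    """Euclidean distance between paired landmarks, in mesh units (e.g. mm)."""
    diff = np.asarray(pred_xyz, dtype=float) - np.asarray(gt_xyz, dtype=float)
    return np.linalg.norm(diff, axis=1)
```

Per-scan means from `scan_mean` would then be averaged over the test set, matching the "computed per scan and averaged" protocol described above.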

3. Baseline Methods and Quantitative Results

Two international MICCAI challenges (3DTeethSeg 2022, 3DTeethLand 2024) established strong benchmarks. The Two-Stream GCN baseline achieved 96.58% localization accuracy, 98.59% segmentation Dice, and 91.00% labeling accuracy. On landmark detection, LandmarkNet yielded a mean error of 0.42 mm and an AP@0.5 mm of 89.5%. Classical patch-based approaches (Mask R-CNN, PANet, HTC, ToothNet) were outperformed by hybrid pipelines exploiting both architectural innovations and domain-specific pre/post-processing (Ben-Hamadou et al., 2022).
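An AP@0.5 mm score counts a predicted landmark as correct only if it lies within 0.5 mm of a ground-truth landmark. The sketch below is a minimal, class-agnostic version of such a metric. It assumes confidence-ranked predictions and greedy nearest-neighbour matching. The official 3DTeethLand evaluation may match per landmark class or interpolate the precision-recall curve differently.

```python
import numpy as np


def landmark_ap(pred_xyz, scores, gt_xyz, threshold_mm=0.5):
    """Non-interpolated average precision for landmarks at a distance threshold.

    Predictions are visited in order of decreasing confidence. Each one is a
    true positive if the nearest still-unmatched ground-truth landmark lies
    within threshold_mm, otherwise a false positive. AP is the sum of the
    precision values at the ranks where a true positive occurs, divided by the
    number of ground-truth landmarks (so missed landmarks lower the score).
    """
    pred = np.asarray(pred_xyz, dtype=float).reshape(-1, 3)
    gt = np.asarray(gt_xyz, dtype=float).reshape(-1, 3)
    if len(gt) == 0 or len(pred) == 0:
        return 0.0
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    matched = np.zeros(len(gt), dtype=bool)
    hits = np.zeros(len(order))
    for rank, i in enumerate(order):
        dist = np.linalg.norm(gt - pred[i], axis=1)
        dist[matched] = np.inf          # each ground-truth landmark is matched at most once
        j = int(np.argmin(dist))
        if dist[j] <= threshold_mm:
            matched[j] = True
            hits[rank] = 1.0
    precision = np.cumsum(hits) / np.arange(1, len(order) + 1)
    return float(np.sum(precision * hits) / len(gt))
```

Unlike the mean landmark error, this score penalizes both spurious and missing landmarks, which is why both figures are reported.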

A fully automated pipeline for tooth identification and segmentation in CBCT (cone-beam computed tomography) volumes achieved an F1-score of 93.35% and a per-tooth Dice coefficient of 94.79%, outperforming patch-based alternatives. Its ablation showed that fusing loose and tight ROIs in a 3D U-Net produced the best Dice (94.79 ± 1.34%) (Jang et al., 2021).

4. Methodological Advances: Spectral, Memory-Augmented, and Implicit Models

Recent generative and shape-completion methods have further extended the Teeth3DS+ ecosystem:

ToothForge (Spectral Synchronization): Addresses heterogeneous mesh connectivity through eigenspace synchronization. By aligning the spectral coefficients of each input mesh to a common reference basis, ToothForge removes the need for unified remeshing and encodes any tooth mesh as a compact spectral code. The generative core is a β-VAE with a 16-dimensional latent space, trained on the synchronized coefficients with a cyclical β schedule (β ∈ [0, 0.05]). Reported results are a spectral MSE of ≈0.09 and a spatial MSE of ≈0.002 across tooth types, with sub-millisecond generation time per shape (10,000+ vertices). Ablations confirm that both synchronization and β-regularization are necessary for stable, artifact-free reconstructions and plausible interpolations. Further features include built-in shape compression (low-pass spectral truncation) and invariance to mesh connectivity, which suit the multi-source nature of Teeth3DS+ (Kubík et al., 3 Jun 2025).
Mem4Teeth (Prototype Memory Completion): Uses retrieval-augmented, confidence-gated fusion of input descriptors with learnable prototype codes. Its encoder–decoder pipeline is augmented with a VQ-style, self-organizing memory that forms cross-dataset shape prototypes without requiring positional labels. On the Teeth3DS benchmark, Mem4Teeth achieved the lowest symmetric Chamfer Distance (CD-L2 = 1.56 × 10⁻⁴) and the highest F-score@1% (62.4%), surpassing FoldingNet, PCN, PoinTr, and SVDFormer. Ablations show that combining the prototype memory with dual encoders is necessary to minimize CD and improve qualitative detail, most notably the sharpness of recovered cusps and ridges in molars. The architecture is backbone-agnostic, incurs O(Kd) overhead, and supports future extensions such as cross-modality memory and uncertainty-aware gating (Sun et al., 3 Dec 2025).
DMM (Component-Wise Implicit Morphable Model): Provides a parametric SDF representation of the entire arch (teeth and gum), with component-wise latent codes (d = 10 per component, m = 15 total). The SDF at a query point p is a weighted sum of per-component fields, each with its own template and deformation networks. Trained with centroid, segmentation, smoothness, and SDF losses together with latent priors, DMM achieves a symmetric Chamfer Distance of 0.00463 and an F-score of 92.18% on Teeth3DS+ test scans. Its capabilities include arbitrary segmentation, interpolation between dentitions, and semantic component replacement directly in latent space, all of which can be integrated into Teeth3DS+ pipelines. Latent inversion and marching-cubes extraction are designed to run efficiently at the grid resolutions needed for high-fidelity surfaces (128³–256³) (Zhang et al., 2022).
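Two ingredients of the methods above can be sketched compactly with NumPy. The first is a low-pass spectral encoding of mesh coordinates, the kind of truncation ToothForge uses for compression, shown here without its synchronization step. The second is a symmetric Chamfer distance with an F-score at threshold τ, as reported for Mem4Teeth and DMM. Several choices here are assumptions: the combinatorial Laplacian, the dense eigensolver, and the squared-distance, sum-of-means Chamfer convention. Papers differ on these choices, so this sketch will not reproduce the reported numbers exactly.

```python
import numpy as np


def graph_laplacian(n_vertices, faces):
    """Combinatorial graph Laplacian L = D - A from the edges of triangle faces."""
    A = np.zeros((n_vertices, n_vertices))
    for a, b, c in faces:
        for i, j in ((a, b), (b, c), (c, a)):
            A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A


def spectral_lowpass(vertices, faces, k):
    """Encode xyz coordinates with the k lowest-frequency Laplacian eigenvectors.

    Returns the (k, 3) spectral code and the reconstruction decoded from it.
    """
    X = np.asarray(vertices, dtype=float)
    _, U = np.linalg.eigh(graph_laplacian(len(X), faces))  # eigenvalues in ascending order
    basis = U[:, :k]
    code = basis.T @ X
    return code, basis @ code


def chamfer_fscore(pred, gt, tau):
    """Symmetric Chamfer distance (squared-L2 convention) and F-score at tau."""
    P = np.asarray(pred, dtype=float)
    Q = np.asarray(gt, dtype=float)
    D = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)  # |P| x |Q| distances
    d_pq, d_qp = D.min(axis=1), D.min(axis=0)
    chamfer = float(np.mean(d_pq ** 2) + np.mean(d_qp ** 2))
    precision = float(np.mean(d_pq < tau))   # predicted points close to the target
    recall = float(np.mean(d_qp < tau))      # target points covered by the prediction
    if precision + recall == 0:
        return chamfer, 0.0
    return chamfer, 2 * precision * recall / (precision + recall)
```

With k equal to the vertex count, the decoded shape matches the input exactly. Smaller k keeps only smooth, low-frequency geometry. For meshes with 10,000+ vertices, a sparse Laplacian and a sparse eigensolver such as `scipy.sparse.linalg.eigsh` would replace the dense `numpy.linalg.eigh` call.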

5. Access, Preprocessing, and Standardized Usage

Teeth3DS+ data is released under CC-BY 4.0 and is available via Figshare. All meshes are stored in OBJ format, with JSON metadata for labels and landmarks. Preprocessing recommendations include mesh cleaning, PCA-based alignment, optional uniform remeshing, and geometric augmentation (e.g., random rotation, jitter, scaling). Official code repositories provide loading utilities and standardized metric computation scripts, ensuring reproducibility of challenge and benchmark results (Ben-Hamadou et al., 2022).
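The sketch below shows PCA-based alignment and the listed augmentations with NumPy. Two choices are illustrative assumptions, not recommended settings: rotating only about the z axis (taken here to be the vertical axis after alignment), and the default scale and jitter ranges. PCA also leaves the sign of each axis ambiguous, so a real pipeline would still need a rule to orient the occlusal direction consistently.

```python
import numpy as np


def pca_align(vertices):
    """Center a scan and rotate it into its principal-axis frame.

    Returns the aligned coordinates and the rotation R (rows = principal axes,
    ordered by decreasing variance), so aligned = (X - mean) @ R.T.
    Assumes at least 3 vertices so that R is 3 x 3.
    """
    X = np.asarray(vertices, dtype=float)
    centered = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    R = Vt.copy()
    if np.linalg.det(R) < 0:   # flip one axis so R is a proper rotation, not a reflection
        R[2] *= -1
    return centered @ R.T, R


def augment(X, rng, scale_range=(0.9, 1.1), jitter_sigma=0.01):
    """Random rotation about the z axis, uniform scaling, and Gaussian jitter."""
    X = np.asarray(X, dtype=float)
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    scale = rng.uniform(*scale_range)
    return scale * (X @ Rz.T) + rng.normal(0.0, jitter_sigma, size=X.shape)
```

A typical call is `augment(aligned, np.random.default_rng(seed))` inside the training loader. Augmentation is applied after alignment so that the random rotation stays about a meaningful axis.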

Integrating recent spectral and implicit modeling approaches into the Teeth3DS+ ecosystem requires additional preprocessing: spectral basis computation for ToothForge and Procrustes alignment for DMM. Because its data format and preprocessing are standardized, Teeth3DS+ supports rapid experimentation and extension, including plug-and-play use of generative pipelines for augmentation, compression, and statistical analysis.
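Procrustes alignment of corresponding points is the kind of step the DMM preprocessing relies on. The sketch below uses the Kabsch/Umeyama closed-form solution via SVD, with optional uniform scaling. It assumes point-to-point correspondences are already known, for example tooth centroids or landmarks matched against a template. Which points DMM actually aligns is not specified here.

```python
import numpy as np


def procrustes_align(source, target, with_scale=False):
    """Align corresponding points source -> target (Kabsch, optional Umeyama scale).

    Returns (R, s, t) such that s * source @ R.T + t best matches target
    in the least-squares sense, with R a proper rotation (det = +1).
    """
    A = np.asarray(source, dtype=float)
    B = np.asarray(target, dtype=float)
    mu_a, mu_b = A.mean(axis=0), B.mean(axis=0)
    A0, B0 = A - mu_a, B - mu_b
    H = A0.T @ B0                                  # 3 x 3 cross-covariance
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # -1 would indicate a reflection
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    s = float((S * np.diag(D)).sum() / (A0 ** 2).sum()) if with_scale else 1.0
    t = mu_b - s * mu_a @ R.T
    return R, s, t
```

The determinant correction keeps R a rotation even when the best unconstrained orthogonal fit would be a mirror image. This matters for dental arches, whose left and right halves are near mirror images of each other.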

6. Applications and Open Research Problems

Teeth3DS+ underpins workflows for orthodontic and prosthetic planning (simulation, appliance design, direct fabrication), CAD/CAM pipelines, dental education (anatomy visualization), and shape-based biometrics or population studies. The dataset has accelerated the development and deployment of automated analysis tools, reducing manual segmentation and annotation effort in clinical contexts.

Current limitations include persistent difficulty with thin boundaries, limited coverage of rare pathologies, and limited soft-tissue annotation (e.g., gingiva, papilla). Current methods still miss 3–5% of teeth at the detection stage, and the dataset provides only four canonical landmarks per crown. Open directions include self-supervised pre-training, 2D–3D modality fusion, domain adaptation for scanner heterogeneity, and texture-based caries/wear detection (Ben-Hamadou et al., 2022).

A plausible implication is that future expansions of Teeth3DS+—by embracing cross-modal prototype memories, more sophisticated spectral embeddings, and unified implicit representations—will further democratize the development of generalizable, clinically robust dental models across scanning technologies and clinical contexts.