
Teeth Segmentation Accuracy

Updated 9 February 2026
  • Teeth Segmentation Accuracy (TSA) is defined as a set of metrics that quantify how precisely dental image segmentation models delineate individual teeth relative to expert annotations.
  • It draws on metrics such as IoU, DSC, and boundary-specific measures, applied across modalities such as 2D radiographs, 3D scans, and CBCT, to support robust evaluation.
  • High TSA supports clinical workflows by enabling accurate diagnosis, improved treatment planning, and reliable model validation in digital dentistry.

Teeth Segmentation Accuracy (TSA) is a central metric family for evaluating the fidelity of automated dental image segmentation models across 2D radiographs, 3D intraoral scans, and volumetric imaging such as cone-beam CT (CBCT). TSA quantifies how accurately a system delineates individual teeth (or tooth regions) relative to expert-annotated ground truth, typically capturing pixelwise or pointwise correctness, spatial overlap, and boundary integrity. Robust TSA metrics underpin the validation and comparison of deep learning models in digital dentistry, which in turn supports diagnosis, treatment planning, and other clinical workflows.

1. Core Definitions and Variants of TSA

The most widely adopted TSA metrics are derived from classical segmentation performance indices but are tailored to the dental domain’s instance- and boundary-specific challenges.

  • Intersection-over-Union (IoU):

For a binary or multi-class prediction mask $P$ and ground-truth mask $G$,

$$\mathrm{IoU} = \frac{|P \cap G|}{|P \cup G|}$$

This is frequently averaged over all tooth instances, yielding a mean IoU (mIoU) as a global measure (Dhar et al., 2023, Mustakim et al., 23 Nov 2025, Xi et al., 31 Mar 2025).

  • Dice Similarity Coefficient (DSC):

$$\mathrm{DSC} = \frac{2|P \cap G|}{|P| + |G|}$$

Closely related to IoU via

$$\mathrm{DSC} = \frac{2\,\mathrm{IoU}}{1+\mathrm{IoU}}$$

  • Accuracy (ACC/OA):

Pointwise or pixelwise accuracy,

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$

where $TP$, $TN$, $FP$, and $FN$ are the counts of true positives, true negatives, false positives, and false negatives (Dhar et al., 2022, Mustakim et al., 23 Nov 2025, Zhao et al., 2022).

  • Boundary Metrics:
  • Composite/Challenge Scores:

Leading entries in public challenges use composite TSA scores that aggregate multiple terms, such as:

$$\mathrm{TSA\,Score} = w_1\,\mathrm{Dice} + w_2\,\mathrm{IoU} + w_3\,(1 - H(d))$$

with empirically derived weights $w_1$, $w_2$, $w_3$ (Zhang et al., 2024).

  • Instance-Level F1/TSA (3D challenges):

$$\mathrm{TSA} = \frac{1}{|\mathcal{T}|} \sum_{t\in\mathcal{T}} F1_t$$

where $F1_t$ is the harmonic mean of precision and recall for tooth $t$, and $\mathcal{T}$ is the set of teeth being evaluated (Ben-Hamadou et al., 2023).
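The overlap, accuracy, and instance-level definitions above can be illustrated with a small NumPy sketch. This is a minimal example under stated assumptions, not the evaluation code of any cited paper or challenge: the function names, the toy label maps, the convention that two empty masks score 1.0, and the choice to compute per-tooth F1 over pixels are all illustrative.

```python
import numpy as np

def iou(pred, gt):
    """Intersection-over-Union of two boolean masks of equal shape."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    union = np.logical_or(pred, gt).sum()
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else float(np.logical_and(pred, gt).sum() / union)

def dice(pred, gt):
    """Dice similarity coefficient of two boolean masks."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    total = pred.sum() + gt.sum()
    return 1.0 if total == 0 else float(2.0 * np.logical_and(pred, gt).sum() / total)

def pixel_accuracy(pred, gt):
    """Fraction of pixels whose foreground/background label agrees."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    return float((pred == gt).mean())

def mean_tooth_f1(pred_labels, gt_labels, background=0):
    """Average per-tooth F1 over the tooth labels present in the ground truth."""
    pred_labels, gt_labels = np.asarray(pred_labels), np.asarray(gt_labels)
    scores = []
    for t in np.unique(gt_labels):
        if t == background:
            continue
        p, g = pred_labels == t, gt_labels == t
        tp = np.logical_and(p, g).sum()
        precision = tp / p.sum() if p.sum() else 0.0
        recall = tp / g.sum()
        scores.append(0.0 if tp == 0 else 2 * precision * recall / (precision + recall))
    return float(np.mean(scores)) if scores else 0.0

# Toy label maps: 0 = background, 1 and 2 = two hypothetical tooth labels.
gt_labels = np.array([[0, 1, 1, 0],
                      [0, 1, 1, 0],
                      [2, 2, 0, 0]])
pred_labels = np.array([[0, 1, 1, 1],
                        [0, 1, 0, 0],
                        [2, 2, 2, 0]])

binary_iou = iou(pred_labels > 0, gt_labels > 0)             # 5 / 8 = 0.625
binary_dice = dice(pred_labels > 0, gt_labels > 0)           # 10 / 13
binary_acc = pixel_accuracy(pred_labels > 0, gt_labels > 0)  # 9 / 12 = 0.75
instance_tsa = mean_tooth_f1(pred_labels, gt_labels)         # (0.75 + 0.8) / 2
```

On this toy case the binary Dice equals 2·IoU/(1+IoU), as the identity above predicts, while the instance-level score differs from both because it averages over teeth rather than over pixels.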

2. Quantification and Evaluation Protocols

TSA computation is protocol- and modality-dependent:

Frameworks such as ViSTooth compute additional metrics (shape via Hu moments, position, angle) and aggregate them for composite TSA assessment (Zhu et al., 2024).

Common Experimental Setups

  • Cross-validation and held-out test sets are standard for reporting TSA (Dhar et al., 2023, Dhar et al., 2022).
  • Challenge protocols use locked test sets whose ground truth is withheld from participants, so reported TSA reflects generalization to unseen data (Ben-Hamadou et al., 2023, Zhang et al., 2024).
  • Ablation studies systematically measure TSA gains from architectural innovations, loss terms, and boundary-focused strategies.
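A cross-validated TSA report of the kind listed above might be organized as in the following sketch. It assumes each "case" is one patient or scan, so that no case appears in both the training and test part of a fold. The helper names `kfold_indices` and `summarize` and the example scores are hypothetical, not taken from the cited works.

```python
import numpy as np

def kfold_indices(n_cases, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation over cases."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_cases)      # shuffle case indices once
    folds = np.array_split(order, k)      # k nearly equal, disjoint folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

def summarize(fold_scores):
    """Return the mean and sample standard deviation of per-fold scores."""
    scores = np.asarray(fold_scores, dtype=float)
    return float(scores.mean()), float(scores.std(ddof=1))

# Example: 10 hypothetical cases split into 5 folds.
splits = list(kfold_indices(10, 5, seed=0))

# Example: made-up per-fold Dice scores, reported as mean and standard deviation.
mean_dice, std_dice = summarize([0.90, 0.92, 0.94])
```

Reporting the spread across folds alongside the mean makes it easier to judge whether a gain claimed in an ablation exceeds fold-to-fold variation.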

3. Model Architectures and Loss Functions Optimizing TSA

Architectures achieving state-of-the-art TSA integrate domain-specific advances:

  • Encoder–Decoder and Attention Mechanisms:

EfficientNet-B7 encoders, grid-based attention gates, and parallel squeeze-excitation modules in FUSegNet (Dhar et al., 2023); recurrent convolutional modules and residual bridges in S-R2F2U-Net (Dhar et al., 2022); M-Net U-shape with Swin transformers and tooth-dedicated attention blocks (TAB) (Ghafoor et al., 2023).

  • Boundary-aware Modules:

Reverse-attention-based boundary extraction (BFEM), feature cross-fusion (FCFM) (Zhang et al., 2024), instance-boundary loss (Cai et al., 30 Dec 2025), and contrastive learning at the tooth–gingiva interface (Xi et al., 31 Mar 2025).

  • Proposal-free Instance Segmentation:

Transformer-style mask embedding heads enable robust handling of missing or malposed teeth (Cai et al., 30 Dec 2025).

  • Geometry-guided Losses:

Curvature-aware focal losses upweight high-curvature (boundary) points, leading to improved TSA and smoother boundaries (Xiong et al., 2023, Xiong et al., 2022, Cai et al., 30 Dec 2025).

  • Hybrid and Regularization Losses:

Hybrid Dice + Focal/cross-entropy (Dhar et al., 2023, Dhar et al., 2022), squared Dice for class imbalance (Ghafoor et al., 2023), L2 regularization (Budagam et al., 2024), and composite losses balancing global and boundary-focused terms (Zhang et al., 2024).
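The hybrid and geometry-guided losses described above can be sketched in NumPy as a weighted sum of a soft Dice term and a focal term with optional per-point weights. This is an illustrative forward computation only, not the loss of any cited model: the mixing factor `alpha`, the focusing parameter `gamma`, and the idea of passing curvature-derived values as `weights` are assumptions made for the example.

```python
import numpy as np

def soft_dice_loss(prob, target, eps=1e-6):
    """1 - soft Dice between predicted probabilities and binary targets."""
    prob = np.asarray(prob, float).ravel()
    target = np.asarray(target, float).ravel()
    inter = (prob * target).sum()
    return float(1.0 - (2.0 * inter + eps) / (prob.sum() + target.sum() + eps))

def focal_loss(prob, target, gamma=2.0, weights=None, eps=1e-7):
    """Binary focal loss; optional per-point weights (e.g. larger near boundaries)."""
    prob = np.clip(np.asarray(prob, float).ravel(), eps, 1.0 - eps)
    target = np.asarray(target, float).ravel()
    p_t = np.where(target == 1, prob, 1.0 - prob)   # probability of the true class
    per_point = -((1.0 - p_t) ** gamma) * np.log(p_t)
    if weights is None:
        return float(per_point.mean())
    weights = np.asarray(weights, float).ravel()
    return float((weights * per_point).sum() / weights.sum())

def hybrid_loss(prob, target, alpha=0.5, gamma=2.0, weights=None):
    """alpha * soft Dice loss + (1 - alpha) * focal loss (alpha is an assumed knob)."""
    return (alpha * soft_dice_loss(prob, target)
            + (1.0 - alpha) * focal_loss(prob, target, gamma, weights))

target = np.array([1, 1, 0, 0])
good = np.array([0.9, 0.8, 0.1, 0.2])   # confident, mostly correct prediction
bad = np.array([0.6, 0.4, 0.5, 0.7])    # uncertain or wrong prediction

loss_good = hybrid_loss(good, target)
loss_bad = hybrid_loss(bad, target)

# Upweighting the worst-classified point (index 3), as a stand-in for a
# high-curvature boundary point, increases its share of the focal term.
focal_plain = focal_loss(bad, target)
focal_weighted = focal_loss(bad, target, weights=[1, 1, 1, 3])
```

In a training setup the same expressions would be written in an autodiff framework. The NumPy form is used here only to make the arithmetic explicit.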

4. Quantitative Benchmarks

Multiple models and challenges provide direct comparative values for TSA across varying datasets and modalities:

| Model / Setting | TSA Metric | Value(s) | Dataset / Task |
|---|---|---|---|
| DE-KAN (Mustakim et al., 23 Nov 2025) | Accuracy / Dice | 98.91% / 97.1% | CDPR 2D radiographs |
| iMeshSegNet (Wu et al., 2021) | Dice | 0.964 ± 0.054 | 3D intraoral mesh |
| FUSegNet+AG+P-scSE (Dhar et al., 2023) | IoU / Dice / RIoU | 82.43% / 90.37% / 82.82% | Panoramic X-ray |
| S-R2F2U-Net (Dhar et al., 2022) | Accuracy / Dice | 97.31% / 93.26% | Dental X-ray |
| CGIP@3DTeethSeg'22 (Ben-Hamadou et al., 2023) | Instance-level F1 | 0.9859 | 3D challenge |
| BATISNet (Cai et al., 30 Dec 2025) | mIoU / mAP | 84.42% / 81.93% | Point cloud instance |
| BFFNet (Zhang et al., 2024) (STS Challenge) | "TSA score" | 0.91 | 2D challenge |
| TSegFormer (Xiong et al., 2023) | TSA (Acc) / mIoU | 97.97% / 94.34% | 3D IOS, 16,000 scans |
| CrossTooth (Xi et al., 31 Mar 2025) | mIoU / Bound. IoU | 95.86% / 82.06% | 3D mesh, boundary |
| TSGCN (Zhao et al., 2022) | OA / mIoU | 96.96% / 91.69% | Mesh, 3D scanner |

Best-in-class 2D deep learning models (DE-KAN, FUSegNet variants, OralBBNet) reach up to roughly 99% accuracy and 97% Dice; 3D transformer-based methods (TSegFormer, TFormer) achieve at least 97.8% overall pointwise accuracy with mIoU in the mid-90% range. On the more demanding boundary and instance-level metrics, recent instance-segmentation-aware designs (BATISNet, TSegFormer, CrossTooth) show gains of 2–7 percentage points over prior semantic baselines, with better tooth separation in adverse scenarios (e.g., malposed or missing teeth).
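Because benchmark entries report different overlap metrics, the Dice–IoU identity from Section 1 offers a rough way to put them on a common scale. The sketch below applies it to the FUSegNet IoU listed in the table. The helper names are illustrative, and the identity is exact only for a single mask pair, so values averaged over images or teeth need not satisfy it exactly.

```python
def dice_from_iou(iou):
    """Dice implied by an IoU value for a single mask pair."""
    return 2.0 * iou / (1.0 + iou)

def iou_from_dice(dice):
    """IoU implied by a Dice value (inverse of dice_from_iou)."""
    return dice / (2.0 - dice)

fusegnet_iou = 0.8243                       # IoU reported in the table
implied_dice = dice_from_iou(fusegnet_iou)
print(f"{implied_dice:.4f}")                # 0.9037, matching the reported 90.37% Dice
```

The same conversion explains why Dice figures in the table sit several points above IoU figures for comparable segmentation quality, which matters when comparing a Dice-only entry against an mIoU-only one.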

5. Critical Factors Impacting High TSA

The primary determinants of elevated TSA include:

6. Limitations and Future Directions

Despite significant advances, several challenges persist:

  • Boundary degradation under label noise or for highly worn, supernumerary, or missing teeth remains a major source of error (Zhang et al., 2024, Xi et al., 31 Mar 2025).
  • Generalization to atypical anatomies (edentulous jaws, complex prosthetics) requires further expansion of training sets and semantically-aware post-processing (Xi et al., 31 Mar 2025).
  • Computational cost: High-accuracy models (DE-KAN, TSegFormer, BATISNet) may require greater memory and inference time; efficiency improvements and knowledge distillation are noted areas for future research (Mustakim et al., 23 Nov 2025).
  • Weak or unsupervised learning: Reducing annotation burden and addressing out-of-distribution generalization can further broaden clinical applicability (Zhang et al., 2024, Kunzo et al., 2023).
  • Clinical metrics: Continued refinement of composite TSA scores combining shape, position, orientation, boundary, and overlap metrics is advocated for translational robustness and explainability (Zhu et al., 2024).

7. Research Directions and Best Practices

  • Hybrid multi-objective loss and deep supervision across decoder stages can consistently raise TSA (Zhang et al., 2024, Ghafoor et al., 2023).
  • Boundary-aware training and augmentation, such as focal losses on high-curvature points, yield sharper boundaries and improve segmentation where it is most clinically relevant (Cai et al., 30 Dec 2025, Xi et al., 31 Mar 2025, Xiong et al., 2023).
  • Model selection and validation should leverage multiple TSA metrics (Dice, IoU, boundary scores, instance-level F1, and domain-specific attributes) to avoid overfitting to a single index (Ben-Hamadou et al., 2023, Zhu et al., 2024).
  • Human-in-the-loop retraining and visual analytics frameworks (e.g., ViSTooth’s glyph + scatterplot) accelerate the discovery and remediation of rare segmentation errors, supporting continual improvement (Zhu et al., 2024).

Teeth Segmentation Accuracy thus encompasses a suite of metrics and evaluation practices rather than a single number. Continued progress depends not just on absolute scores but on a region- and instance-aware assessment of segmentation reliability, especially at clinically critical boundaries. This calls for coordinated improvement in modeling, loss design, data curation, and metric selection, as exemplified by the leading approaches in 2D/3D imaging and international benchmark challenges.
