JavisInst-OMNI: Malocclusion Diagnosis Data
- JavisInst-OMNI is a multi-view dataset of 4,166 intra-oral images from 384 patients, providing high-quality, real-world data for malocclusion diagnosis.
- The dataset features rigorous annotation of ten clinically relevant malocclusion categories using a standardized protocol aligned with ABO guidelines.
- Benchmark results across CNN, Transformer, and graph-based models demonstrate its ability to drive advancements in automated orthodontic image analysis.
The JavisInst-OMNI (OMNI) dataset is a publicly released multi-view RGB image collection curated specifically to advance automated malocclusion diagnosis in orthodontics. Encompassing 4,166 images from 384 participants across five intra-oral views, annotated by dental professionals with ten clinically relevant malocclusion categories, OMNI constitutes the first large-scale, high-quality dataset of its kind for oral and maxillofacial imaging. Its design, annotation methodology, and extensive benchmark baselines provide a robust foundation for research in dental image analysis (Xue et al., 21 May 2025).
1. Dataset Composition and Acquisition Protocol
Participants and Imaging:
OMNI comprises images from 384 Chinese patients (153 male, 231 female; ages 3–48 years), captured at the Department of Stomatology, Third Affiliated Hospital of Soochow University. All participants provided informed consent in accordance with the Declaration of Helsinki. The acquisition protocol required routine intra-oral cleaning, standardized retractors, and a Canon EOS 550D digital camera (manual exposure, flash at ¼-stop, natural light curing lamp), with the camera aligned orthogonally to the dental surface for each view.
View Distribution:
- Frontal occlusal: 903 images
- Left buccal occlusal: 841 images
- Right buccal occlusal: 843 images
- Maxillary (upper arch) occlusal: 820 images
- Mandibular (lower arch) occlusal: 759 images
The protocol involved specific positioning and combinations of lip hooks, intra-oral mirrors, and patient head tilts to ensure comprehensive visual coverage of the dental arches.
2. Annotation Framework
Pre-processing and Quality Assurance:
All images underwent a pre-selection phase, with exclusion criteria targeting motion blur and incomplete dental structures (e.g., occlusion gaps). Tooth localization was performed on the Makesense.ai platform, with dense bounding boxes drawn on each visible tooth.
Diagnostic Categories:
Annotations followed the 7th edition of “Orthodontics” and ABO guidelines, assigning one or more of the following ten labels per image (a convenience mapping is sketched after the list):
- HT: Healthy Teeth
- TT: Tooth Torsion
- DO: Deep Overjet
- IOA: Invisible Orthodontic Attachment
- TE: Tooth Emergence
- CFOA: Cast Fixed Orthodontic Appliance
- TM: Tooth Misalignment
- MR: Mandibular Retrusion
- OB: Orthodontic Brace
- FOD: Fixed Orthodontic Device
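For programmatic use, the abbreviations can be gathered into a simple lookup table. The dictionary below is a minimal sketch; the name `OMNI_LABELS` and the ordering are illustrative conveniences, not identifiers from the released code, and the authoritative category IDs live in the released COCO annotation files.

```python
# Illustrative mapping of the ten OMNI diagnostic labels.
# NOTE: this dictionary is a hypothetical convenience; consult the
# released COCO annotation files for the authoritative category IDs.
OMNI_LABELS = {
    "HT": "Healthy Teeth",
    "TT": "Tooth Torsion",
    "DO": "Deep Overjet",
    "IOA": "Invisible Orthodontic Attachment",
    "TE": "Tooth Emergence",
    "CFOA": "Cast Fixed Orthodontic Appliance",
    "TM": "Tooth Misalignment",
    "MR": "Mandibular Retrusion",
    "OB": "Orthodontic Brace",
    "FOD": "Fixed Orthodontic Device",
}
```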
Review Process:
Initial annotations were produced by trained dentists. A two-stage audit by senior orthodontists, following a standardized checklist based on the ABO classification, enforced consistency; discrepancies were adjudicated by a panel of dental specialists. Final annotations were released in COCO format. No formal inter-annotator agreement metric was reported; the multi-stage review served in place of statistical agreement measures.
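Because the final annotations are in COCO format, they can be inspected with standard tooling. The sketch below uses pycocotools and assumes a hypothetical annotation file path; use the paths from the released repository.

```python
# Minimal sketch of loading OMNI's COCO-format annotations with pycocotools.
from pycocotools.coco import COCO

coco = COCO("annotations/omni_train.json")  # hypothetical file name

# Enumerate categories (expected to match the ten diagnostic labels).
cats = coco.loadCats(coco.getCatIds())
print([c["name"] for c in cats])

# Fetch all tooth bounding boxes for the first image.
img_id = coco.getImgIds()[0]
ann_ids = coco.getAnnIds(imgIds=img_id)
for ann in coco.loadAnns(ann_ids):
    x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
    print(ann["category_id"], (x, y, w, h))
```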
3. Dataset Organization and Statistics
Data Splits:
Images are partitioned as follows:
| Split | Frontal | Left | Right | Maxillary | Mandibular | Total |
|---|---|---|---|---|---|---|
| Training | 534 | 497 | 503 | 492 | 455 | 2,481 |
| Validation | 187 | 174 | 174 | 167 | 155 | 857 |
| Test | 182 | 170 | 166 | 161 | 149 | 828 |
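As a quick sanity check, the per-view counts in the table reproduce the reported split totals and the overall figure of 4,166 images:

```python
# Verify that per-view counts sum to the reported split totals.
splits = {
    # order: frontal, left, right, maxillary, mandibular
    "train": [534, 497, 503, 492, 455],
    "val":   [187, 174, 174, 167, 155],
    "test":  [182, 170, 166, 161, 149],
}
for name, counts in splits.items():
    print(name, sum(counts))  # -> train 2481, val 857, test 828
print("total", sum(sum(c) for c in splits.values()))  # -> 4166
```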
Class Distribution:
Among 4,166 images:
| Label | Count |
|---|---|
| HT | 3,610 |
| TT | 1,686 |
| DO | 205 |
| IOA | 402 |
| TE | 441 |
| CFOA | 144 |
| TM | 147 |
| MR | 289 |
| OB | 220 |
| FOD | 776 |
On average, each image presents 1.03 diagnostic issues: of the 4,166 images, 565 are healthy, 2,945 contain one issue, 603 contain two, and 53 contain three simultaneous issues.
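The reported average of 1.03 issues per image follows directly from this distribution, treating healthy images as contributing zero issues:

```python
# Reproducing the reported ~1.03 diagnostic issues per image
# from the per-image issue-count distribution.
distribution = {0: 565, 1: 2945, 2: 603, 3: 53}  # issues -> image count
total_images = sum(distribution.values())                    # 4,166
total_issues = sum(k * v for k, v in distribution.items())   # 4,310
print(total_issues / total_images)  # ~1.03
```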
4. Baseline Architectures and Experimental Protocol
Model Selection and Implementation:
Six object detection baselines were established, spanning CNN-, Transformer-, and graph-based models. All were implemented in PyTorch/MMDetection and trained on NVIDIA RTX 4090 GPUs with the AdamW optimizer, for 50 epochs in most cases (exceptions noted below), using standardized image pre-processing (resizing, random horizontal flipping, ImageNet normalization); a sketch of this shared setup follows the list below:
- CNN-Based:
- Faster R-CNN (ResNet-50/101 backbone)
- Mask R-CNN (ResNet-50/101, segmentation head disabled)
- EfficientDet (EfficientNet-B0/B3 backbone, BiFPN)
- Transformer-Based:
- DETR (ResNet-50 backbone; 6 encoder and 6 decoder layers; trained 300 epochs)
- Deformable DETR (ResNet-50, deformable attention; 50 epochs)
- Graph-Based:
- GraphTeethNet: a Faster R-CNN backbone up to ROIAlign yields up to 50 tooth proposals per image, whose ROI-pooled features serve as node features; edge features are produced by the Maxillofacial Teeth Representation Modeling (MTRM) and Teeth Relationship Modeling (TRM) modules via cross-attention, and the resulting graph is classified by a GNN (trained for 100 epochs).
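The shared optimizer and pre-processing setup can be sketched in plain PyTorch as below. The learning rate, weight decay, flip probability, and target resolution are placeholder assumptions (the exact values are specified in the original paper), and Faster R-CNN stands in for any of the six baselines.

```python
# Minimal sketch of the shared training setup, not the authors' exact code.
import torch
import torchvision
from torchvision import transforms

# 10 diagnostic categories + background class.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=11)

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,            # placeholder; see the paper for the actual value
    weight_decay=1e-4,  # placeholder
)

# Standardized pre-processing: resize, random horizontal flip,
# ImageNet mean/std normalization.
preprocess = transforms.Compose([
    transforms.Resize((800, 800)),        # assumed target size
    transforms.RandomHorizontalFlip(p=0.5),  # assumed flip probability
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```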
5. Benchmarking Results and Evaluation Metrics
Performance was assessed using mean Average Precision (mAP), averaged across the standard IoU thresholds $[0.50{:}0.05{:}0.95]$, with the common special cases [email protected] and [email protected]:

$$\mathrm{AP} = \sum_{k} \left( R_k - R_{k-1} \right) P_k, \qquad \mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{AP}_i,$$

where $P_k$ and $R_k$ are the precision and recall at the $k$-th confidence threshold and $N$ is the number of categories.
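As an illustration of these definitions, here is a minimal NumPy sketch; it omits the 101-point recall interpolation that the official pycocotools evaluator applies, and the toy precision/recall values are invented for demonstration.

```python
import numpy as np

def average_precision(precisions, recalls):
    """AP = sum_k (R_k - R_{k-1}) * P_k, with R_0 = 0 and recalls ascending."""
    r = np.concatenate(([0.0], np.asarray(recalls, dtype=float)))
    p = np.asarray(precisions, dtype=float)
    return float(np.sum((r[1:] - r[:-1]) * p))

def mean_average_precision(per_class_aps):
    """mAP is the unweighted mean of the per-class AP values."""
    return float(np.mean(per_class_aps))

# Toy precision/recall curves for two hypothetical classes:
ap_a = average_precision([1.0, 1.0, 0.67, 0.75], [0.25, 0.50, 0.50, 0.75])
ap_b = average_precision([1.0, 0.50, 0.67], [0.33, 0.33, 0.67])
print(mean_average_precision([ap_a, ap_b]))
```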
Key results ([email protected], in %):
| Model | [email protected] |
|---|---|
| Deformable DETR | 66.39 |
| Mask R-CNN | 65.88 |
| Faster R-CNN | 65.32 |
| EfficientDet (B3) | 64.16 |
| GraphTeethNet | 63.89 |
| DETR | 63.39 |
Deformable DETR demonstrated the strongest overall performance, particularly on Mandibular Retrusion (MR: 93.64), Fixed Orthodontic Device (FOD: 93.60), and Healthy Teeth (HT: 89.94). EfficientDet yielded the highest mAP at the strictest IoU threshold ([email protected] = 41.13). GraphTeethNet substantiated the importance of relational modeling among teeth, gaining 1.51 mAP over an ablated variant without edge modeling.
Metrics such as accuracy, precision, recall, and F1 score were not reported, as the task is framed as object detection rather than image-level classification.
6. Accessibility, Data Provenance, and Use
The entire OMNI dataset (frequently referenced as JavisInst-OMNI in code) and all benchmark implementations, including images, COCO-format annotations, training scripts, Dockerfile, and environment.yml (for PyTorch 1.x and MMDetection 2.x reproducibility), are publicly available at https://github.com/RoundFaceJ/OMNI (Xue et al., 21 May 2025).
Open access facilitates reproducibility, secondary analyses, and benchmarking studies, furthering methodological advances in ML-based dental diagnostics. The role of professional annotation, standardized review, and clinically grounded labels underscores the dataset's research reliability for downstream automation in orthodontic assessment.
7. Significance and Research Implications
The OMNI dataset addresses a critical shortage of large-scale, well-annotated datasets for malocclusion assessment, historically a limiting factor in dental image analysis and the development of automated diagnostic tools. Its multi-view, multi-diagnosis design, strict acquisition protocol, and expert-driven annotation pipeline collectively position OMNI as a key benchmark for evaluating object detection, segmentation, and relational reasoning algorithms in dental imaging.
A plausible implication is that OMNI may accelerate research into fine-grained intra-oral disease recognition, cross-view aggregation, structured prediction, and robust clinical deployment of machine learning models in orthodontics, setting a new standard for community benchmarks in this domain (Xue et al., 21 May 2025).