Atypical Mitotic Figures in Cancer Prognosis

Updated 1 September 2025

Atypical Mitotic Figures (AMFs) are abnormal cell division events marked by polar asymmetry, chromosomal segregation defects, and irregular nuclear morphology, indicating underlying genomic instability.
Annotation protocols use expert consensus on datasets like AMi-Br to address inter-observer variability and class imbalance challenges in identifying AMFs.
Advanced computational methodologies, including deep learning with transfer learning and LoRA adaptations, enhance the detection and cross-domain robustness of AMF classification.

Atypical Mitotic Figures (AMFs) are histopathological entities defined by morphological deviations from the canonical stages of cell division. They are recognized for their association with genetic instability, tumor progression, and poor clinical prognosis, notably in cancers such as breast carcinoma. The identification and classification of AMFs—distinguished by abnormalities such as polar asymmetry, chromosomal segregation defects, and irregular nuclear morphology—form a critical aspect of tumor grading and prognostication. The complexity of AMF assessment arises from their low prevalence, high morphological variability, and substantial inter-observer disagreement among pathologists, leading to unique computational, diagnostic, and data annotation challenges.

1. Morphological Definition and Clinical Significance

AMFs are subclassified based on specific abnormalities observed during mitosis. Key morphological categories include bipolar, tri- and multipolar asymmetry, lagging or bridging chromosomes, and dispersed chromosomal fragments. Their detection indicates the presence of mutations in genes regulating the cell cycle, leading to chromosomal aneuploidy and altered tumor phenotypes (Bertram et al., 8 Jan 2025). Quantitative studies have identified the proportion of AMFs among all mitotic figures as an independent prognostic criterion in breast cancer, with increased AMF counts correlating with poorer outcomes. The biological underpinning of this correlation is the link between atypical mitosis and genomic instability.

Consensus in annotation is achieved through expert review, often using three- or five-expert majority voting. In prominent datasets such as AMi-Br, approximately 22.4% of mitotic figures are labeled as atypical. Inter-rater agreement remains limited (~78% for AMi-Br, ~70% for AtNorM-MD), reflecting the subjective difficulty of the task (Banerjee et al., 26 Jun 2025).
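The consensus step can be illustrated with a minimal sketch. The vote data below are hypothetical, and mean pairwise agreement is only one possible way to quantify inter-rater agreement; the cited studies may define their agreement figures differently.

```python
from collections import Counter
from itertools import combinations

def majority_vote(votes):
    """Return the label chosen by a strict majority of experts."""
    label, count = Counter(votes).most_common(1)[0]
    if count * 2 <= len(votes):
        raise ValueError("no strict majority; use an odd-sized panel for binary labels")
    return label

def mean_pairwise_agreement(annotations):
    """Fraction of (figure, expert pair) combinations in which both experts agree."""
    agree = total = 0
    for votes in annotations:
        for a, b in combinations(votes, 2):
            agree += (a == b)
            total += 1
    return agree / total

# Hypothetical votes from three experts for four mitotic figures
annotations = [
    ("atypical", "atypical", "normal"),
    ("normal", "normal", "normal"),
    ("atypical", "normal", "normal"),
    ("atypical", "atypical", "atypical"),
]
consensus = [majority_vote(v) for v in annotations]
```

With an odd number of experts and two classes, a strict majority always exists, which is one reason three- and five-expert panels are convenient.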

2. Annotation Practices and Datasets

Dataset construction for AMF recognition involves the collation of large-scale, multi-institutional image data and rigorous expert annotation. The AMi-Br dataset combines mitosis patches from two major sources (TUPAC16 and MIDOG21), each annotated via majority voting by three pathologists to distinguish normal and atypical morphologies (Bertram et al., 8 Jan 2025). For further benchmarking, two hold-out datasets have been introduced: AtNorM-Br (from the TCGA cohort, single expert) and AtNorM-MD (multi-domain, five-expert consensus), each designed to probe generalization under domain shifts and variable expert agreement (Banerjee et al., 26 Jun 2025).

Annotation protocols emphasize precise morphological criteria and majority voting to mitigate label noise and subjectivity. Data splits are made either at the patch level, which can be overoptimistic because patches from the same patient may appear in both training and test sets, or at the patient level, which is more stringent. Balanced accuracy and ROC AUC are the standard performance metrics.
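A patient-level split can be sketched as follows. The record layout (a `patient_id` key per patch), the test fraction, and the seed are assumptions made for illustration and are not taken from the cited benchmarks.

```python
import random

def patient_level_split(patches, test_fraction=0.2, seed=0):
    """Split patches so that no patient contributes to both train and test."""
    patients = sorted({p["patient_id"] for p in patches})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train = [p for p in patches if p["patient_id"] not in test_patients]
    test = [p for p in patches if p["patient_id"] in test_patients]
    return train, test

# Hypothetical patch records: 5 patients with 4 patches each
patches = [{"patient_id": f"P{i % 5}", "label": int(i % 4 == 0)} for i in range(20)]
train, test = patient_level_split(patches)
```

Splitting on patient identifiers rather than on individual patches prevents near-duplicate tissue from the same case leaking into the evaluation set.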

Dataset	Total MF	AMF Count	% AMF	Expert Consensus
AMi-Br	3720	832	22.4%	3-expert majority
AtNorM-Br	746	128	17.2%	Single expert
AtNorM-MD	2107	219	10.4%	5-expert majority

3. Computational Methodologies for AMF Classification

A diverse array of machine learning and deep learning strategies has been developed for AMF classification. Baseline approaches leverage convolutional neural networks such as DenseNet-121 and EfficientNet V2-S, employing weighted cross-entropy, focal loss, and prevalence-weighted sampling to address severe class imbalance and morphological overlap (Bertram et al., 8 Jan 2025).
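As an illustration of one of these imbalance-aware losses, the following is a minimal sketch of the binary focal loss for a single sample. The values of `gamma` and `alpha` are illustrative assumptions, not the settings used in the cited work.

```python
import math

def focal_loss(p_atypical, y, gamma=2.0, alpha=0.75):
    """Binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    p_atypical: predicted probability of the atypical class
    y:          1 for atypical, 0 for normal
    """
    p_t = p_atypical if y == 1 else 1.0 - p_atypical
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct prediction contributes far less than a misclassified one
easy = focal_loss(0.9, 1)
hard = focal_loss(0.3, 1)
```

Setting `gamma=0` recovers an alpha-weighted cross-entropy, so the focusing term `(1 - p_t)**gamma` is what down-weights easy examples.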

Recent advances have incorporated transfer learning and fine-tuning of vision foundation models. For example, Virchow2 (pre-trained on billions of histopathology tiles) and DINOv3-H+ (pre-trained on natural images) have shown robust cross-domain accuracy when adapted with parameter-efficient methods such as Low-Rank Adaptation (LoRA), which inserts learnable low-rank matrices into transformer attention layers while the pretrained weights stay frozen (Banerjee et al., 26 Jun 2025, Balezo et al., 28 Aug 2025). Reported balanced accuracy reaches 0.8135 (in-domain) and 0.8871 (cross-domain MIDOG 2025 test set), with AUROC up to 0.9026, indicating competitive performance despite class imbalance and domain shift.
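The low-rank update at the core of LoRA can be sketched without any deep learning framework. The class name `LoRALinear`, the rank, the scaling factor `alpha`, and the initialization scale below are illustrative assumptions; the cited papers apply the technique inside full vision transformers through their own tooling.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable update (alpha / r) * B @ A."""

    def __init__(self, weight, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight, never updated
        self.scale = alpha / rank
        # A gets small random values and B zeros, so the update starts at zero
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        delta = matmul(self.B, self.A)  # d_out x d_in, rank at most r
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x):
        """x is a batch of input vectors (batch x d_in); returns batch x d_out."""
        w_t = [list(col) for col in zip(*self.effective_weight())]
        return matmul(x, w_t)

    def trainable_parameters(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

For a `d_out x d_in` layer, only `r * (d_in + d_out)` parameters are trained instead of `d_in * d_out`, which is where the parameter efficiency comes from when `r` is small relative to the layer dimensions.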

MixStyle and CBAM-based feature attention refine feature diversity and alignment (Atey et al., 28 Aug 2025), while multi-task networks utilizing auxiliary segmentation branches improve object focus and robustness under domain shift (Percannella et al., 28 Aug 2025). Ensemble strategies and center cropping have been shown to enhance detection of morphological cues critical for AMF identification (Yamagishi et al., 26 Aug 2025, Krauss et al., 28 Aug 2025). Rule-based refinement modules can increase specificity, though often at the expense of sensitivity (Krauss et al., 28 Aug 2025).
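Center cropping and ensemble inference can be combined as in the following sketch. The ensemble members are hypothetical callables that map a patch to an atypical-class probability, and soft voting (probability averaging) is assumed here; the cited ensembles may combine their members differently.

```python
def center_crop(patch, size):
    """Keep the central size x size window of a patch given as a list of rows."""
    h, w = len(patch), len(patch[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in patch[top:top + size]]

def ensemble_probability(models, patch, crop_size):
    """Soft voting: average the atypical-class probability over all members."""
    cropped = center_crop(patch, crop_size)
    probs = [model(cropped) for model in models]
    return sum(probs) / len(probs)

# Hypothetical 4x4 single-channel patch and three stand-in "models"
patch = [[r * 4 + c for c in range(4)] for r in range(4)]
models = [lambda p: 0.2, lambda p: 0.6, lambda p: 0.7]
prob = ensemble_probability(models, patch, crop_size=2)
```

Cropping toward the patch center focuses each model on the mitotic figure itself rather than on surrounding tissue.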

Method/Model	Balanced Accuracy	AUROC	Main Approach
Virchow2 (LoRA)	0.8135	0.9026	Foundation Model
DINOv3-H+ (LoRA)	0.8871	—	Vision Transformer
ConvNeXt V2 (Ensemble)	0.8831	0.9533	Modern CNN
MixStyle+CBAM+EMA	0.8762	0.9499	Domain Robust Hybrid
Multi-task ResNet-50	0.833	—	Multi-task Learning

4. Domain Shift, Data Diversity, and Robustness

Domain heterogeneity—variations in staining, scanner technology, species, and tumor type—poses major obstacles to generalization. Robust training-time recipes have emerged to enhance domain-invariance via: (i) style perturbation (MixStyle), (ii) feature alignment using weak metadata (Scanner, Origin, Species, Tumor), and (iii) knowledge distillation from an EMA teacher (Atey et al., 28 Aug 2025). This approach yields balanced accuracy above 0.87 and AUROC near 0.95 on multi-domain test sets.
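The style-perturbation idea behind MixStyle can be sketched on a single pair of feature maps. Here each feature map is represented as a list of channels with flattened spatial values, and the Beta parameter `alpha=0.1` is an illustrative assumption. The original method operates on batches inside a CNN and pairs each sample with a shuffled partner.

```python
import random
import statistics

def channel_stats(channel, eps=1e-6):
    """Per-channel mean and standard deviation over spatial positions."""
    mu = statistics.fmean(channel)
    sigma = (statistics.pvariance(channel, mu) + eps) ** 0.5
    return mu, sigma

def mixstyle(x, x_other, lam):
    """Normalize x per channel, then re-style it with statistics mixed with x_other."""
    out = []
    for ch, ch_other in zip(x, x_other):
        mu, sigma = channel_stats(ch)
        mu_o, sigma_o = channel_stats(ch_other)
        mu_mix = lam * mu + (1 - lam) * mu_o
        sigma_mix = lam * sigma + (1 - lam) * sigma_o
        out.append([sigma_mix * (v - mu) / sigma + mu_mix for v in ch])
    return out

def sample_lambda(alpha=0.1, rng=random):
    """Mixing coefficient drawn from Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha)

# Hypothetical one-channel feature maps from two different domains
x = [[0.0, 2.0, 4.0]]
x_other = [[10.0, 10.5, 11.0]]
restyled = mixstyle(x, x_other, lam=0.0)
```

With `lam=1.0` the input passes through unchanged, while `lam=0.0` gives it the other sample's channel statistics and keeps its normalized spatial pattern.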

Multi-task learning, where the classification task is coupled with auxiliary segmentation, further stabilizes performance against background variation. Leave-one-domain-out validation strategies and ensemble inference mitigate the risk of overfitting to specific domains (Percannella et al., 28 Aug 2025). Advanced data augmentation—color jitter, multi-standard stain normalization, D4 symmetry, coarse dropout—also plays a pivotal role in model robustness, especially for transformer-based approaches (Balezo et al., 28 Aug 2025).
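The D4 symmetry augmentation mentioned above enumerates the eight rotations and reflections of a square patch. A minimal sketch on a patch stored as a list of rows follows; the function names are illustrative.

```python
def rot90(patch):
    """Rotate a square patch (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]

def hflip(patch):
    """Mirror a patch left to right."""
    return [row[::-1] for row in patch]

def d4_views(patch):
    """Return the 8 views under the dihedral group D4 (4 rotations, each with and without a flip)."""
    views = []
    current = [list(row) for row in patch]
    for _ in range(4):
        views.append(current)
        views.append(hflip(current))
        current = rot90(current)
    return views

views = d4_views([[1, 2], [3, 4]])
```

Because a mitotic figure has no canonical orientation in a tissue section, all eight views are equally valid training samples, and the same set can be used for test-time augmentation.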

5. Key Challenges: Inter-Observer Variability and Class Imbalance

AMFs are both rare and morphologically variable. Studies report inter-expert agreement for AMF annotation as low as ~70%, with significant disagreement over whether a figure is atypical or belongs to certain normal mitosis subtypes (Aubreville et al., 2022, Banerjee et al., 26 Jun 2025). The resultant label noise and confirmation bias directly affect classifier performance, necessitating consensus annotation protocols and class-weighting schemes in loss functions, for instance:

$\mathcal{L} = -\sum_{i=1}^{C} w_i\, y_i \log(p_i)$

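where $C$ is the number of classes, $y_i$ is the one-hot ground-truth indicator for class $i$, $p_i$ is the predicted probability of class $i$, and $w_i$ is a class weight reflecting the inverse prevalence of class $i$.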
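A worked example of this loss, using the AMi-Br counts reported in Section 2 (832 atypical out of 3720 mitotic figures), is sketched below. The normalization $w_i = N / (C \, n_i)$ is one common convention and is an assumption here; the cited work may scale its weights differently.

```python
import math

def inverse_prevalence_weights(class_counts):
    """w_i = N / (C * n_i): rarer classes receive larger weights."""
    total = sum(class_counts)
    n_classes = len(class_counts)
    return [total / (n_classes * n) for n in class_counts]

def weighted_cross_entropy(probs, target, weights):
    """L = -sum_i w_i * y_i * log(p_i), with the one-hot y given as a class index."""
    return -weights[target] * math.log(probs[target])

counts = [3720 - 832, 832]  # [normal, atypical]
weights = inverse_prevalence_weights(counts)

# The same predicted probability costs more when the true class is the rare one
loss_normal = weighted_cross_entropy([0.5, 0.5], 0, weights)
loss_atypical = weighted_cross_entropy([0.5, 0.5], 1, weights)
```

With these counts the atypical class receives a weight of roughly 2.24 and the normal class roughly 0.64, so an error on an atypical figure is penalized about 3.5 times as heavily.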

Class imbalance is addressed via loss weighting, oversampling, and balanced accuracy metrics. In patient-level evaluation, typical baseline architectures achieve balanced accuracy in the 0.61–0.71 range, but advanced pretrained and ensemble methods routinely exceed 0.81 on multi-domain test sets.
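Balanced accuracy is preferred over plain accuracy because it averages per-class recall. The sketch below uses hypothetical labels to show why.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; for two classes, (sensitivity + specificity) / 2."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical imbalanced test set: 90 normal (0) and 10 atypical (1) figures
y_true = [0] * 90 + [1] * 10
always_normal = [0] * 100
```

A classifier that always predicts "normal" reaches an accuracy of 0.9 on this set but a balanced accuracy of only 0.5, which is why the latter is the more informative metric under class imbalance.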

6. Prognostic Implications, Diagnostic Utility, and Future Directions

AMF quantification is an emerging biomarker for tumor aggressiveness, reflecting cell cycle gene mutations and genomic instability (Bertram et al., 8 Jan 2025, Aubreville et al., 2022). Automated, reproducible detection using deep learning models reduces observer variability and supports more objective cancer grading. Integration of AMF annotation into routine diagnostic workflows promises improvements in prognostic accuracy and treatment personalization.

Ongoing research is directed toward improved annotation frameworks, more extensive and diverse datasets, parameter-efficient adaptation (LoRA), self-supervised and domain-adaptive learning, and interpretability (e.g., diffusion models for visualizing transitional features) (Bahadir et al., 2023). Synthetic data generation and probabilistic labeling may further enrich training for rare and ambiguous cases. The release of datasets such as AMi-Br and MIDOG, alongside comprehensive evaluation benchmarks, sets the stage for reproducible, robust computational pathology.

7. Methodological Innovations and Data Availability

Technical outputs—such as feature-refined representations, gated hierarchical classifiers, and advanced augmentation pipelines—drive improvements in both sensitivity and specificity for AMF detection. Performance is assessed via balanced accuracy, AUROC, F1 scores, and domain-specific stratification. The community benefits from open-source codebases and data repositories (e.g., https://github.com/DeepMicroscopy/AMi-Br_Benchmark (Banerjee et al., 26 Jun 2025)), fostering collaboration and further advances.

In summary, AMF identification and classification represent a frontier in computational pathology, characterized by significant challenges in morphological variability, annotation reproducibility, and robust domain adaptation. Recent advances in dataset curation, algorithmic design, and benchmarking have propelled the field toward highly accurate, reproducible, and clinically relevant assessments of atypical mitotic figures.