Generating realistic and diverse synthetic chromosomal abnormalities without sufficient real abnormal data

Construct generative procedures that, using only normal chromosome images, produce structurally realistic and diverse synthetic chromosomal abnormalities (such as deletions, duplications, inversions, and translocations), so that class imbalance in structural chromosomal anomaly detection can be alleviated even when real abnormal data are scarce.
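
To make the four abnormality types concrete, the following is a minimal sketch of how each could be simulated by rearranging segments of a normal chromosome image. It is not the paper's method. It assumes each chromosome has already been segmented and straightened into a 2D grayscale NumPy array whose rows run along the long axis, and that two chromosomes used together share the same width. All function names and breakpoint parameters are hypothetical. A realistic generator would also need to handle seam blending, band-pattern plausibility, and centromere position, which this sketch ignores.

```python
import numpy as np

# Assumption: a chromosome is a 2D array (rows = position along the long axis).
# All names below are illustrative, not taken from the source paper.

def delete_segment(chrom, start, end):
    """Simulated deletion: drop rows [start, end)."""
    return np.concatenate([chrom[:start], chrom[end:]], axis=0)

def duplicate_segment(chrom, start, end):
    """Simulated tandem duplication: repeat rows [start, end) right after themselves."""
    return np.concatenate([chrom[:end], chrom[start:end], chrom[end:]], axis=0)

def invert_segment(chrom, start, end):
    """Simulated inversion: reverse the row order of [start, end)."""
    out = chrom.copy()
    out[start:end] = chrom[start:end][::-1]
    return out

def translocate(chrom_a, chrom_b, cut_a, cut_b):
    """Simulated reciprocal translocation: swap the parts below each breakpoint.

    Assumes both arrays have the same width (number of columns).
    """
    new_a = np.concatenate([chrom_a[:cut_a], chrom_b[cut_b:]], axis=0)
    new_b = np.concatenate([chrom_b[:cut_b], chrom_a[cut_a:]], axis=0)
    return new_a, new_b

# Example with random pixel data standing in for real normal chromosome images.
rng = np.random.default_rng(0)
normal_a = rng.integers(0, 256, size=(120, 24), dtype=np.uint8)
normal_b = rng.integers(0, 256, size=(90, 24), dtype=np.uint8)

deleted = delete_segment(normal_a, 40, 60)
duplicated = duplicate_segment(normal_a, 40, 60)
inverted = invert_segment(normal_a, 40, 60)
der_a, der_b = translocate(normal_a, normal_b, 80, 50)
```

Sampling the breakpoints (`start`, `end`, `cut_a`, `cut_b`) at random is one simple way to obtain diversity from a fixed pool of normal images; whether the results look structurally realistic depends on the refinements noted above.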

Background

Structural chromosomal abnormalities are rare and difficult to collect at scale, leading to severe long-tailed imbalance between normal and abnormal samples. This scarcity hinders the training of deep learning models for anomaly detection and limits the ability of standard generative approaches to learn faithful distributions of diverse anomaly types.

The paper motivates a simulation-driven augmentation approach by explicitly naming an unresolved challenge: generating realistic and diverse synthetic anomalies when real abnormal data are insufficient.

References

Therefore, two key challenges remain unresolved: how to generate structurally realistic and diverse synthetic anomalies in the absence of sufficient real abnormal data, and how to implicitly assess and dynamically prioritize high-quality synthetic samples during training to maximize their utility for downstream anomaly detection.