- The paper demonstrates that incorporating synthetic images generated with DALL·E 2 improves classification accuracy by up to 30.5%, with the largest gains when training and test sets come from opposite extremes of the Fitzpatrick scale.
- The method combines real and synthetic image data, leveraging VGG16 models and the Fitzpatrick 17k dataset to address bias across skin tones.
- Results suggest that strategically generated synthetic images can effectively mitigate data bias and enhance diagnostic performance in dermatology.
Improving Dermatology Classifiers Using Synthetic Images
Introduction
The development and deployment of dermatological classification algorithms have gained momentum with the rise of machine learning, with some models matching dermatologist performance on specific diagnostic tasks. However, these models often struggle to generalize across diverse skin tones because their training datasets frequently under-represent darker skin, a discrepancy that significantly limits clinical applicability. Synthetic data generated by large diffusion models offers a promising way to address these biases. This paper leverages DALL·E 2 to generate synthetic images that enhance model performance on the Fitzpatrick 17k dataset, particularly targeting improvements for underrepresented skin types.
Methods
The study utilizes the Fitzpatrick 17k dataset as its primary source of labeled dermatological images, which are annotated with Fitzpatrick Skin Type (FST) labels used to gauge model bias across skin tones. Seven skin conditions were chosen for analysis based on factors such as sample size at the FST extremes and baseline model accuracy.
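The condition-selection step can be sketched as a simple filter over per-condition statistics. The thresholds, field names, and tie-breaking rule below are illustrative assumptions, not the paper's exact criteria; the paper only states that sample size at the FST extremes and baseline accuracy informed the choice.

```python
# Hypothetical sketch of the condition-selection step. The thresholds and the
# stats structure are assumptions for illustration only.

def select_conditions(stats, min_extreme_n=30, max_baseline_acc=0.9, k=7):
    """Pick up to k conditions with enough samples at both FST extremes
    (I-II and V-VI) and room for improvement in baseline accuracy.

    stats maps condition name -> {"n_fst_1_2": int, "n_fst_5_6": int,
    "baseline_acc": float}.
    """
    eligible = [
        (name, s) for name, s in stats.items()
        if s["n_fst_1_2"] >= min_extreme_n
        and s["n_fst_5_6"] >= min_extreme_n
        and s["baseline_acc"] <= max_baseline_acc
    ]
    # Prefer conditions best covered at their rarer FST extreme.
    eligible.sort(key=lambda item: min(item[1]["n_fst_1_2"],
                                       item[1]["n_fst_5_6"]),
                  reverse=True)
    return [name for name, _ in eligible[:k]]
```

A filter like this makes the selection reproducible, though the actual study may have weighed the criteria manually.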
To supplement the dataset, synthetic images were generated using DALL·E 2. The generation process began by selecting seed images from the extreme ends of the FST scale and applying the DALL·E 2 inpainting function with specific text prompts to produce augmented datasets (Figure 1).
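The inpainting step described above can be sketched with the OpenAI Python client. The prompt template, file paths, and output size here are illustrative assumptions; the paper's exact prompts are not reproduced.

```python
# Hypothetical sketch of the DALL-E 2 inpainting call. The prompt template,
# file names, and image size are assumptions, not the study's configuration.

def build_inpaint_prompt(condition: str) -> str:
    """Compose a text prompt for inpainting a seed image (assumed template)."""
    return f"A clinical photograph of {condition} on human skin"

def generate_synthetic_image(client, seed_path: str, mask_path: str,
                             condition: str):
    """Run DALL-E 2 inpainting on one seed image.

    `client` is an openai.OpenAI instance (requires an API key); the mask
    marks the region the model is allowed to repaint.
    """
    with open(seed_path, "rb") as image, open(mask_path, "rb") as mask:
        return client.images.edit(
            model="dall-e-2",
            image=image,
            mask=mask,
            prompt=build_inpaint_prompt(condition),
            n=1,
            size="512x512",
        )
```

Looping this call over seed images from both FST extremes would yield the augmented datasets the paper describes.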
Figure 1: A schematic overview of the study utilizing synthetic data generation.
Model training involved VGG16 architectures, pre-trained on ImageNet, with separate experiments conducted for models trained on light FST images tested on dark FST images and vice versa. Evaluation focused on how synthetic data inclusion impacts model accuracy across the FST spectrum.
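The cross-FST evaluation can be sketched as a split-and-score loop. The record format and FST group boundaries below are assumptions for illustration; the actual study fine-tunes ImageNet-pretrained VGG16 models on the images themselves.

```python
# Minimal sketch of the cross-FST split and accuracy comparison. The record
# format (fst, label, pred) is an assumed stand-in for model outputs.

LIGHT = {1, 2}  # FST I-II
DARK = {5, 6}   # FST V-VI

def split_by_fst(records):
    """Partition records into light-FST (I-II) and dark-FST (V-VI) subsets;
    middle FST values are excluded from the extreme-group comparison."""
    light = [r for r in records if r["fst"] in LIGHT]
    dark = [r for r in records if r["fst"] in DARK]
    return light, dark

def accuracy(records):
    """Fraction of records whose prediction matches the true label."""
    if not records:
        return 0.0
    return sum(r["pred"] == r["label"] for r in records) / len(records)
```

Training on one subset and scoring on the other (with and without synthetic images added to the training pool) reproduces the comparison the paper reports.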
Results
The incorporation of synthetic images resulted in notable performance improvements across several skin conditions. Classifiers trained on datasets augmented with both seed and synthetic images achieved higher accuracy than those trained on Fitzpatrick-only datasets. The largest gains occurred where there was a significant FST mismatch between the training and testing datasets.
For instance, in squamous cell carcinoma, models trained on FST V-VI supplemented with synthetic images showed a 30.5% improvement in accuracy when tested on FST I-II compared to models trained without synthetic data (Figure 2).
Figure 2: Model accuracy improvements across real and synthetic training datasets.
This trend indicates that synthetic data particularly enhances performance for extreme skin types, thus addressing one of the core limitations of traditional datasets.
Discussion
The augmentation of dermatological datasets with photorealistic synthetic images has demonstrated potential in mitigating biases associated with under-represented skin tones in traditional datasets. The observed dose-response relationship suggests strategic synthetic data generation can effectively bolster classifier performance. Yet, this augmentation process is not without challenges, particularly regarding data leakage risks and the intensive computational resources required.
While synthetic data serves as a valuable tool to supplement existing datasets, the importance of collecting diverse, real-world data remains paramount. Future research should concentrate on streamlining synthetic image generation processes and validating these methods across diverse demographic groups and skin conditions. The potential of fully automated pipelines for creating and vetting large-scale synthetic datasets could further revolutionize the field.
Conclusion
The study underscores the viability of synthetically generated data for improving the robustness and accuracy of dermatological classifiers. By harnessing the capabilities of diffusion models like DALL·E 2, we can create more inclusive and effective diagnostic tools, ultimately advancing equity in AI-driven healthcare.