- The paper introduces MAGIC, a framework that leverages AI-expert feedback to generate medically accurate skin disease images using diffusion models.
- It employs reward-based fine-tuning and direct preference optimization to iteratively align generated images with clinical criteria.
- Augmenting training data with the synthetic images improves diagnostic model accuracy by over 9% on a 20-condition classification task and by over 13% in few-shot scenarios.
"Doctor Approved: Generating Medically Accurate Skin Disease Images through AI-Expert Feedback" (2506.12323)
Introduction
The study "Doctor Approved: Generating Medically Accurate Skin Disease Images through AI-Expert Feedback" addresses a critical challenge in developing accurate ML models for dermatological diagnosis: data scarcity. A lack of diverse medical images limits how well these models generalize. The authors propose MAGIC (Medically Accurate Generation of Images through AI-Expert Collaboration), a framework for synthesizing clinically accurate skin disease images. MAGIC uses multimodal LLMs (MLLMs) as evaluators so that expert feedback can be incorporated with minimal direct human involvement. In this collaborative AI-expert paradigm, expert-defined criteria steer diffusion models (DMs) toward high-fidelity medical images, which are then used as training data to improve diagnostic accuracy.
Methodology
Diffusion Models and Feedback Mechanism
MAGIC adapts text-to-image diffusion models to generate synthetic datasets that capture intricate disease features. Expert-defined criteria are organized into a structured checklist, which MLLMs apply to generated images to produce feedback for the DMs. The framework uses both reward-based fine-tuning (RFT) and direct preference optimization (DPO) to align the DMs with clinical expectations.
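To make the preference-based objective concrete, here is a minimal sketch of the generic DPO loss for a single (preferred, rejected) pair. It is not the paper's implementation: the function names, the scalar log-likelihood inputs, and the default `beta` are assumptions for illustration. For diffusion models the log-likelihood terms are usually approximated (for example with per-step denoising errors), and that approximation is omitted here.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    logp_preferred: float,
    logp_rejected: float,
    ref_logp_preferred: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value; controls how strongly preferences are enforced
) -> float:
    """Generic DPO loss for one (preferred, rejected) pair.

    logp_* are log-likelihoods under the model being tuned; ref_logp_* are
    the same quantities under a frozen reference model.
    """
    # How much more (or less) likely each sample is under the tuned model
    # than under the reference model.
    preferred_margin = logp_preferred - ref_logp_preferred
    rejected_margin = logp_rejected - ref_logp_rejected
    # The loss falls as the preferred sample gains likelihood relative to
    # the rejected one.
    return -log_sigmoid(beta * (preferred_margin - rejected_margin))
```

When the tuned model and the reference model agree on both samples, the loss equals log 2. It drops below that value as the tuned model shifts probability toward the preferred image.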
Figure 1: Illustration of MAGIC, including preliminary fine-tuning of DMs, expert checklist-based feedback evaluation using MLLMs, and feedback-enhanced DMs for training.
The framework formulates the denoising process as a multi-step Markov Decision Process (MDP), which allows the DMs to be aligned with clinical criteria iteratively. The feedback comes from MLLM evaluation rather than direct human annotation, reducing labor while maintaining high medical standards.
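One common way to write denoising as an MDP, used in the literature on reinforcement learning for diffusion models, is sketched below. Whether MAGIC uses exactly this notation is an assumption: the symbols $c$ (text prompt), $x_t$ (noisy image at step $t$), $p_\theta$ (the DM's reverse process), and $r$ (a scalar reward derived from the feedback) are introduced here for illustration.

```latex
\begin{align*}
s_t &= (c,\, t,\, x_t), \qquad a_t = x_{t-1}, \qquad
\pi_\theta(a_t \mid s_t) = p_\theta(x_{t-1} \mid x_t, c), \\
R(s_t, a_t) &=
  \begin{cases}
    r(x_0, c) & \text{if } t = 1 \text{ (the action yields the final image } x_0\text{)}, \\
    0 & \text{otherwise,}
  \end{cases} \\
J(\theta) &= \mathbb{E}_{c,\; x_{0:T} \sim p_\theta(\cdot \mid c)} \big[ r(x_0, c) \big], \\
\nabla_\theta J(\theta) &= \mathbb{E} \Big[ \sum_{t=1}^{T}
  \nabla_\theta \log p_\theta(x_{t-1} \mid x_t, c)\; r(x_0, c) \Big].
\end{align*}
```

Under this view each denoising step is an action, the reward arrives only once the final image is produced, and the policy-gradient expression in the last line gives one way to turn feedback on finished images into updates of the denoiser.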
Figure 2: GPT-4o uses condition-specific checklists to assess generated images, focusing on clinical criteria to identify preferred samples.
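A minimal sketch of how per-image checklist verdicts could be turned into a preference pair is shown below. The MLLM call itself is left out, and the data layout, the function name `select_preference_pair`, and the criterion names are hypothetical. The scoring rule (fraction of criteria satisfied) is an assumption, not the paper's stated method.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ChecklistResult:
    image_id: str
    # One boolean per checklist criterion: True if the evaluator judged it satisfied.
    criteria_met: Dict[str, bool]

    @property
    def score(self) -> float:
        """Fraction of checklist criteria satisfied (0.0 for an empty checklist)."""
        if not self.criteria_met:
            return 0.0
        return sum(self.criteria_met.values()) / len(self.criteria_met)


def select_preference_pair(
    results: List[ChecklistResult],
) -> Optional[Tuple[ChecklistResult, ChecklistResult]]:
    """Return (preferred, rejected) by checklist score, or None if no pair can be formed."""
    ranked = sorted(results, key=lambda r: r.score, reverse=True)
    # No useful preference signal if there are fewer than two images or all scores tie.
    if len(ranked) < 2 or ranked[0].score == ranked[-1].score:
        return None
    return ranked[0], ranked[-1]


# Hypothetical verdicts for three images generated from the same prompt.
candidates = [
    ChecklistResult("img_a", {"well_defined_border": True, "silvery_scale": False}),
    ChecklistResult("img_b", {"well_defined_border": True, "silvery_scale": True}),
    ChecklistResult("img_c", {"well_defined_border": False, "silvery_scale": False}),
]
pair = select_preference_pair(candidates)
if pair is not None:
    preferred, rejected = pair
    print(preferred.image_id, rejected.image_id)
```

Pairs selected this way could feed a preference-based objective such as DPO, while the raw scores could serve as rewards for reward-based fine-tuning.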
Experimental Results
Image Generation and Evaluation
The results show that MAGIC improves the clinical quality of synthetic dermatological images. Training with the synthetic images improved diagnostic models by over 9% in classification accuracy on a challenging 20-condition task. The gains were more pronounced in few-shot scenarios, at over 13%.
Figure 3: Evolution of synthetic skin conditions through MAGIC-DPO, illustrating the learning of unique features across iterations.
Quantitative Assessment
Quantitative comparisons reveal that MAGIC consistently outperforms baseline methods, achieving lower Fréchet Inception Distance (FID) scores and higher satisfaction rates for clinical criteria. The figures demonstrate how the continuous feedback process allows the models to refine output quality iteratively.
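For reference, the standard definition of FID compares Gaussian fits to feature embeddings of real and generated images. The symbols below are the usual ones and are not taken from the paper: $\mu_r, \Sigma_r$ are the mean and covariance of real-image features, and $\mu_g, \Sigma_g$ are those of generated-image features. The feature extractor and reference set used in the paper are not specified in this summary.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values mean the generated-image feature distribution is closer to the real one. FID is zero when the two means and covariances coincide.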
Figure 4: Effect of feedback volume on classification accuracy, FID score, and criteria satisfaction rate.
Implications and Future Directions
Practical Applications
The MAGIC framework presents a scalable approach for augmenting dermatology datasets, which may be particularly beneficial for teledermatology in rural areas where access to data is limited. High-quality synthetic data could support better diagnostic models, and in turn better diagnosis and treatment, without compromising patient privacy.
Broader Impact and Potential Expansions
Future work could explore applying MAGIC to other medical imaging domains that face similar data constraints. Because the AI-expert collaboration model builds on MLLM capabilities, further advances in MLLMs could extend the approach to more intricate and nuanced medical fields.
Conclusion
MAGIC represents a significant step forward in using AI to overcome the limitations imposed by medical image scarcity. By combining expert-defined criteria with MLLM-based evaluation, the framework synthesizes medically accurate images that support the training of diagnostic models. The reported results show gains over baseline methods in image quality and in downstream classification accuracy, setting a precedent for future work in AI-assisted medical imaging.