Doctor Approved: Generating Medically Accurate Skin Disease Images through AI-Expert Feedback (2506.12323v2)

Published 14 Jun 2025 in cs.CV

Abstract: Paucity of medical data severely limits the generalizability of diagnostic ML models, as the full spectrum of disease variability can not be represented by a small clinical dataset. To address this, diffusion models (DMs) have been considered as a promising avenue for synthetic image generation and augmentation. However, they frequently produce medically inaccurate images, deteriorating the model performance. Expert domain knowledge is critical for synthesizing images that correctly encode clinical information, especially when data is scarce and quality outweighs quantity. Existing approaches for incorporating human feedback, such as reinforcement learning (RL) and Direct Preference Optimization (DPO), rely on robust reward functions or demand labor-intensive expert evaluations. Recent progress in Multimodal LLMs (MLLMs) reveals their strong visual reasoning capabilities, making them adept candidates as evaluators. In this work, we propose a novel framework, coined MAGIC (Medically Accurate Generation of Images through AI-Expert Collaboration), that synthesizes clinically accurate skin disease images for data augmentation. Our method creatively translates expert-defined criteria into actionable feedback for image synthesis of DMs, significantly improving clinical accuracy while reducing the direct human workload. Experiments demonstrate that our method greatly improves the clinical quality of synthesized skin disease images, with outputs aligning with dermatologist assessments. Additionally, augmenting training data with these synthesized images improves diagnostic accuracy by +9.02% on a challenging 20-condition skin disease classification task, and by +13.89% in the few-shot setting.

Summary

  • The paper introduces MAGIC, a framework that leverages AI-expert feedback to generate medically accurate skin disease images using diffusion models.
  • It employs reward-based fine-tuning and direct preference optimization to iteratively align generated images with clinical criteria.
  • Augmenting training data with the synthesized images improves diagnostic accuracy by +9.02% on a 20-condition skin disease classification task and by +13.89% in the few-shot setting.

Introduction

The paper addresses a central obstacle to building accurate ML models for dermatological diagnosis: data scarcity. A small clinical dataset cannot represent the full spectrum of disease variability, which limits how well diagnostic models generalize. The authors propose MAGIC (Medically Accurate Generation of Images through AI-Expert Collaboration), a framework for synthesizing clinically accurate skin disease images. MAGIC uses multimodal LLMs (MLLMs) as evaluators, so that expert-defined criteria guide image synthesis while direct human involvement stays low. In this collaborative AI-expert paradigm, feedback steers diffusion models (DMs) toward medically accurate images, which are then used to augment the training data of diagnostic models.

Methodology

Diffusion Models and Feedback Mechanism

Diffusion models are used for text-to-image generation and adapted to produce synthetic datasets that capture intricate disease features. MAGIC translates expert-defined criteria into feedback for these models through a structured checklist that an MLLM applies to the generated images. The framework uses both reward-based fine-tuning (RFT) and direct preference optimization (DPO) to align the DMs with clinical expectations.

Figure 1: Illustration of MAGIC, including preliminary fine-tuning of DMs, expert checklist-based feedback evaluation using MLLMs, and feedback-enhanced DMs for training.
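The checklist-to-feedback step can be sketched as follows. This is a minimal illustration and not the authors' implementation: the checklist items, the `ChecklistResult` type, and the scoring rule (fraction of criteria satisfied) are assumptions, and the MLLM call is replaced by hard-coded yes/no answers.

```python
from dataclasses import dataclass
from itertools import combinations

# Hypothetical checklist for one condition. The paper's actual criteria are
# defined by dermatologists and are not reproduced here.
CHECKLIST = ["location", "lesion type", "shape/size", "color", "texture"]


@dataclass
class ChecklistResult:
    image_id: str
    answers: dict  # criterion -> bool, as an MLLM evaluator might return

    @property
    def score(self) -> float:
        """Fraction of checklist criteria judged satisfied (assumed rule)."""
        return sum(self.answers[c] for c in CHECKLIST) / len(CHECKLIST)


def preference_pairs(results):
    """Turn per-image scores into (preferred, rejected) pairs.

    Ties carry no preference signal and are skipped.
    """
    pairs = []
    for a, b in combinations(results, 2):
        if a.score > b.score:
            pairs.append((a.image_id, b.image_id))
        elif b.score > a.score:
            pairs.append((b.image_id, a.image_id))
    return pairs


# Stand-in evaluator outputs for three images generated from the same prompt.
all_yes = dict.fromkeys(CHECKLIST, True)
results = [
    ChecklistResult("img_a", dict(all_yes)),
    ChecklistResult("img_b", {**all_yes, "color": False}),
    ChecklistResult("img_c", {**all_yes, "color": False}),
]
pairs = preference_pairs(results)
```

In this toy run, `img_a` satisfies every criterion and is preferred over both other images, while `img_b` and `img_c` tie and produce no pair. The same scores could serve as scalar rewards for a reward-based variant.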

Multi-Step MDP Formulation

The framework formulates the denoising process as a multi-step Markov Decision Process (MDP), which lets the DMs be aligned with clinical criteria iteratively. The alignment is driven by MLLM-evaluated feedback rather than direct human annotation, reducing expert labor while maintaining high medical standards.

Figure 2: GPT-4o uses condition-specific checklists to assess generated images, focusing on clinical criteria to identify preferred samples.
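To make the multi-step view concrete, the sketch below applies the standard DPO objective to each denoising step of a preferred and a rejected trajectory. It assumes the common formulation in which the state at step t is (prompt, t, x_t) and the action is the next latent x_{t-1}. The function names, the `beta` value, and the toy log-probabilities are hypothetical, and the paper's exact loss may differ.

```python
import math


def dpo_step_loss(logp_w, ref_logp_w, logp_l, ref_logp_l, beta=0.1):
    """Standard DPO loss for one denoising step.

    logp_w / logp_l: log-probability of the preferred / rejected action
    under the model being trained; ref_*: same under the frozen reference.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def trajectory_dpo_loss(logps_w, ref_logps_w, logps_l, ref_logps_l, beta=0.1):
    """Average the per-step loss over all denoising steps of a pair."""
    steps = list(zip(logps_w, ref_logps_w, logps_l, ref_logps_l))
    return sum(dpo_step_loss(*s, beta=beta) for s in steps) / len(steps)


# Toy two-step trajectories (made-up log-probabilities).
ref_w, ref_l = [-1.0, -2.0], [-1.5, -2.5]

# Model identical to the reference: the margin is zero at every step.
loss_at_init = trajectory_dpo_loss(ref_w, ref_w, ref_l, ref_l)

# Model that raised the likelihood of the preferred trajectory.
loss_improved = trajectory_dpo_loss([-0.5, -1.5], ref_w, ref_l, ref_l)
```

At initialization the loss equals log 2, and it decreases once the model assigns relatively more probability to the preferred trajectory than the reference does.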

Experimental Results

Image Generation and Evaluation

The results show a clear improvement in the clinical quality of synthetic dermatological images generated by MAGIC. Augmenting training data with these images improved diagnostic accuracy by +9.02% on a challenging 20-condition skin disease classification task, and by +13.89% in the few-shot setting.

Figure 3: Evolution of synthetic skin conditions through MAGIC-DPO, illustrating the learning of unique features across iterations.

Quantitative Assessment

Quantitative comparisons indicate that MAGIC consistently outperforms baseline methods, achieving lower Fréchet Inception Distance (FID) scores and higher satisfaction rates for clinical criteria. The figures show how the continuous feedback process lets the models refine output quality iteratively.

Figure 4: Effect of feedback volume on accuracy, FID scores, and criteria satisfaction.
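For readers unfamiliar with the metric, FID fits a Gaussian to the features of real images and another to those of generated images, then computes ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)). The sketch below evaluates this under a simplifying assumption of diagonal covariances, where the trace term reduces to the squared difference of per-dimension standard deviations. Real FID uses full covariance matrices of Inception-v3 features; the function name and the toy feature vectors here are hypothetical.

```python
import math
from statistics import fmean, variance


def diagonal_fid(real_feats, gen_feats):
    """Frechet distance between two feature sets, assuming diagonal covariances.

    This is a simplification for illustration, not the full FID.
    Each argument is a list of equal-length feature vectors (at least two).
    """
    total = 0.0
    for d in range(len(real_feats[0])):
        r = [f[d] for f in real_feats]
        g = [f[d] for f in gen_feats]
        mean_term = (fmean(r) - fmean(g)) ** 2
        # With diagonal covariances, Tr(S_r + S_g - 2 (S_r S_g)^(1/2))
        # becomes the sum over dimensions of (std_r - std_g)^2.
        std_term = (math.sqrt(variance(r)) - math.sqrt(variance(g))) ** 2
        total += mean_term + std_term
    return total


# Toy 2-D "features": the generated set is the real set shifted by 1.
real = [[0.0, 0.0], [2.0, 4.0]]
generated = [[1.0, 1.0], [3.0, 5.0]]

same = diagonal_fid(real, real)
shifted = diagonal_fid(real, generated)
```

Identical feature sets give a distance of zero, and a pure shift contributes only through the mean term, which is why lower scores indicate generated images whose feature statistics are closer to those of real images.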

Implications and Future Directions

Practical Applications

The MAGIC framework offers a scalable approach to augmenting dermatology datasets, which is particularly relevant to teledermatology in rural areas where data accessibility is limited. High-quality synthetic data could improve diagnostic models, and through them diagnosis and treatment, without compromising patient privacy.

Broader Impact and Potential Expansions

Future work could explore applying MAGIC to other medical imaging domains that face similar data constraints. Because the AI-expert collaboration model relies on MLLMs as evaluators, further advances in MLLMs could extend it to more intricate and nuanced medical fields.

Conclusion

MAGIC is a step toward using AI to overcome the limitations imposed by medical image scarcity. By combining expert-defined criteria with MLLM-based evaluation, the framework synthesizes medically accurate images that support the training of diagnostic models. The reported gains in classification accuracy, together with outputs that align with dermatologist assessments, make it a useful reference point for future work in AI-assisted medical imaging.
