CheXGenBench: A Unified Benchmark For Fidelity, Privacy and Utility of Synthetic Chest Radiographs (2505.10496v2)

Published 15 May 2025 in cs.CV

Abstract: We introduce CheXGenBench, a rigorous and multifaceted evaluation framework for synthetic chest radiograph generation that simultaneously assesses fidelity, privacy risks, and clinical utility across state-of-the-art text-to-image generative models. Despite rapid advancements in generative AI for real-world imagery, medical domain evaluations have been hindered by methodological inconsistencies, outdated architectural comparisons, and disconnected assessment criteria that rarely address the practical clinical value of synthetic samples. CheXGenBench overcomes these limitations through standardised data partitioning and a unified evaluation protocol comprising over 20 quantitative metrics that systematically analyse generation quality, potential privacy vulnerabilities, and downstream clinical applicability across 11 leading text-to-image architectures. Our results reveal critical inefficiencies in the existing evaluation protocols, particularly in assessing generative fidelity, leading to inconsistent and uninformative comparisons. Our framework establishes a standardised benchmark for the medical AI community, enabling objective and reproducible comparisons while facilitating seamless integration of both existing and future generative models. Additionally, we release a high-quality, synthetic dataset, SynthCheX-75K, comprising 75K radiographs generated by the top-performing model (Sana 0.6B) in our benchmark to support further research in this critical domain. Through CheXGenBench, we establish a new state-of-the-art and release our framework, models, and SynthCheX-75K dataset at https://raman1121.github.io/CheXGenBench/

Summary

CheXGenBench is a benchmark for the standardized evaluation of synthetic chest radiograph generation along three axes: fidelity, privacy, and clinical utility. It addresses methodological inconsistencies in earlier medical image generation evaluations by fixing the data partitioning and applying a unified protocol of over 20 quantitative metrics. The protocol systematically assesses generation quality, privacy vulnerabilities, and downstream applicability across 11 leading text-to-image architectures.

Key Contributions

The paper reports critical inefficiencies in existing evaluation protocols, particularly in how generative fidelity is assessed, which lead to inconsistent and uninformative comparisons. CheXGenBench responds with standardized data partitioning and a unified protocol that make comparisons objective and reproducible, and that allow both existing and future generative models to be integrated. The authors also release SynthCheX-75K, a synthetic dataset of 75,000 radiographs generated by Sana 0.6B, the top-performing model in the benchmark.

Methodological Innovations

CheXGenBench evaluates models across three primary dimensions: generative fidelity and mode coverage, privacy and re-identification risk, and synthetic data utility. For fidelity, the authors propose computing Fréchet Inception Distance (FID) on RadDino features rather than relying on the traditional setup, and they complement FID with measures of image-text alignment, density, and mode coverage to give a more nuanced picture than a single score. For privacy, they combine a deep learning-based re-identification score with distances measured in latent space and in pixel space, so that privacy risk is analysed from several angles.
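The density and mode coverage measures mentioned above can be illustrated with a small sketch. It assumes the k-nearest-neighbour definitions of Naeem et al. (2020), which may differ from the benchmark's exact implementation, and it operates on plain lists of feature vectors that are assumed to have been extracted beforehand by some image encoder. The function names and the default `k` are hypothetical choices made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kth_neighbour_radii(points, k):
    """For each point, the distance to its k-th nearest neighbour among the others."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(euclidean(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def density_and_coverage(real, fake, k=5):
    """Density and coverage of `fake` features with respect to `real` features.

    Assumed definitions (Naeem et al., 2020), not confirmed by the summary:
      density  = (1 / (k * M)) * number of (real ball, fake sample) pairs
                 where the fake sample lies inside the real sample's k-NN ball
      coverage = fraction of real samples whose k-NN ball contains
                 at least one fake sample
    """
    if not fake:
        raise ValueError("need at least one fake sample")
    if not 1 <= k < len(real):
        raise ValueError("k must satisfy 1 <= k < number of real samples")

    radii = kth_neighbour_radii(real, k)
    total_hits = 0
    covered = 0
    for r_point, radius in zip(real, radii):
        hits = sum(1 for f in fake if euclidean(r_point, f) <= radius)
        total_hits += hits
        if hits > 0:
            covered += 1

    density = total_hits / (k * len(fake))
    coverage = covered / len(real)
    return density, coverage


if __name__ == "__main__":
    # Toy 2-D "features": four real points on a unit square, one fake point.
    real_feats = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    fake_feats = [(0.0, 0.0)]
    print(density_and_coverage(real_feats, fake_feats, k=1))
```

Under these definitions, coverage drops when the generator misses regions of the real distribution, while density rewards synthetic samples that fall in well-populated real neighbourhoods. The brute-force distance loops are quadratic and only suitable for small illustrative inputs.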

Numerical Results

The Sana model achieves the lowest FID score and shows strong image-text alignment and coverage of the data distribution. Re-identification risk varies considerably across models. Sana also gives promising results for synthetic data utility, matching or outperforming real data in classification tasks. The observed correlation between fidelity and classification performance suggests that high-fidelity synthetic images can benefit downstream applications.
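For reference, the FID score used in this ranking is the standard Fréchet distance between two Gaussians fitted to feature embeddings of real and generated images. The symbols below are notation introduced here for illustration: $\mu_r, \Sigma_r$ are the mean and covariance of the real-image features and $\mu_g, \Sigma_g$ those of the generated-image features, with the feature extractor left unspecified.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Lower values indicate that the two feature distributions are closer, so the lowest FID corresponds to the best fidelity under this metric. Because the score depends entirely on the chosen feature space, values computed with different encoders are not directly comparable.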

Implications for Future Development

The results suggest that while synthetic images can significantly contribute to alleviating data scarcity in medical domains, challenges remain, particularly in privacy and downstream utility. The evidence that higher fidelity correlates with better classification suggests future research should focus on generative model architectures that enhance visual realism while maintaining privacy safeguards.

Conclusion

CheXGenBench lays a foundation for standardized evaluation in the medical image generation community. Its methodology and findings clarify the performance and limitations of current models and give future work a common basis for comparison. The release of SynthCheX-75K supports further research on the open challenges in fidelity, privacy, and utility.
