CheXGenBench: A Unified Benchmark For Fidelity, Privacy and Utility of Synthetic Chest Radiographs (2505.10496v2)

Published 15 May 2025 in cs.CV

Abstract: We introduce CheXGenBench, a rigorous and multifaceted evaluation framework for synthetic chest radiograph generation that simultaneously assesses fidelity, privacy risks, and clinical utility across state-of-the-art text-to-image generative models. Despite rapid advancements in generative AI for real-world imagery, medical domain evaluations have been hindered by methodological inconsistencies, outdated architectural comparisons, and disconnected assessment criteria that rarely address the practical clinical value of synthetic samples. CheXGenBench overcomes these limitations through standardised data partitioning and a unified evaluation protocol comprising over 20 quantitative metrics that systematically analyse generation quality, potential privacy vulnerabilities, and downstream clinical applicability across 11 leading text-to-image architectures. Our results reveal critical inefficiencies in the existing evaluation protocols, particularly in assessing generative fidelity, leading to inconsistent and uninformative comparisons. Our framework establishes a standardised benchmark for the medical AI community, enabling objective and reproducible comparisons while facilitating seamless integration of both existing and future generative models. Additionally, we release a high-quality, synthetic dataset, SynthCheX-75K, comprising 75K radiographs generated by the top-performing model (Sana 0.6B) in our benchmark to support further research in this critical domain. Through CheXGenBench, we establish a new state-of-the-art and release our framework, models, and SynthCheX-75K dataset at https://raman1121.github.io/CheXGenBench/

Summary

CheXGenBench is a benchmark for the standardized evaluation of synthetic chest radiograph generation along three axes: fidelity, privacy, and clinical utility. It addresses methodological inconsistencies in earlier evaluations of medical image generation with standardised data partitioning and a unified protocol of more than 20 quantitative metrics. These metrics assess generation quality, privacy vulnerabilities, and downstream applicability across 11 leading text-to-image architectures.

Key Contributions

The paper argues that existing evaluation protocols have critical shortcomings, particularly in how they assess generative fidelity, and that these shortcomings lead to inconsistent and uninformative comparisons between models. CheXGenBench responds with a standardized benchmark and a unified protocol that make comparisons reproducible and allow both existing and future generative models to be integrated. The authors also release SynthCheX-75K, a synthetic dataset of 75,000 radiographs generated by Sana 0.6B, the top-performing model in their benchmark.

Methodological Innovations

CheXGenBench evaluates models along three primary dimensions: generative fidelity and mode coverage, privacy and re-identification risk, and synthetic data utility. For fidelity, the authors compute Fréchet Inception Distance (FID) on features from RadDino, a radiology-specific image encoder, rather than relying only on a general-purpose feature extractor, and they complement FID with measures of image-text alignment, density, and mode coverage. For privacy, they combine a deep learning-based re-identification score with latent-space and pixel-space distances between synthetic and real images, so that privacy risk is assessed from several angles.
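To make the fidelity metric concrete, the following is a minimal sketch of the Fréchet distance that underlies FID, computed between two sets of feature vectors. It is not the benchmark's own implementation: the function name `frechet_distance` is hypothetical, and the random arrays stand in for features that would, in practice, come from an encoder such as RadDino.

```python
import numpy as np


def frechet_distance(feats_real, feats_synth):
    """Fréchet distance between Gaussians fitted to two feature sets.

    Each input has shape (num_images, feature_dim), with feature_dim >= 2.
    """
    mu_r = feats_real.mean(axis=0)
    mu_s = feats_synth.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_s = np.cov(feats_synth, rowvar=False)

    # tr(sqrt(cov_r @ cov_s)) equals the sum of the square roots of the
    # eigenvalues of the product. Those eigenvalues are real and non-negative
    # in exact arithmetic, so numerical noise is clipped away.
    eigvals = np.linalg.eigvals(cov_r @ cov_s)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_s
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_s) - 2.0 * trace_sqrt)


# Stand-in features (assumption): random vectors instead of encoder outputs.
rng = np.random.default_rng(0)
real_features = rng.normal(size=(200, 8))
synth_features = rng.normal(loc=0.5, size=(200, 8))
print(frechet_distance(real_features, synth_features))
```

Lower values mean the two feature distributions are closer. Because the score depends entirely on the feature space, swapping the encoder changes what "close" means, which is why the choice of a domain-specific encoder matters.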

Numerical Results

Sana achieves the lowest FID among the evaluated models and performs strongly on image-text alignment and mode coverage. Re-identification risk varies considerably from model to model. Sana's synthetic data is also reported to match or outperform real data in classification tasks. The observed correlation between fidelity and classification performance suggests that high-fidelity synthetic images can benefit downstream applications.
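A relationship of this kind between fidelity and downstream performance can be checked with a rank correlation across models. The sketch below implements Spearman's coefficient in plain Python. The function names are hypothetical, and the toy values are placeholders chosen only to exercise the code; they are not results from the paper.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Placeholder values (assumption), one entry per hypothetical model.
toy_fid = [1.0, 2.0, 3.0, 4.0]           # lower is better
toy_classifier_score = [0.9, 0.8, 0.7, 0.6]  # higher is better
print(spearman(toy_fid, toy_classifier_score))
```

Since a lower FID indicates higher fidelity, a negative coefficient here would correspond to the pattern described above: models with better fidelity yield better classifiers.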

Implications for Future Development

The results suggest that while synthetic images can significantly contribute to alleviating data scarcity in medical domains, challenges remain, particularly in privacy and downstream utility. The evidence that higher fidelity correlates with better classification suggests future research should focus on generative model architectures that enhance visual realism while maintaining privacy safeguards.
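One simple way to keep an eye on the privacy side of this trade-off is a distance-based check in pixel or latent space, in the spirit of the distance metrics the benchmark reports. The sketch below is an assumption about how such a check might look, not the paper's definition: the function names and the threshold are hypothetical, and the inputs are flattened images or embedding vectors.

```python
import numpy as np


def nearest_real_distance(synth, real):
    """For each synthetic vector, the Euclidean distance to its closest real vector.

    synth has shape (num_synth, dim); real has shape (num_real, dim).
    """
    # Broadcasting builds all pairwise differences: (num_synth, num_real, dim).
    diffs = synth[:, None, :] - real[None, :, :]
    squared = (diffs ** 2).sum(axis=-1)
    return np.sqrt(squared.min(axis=1))


def flag_near_copies(synth, real, threshold):
    """Mark synthetic samples that lie within `threshold` of some real sample."""
    return nearest_real_distance(synth, real) < threshold


# Tiny hand-made vectors (assumption) in place of images or embeddings.
real_vectors = np.array([[0.0, 0.0], [3.0, 4.0]])
synth_vectors = np.array([[0.0, 0.0], [0.0, 1.0], [6.0, 8.0]])
print(nearest_real_distance(synth_vectors, real_vectors))
print(flag_near_copies(synth_vectors, real_vectors, threshold=0.5))
```

A very small nearest-neighbour distance hints that a generated image may be a near copy of a training image. The pairwise array grows with the product of the two set sizes, so larger collections would need batching or an approximate nearest-neighbour index.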

Conclusion

CheXGenBench lays a foundation for standardized evaluation in the medical image generation community. Its methodology and findings clarify the performance and limitations of current models, and the release of SynthCheX-75K gives researchers a shared resource for further work on fidelity, privacy, and utility.