
Aligning Synthetic Medical Images with Clinical Knowledge using Human Feedback (2306.12438v1)

Published 16 Jun 2023 in eess.IV, cs.CV, and cs.LG

Abstract: Generative models capable of capturing nuanced clinical features in medical images hold great promise for facilitating clinical data sharing, enhancing rare disease datasets, and efficiently synthesizing annotated medical images at scale. Despite their potential, assessing the quality of synthetic medical images remains a challenge. While modern generative models can synthesize visually-realistic medical images, the clinical validity of these images may be called into question. Domain-agnostic scores, such as FID score, precision, and recall, cannot incorporate clinical knowledge and are, therefore, not suitable for assessing clinical sensibility. Additionally, there are numerous unpredictable ways in which generative models may fail to synthesize clinically plausible images, making it challenging to anticipate potential failures and manually design scores for their detection. To address these challenges, this paper introduces a pathologist-in-the-loop framework for generating clinically-plausible synthetic medical images. Starting with a diffusion model pretrained using real images, our framework comprises three steps: (1) evaluating the generated images by expert pathologists to assess whether they satisfy clinical desiderata, (2) training a reward model that predicts the pathologist feedback on new samples, and (3) incorporating expert knowledge into the diffusion model by using the reward model to inform a finetuning objective. We show that human feedback significantly improves the quality of synthetic images in terms of fidelity, diversity, utility in downstream applications, and plausibility as evaluated by experts.
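The three steps named in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the 2-D feature vectors standing in for images, the simulated binary feedback, the logistic-regression reward model, and the reward-weighted average used as a stand-in finetuning objective. The paper's actual reward model architecture and finetuning loss may differ.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_reward_model(features, labels, lr=0.5, epochs=500):
    """Step 2 (toy): fit a logistic-regression reward model on binary feedback."""
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log-loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def reward(model, x):
    """Predicted probability that an expert would judge sample x plausible."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def reward_weighted_loss(per_sample_losses, rewards):
    """Step 3 (toy): weight each sample's generative loss by its predicted reward."""
    total = sum(rewards)
    if total == 0:
        raise ValueError("all rewards are zero")
    return sum(r * l for r, l in zip(rewards, per_sample_losses)) / total


# Step 1 (simulated): hypothetical expert feedback on generated samples.
# Assumption: a sample is "plausible" when its first feature is positive.
random.seed(0)
samples = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
feedback = [1 if x[0] > 0 else 0 for x in samples]

model = train_reward_model(samples, feedback)
rewards = [reward(model, x) for x in samples]
```

In this sketch, samples the reward model scores as implausible contribute little to the weighted loss, so a generator finetuned on it would be pulled toward samples resembling those the experts accepted.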

Authors (4)
  1. Shenghuan Sun
  2. Gregory M. Goldgof
  3. Atul Butte
  4. Ahmed M. Alaa