PairAug: What Can Augmented Image-Text Pairs Do for Radiology? (2404.04960v1)

Published 7 Apr 2024 in cs.CV

Abstract: Current vision-language pre-training (VLP) methodologies predominantly depend on paired image-text datasets, a resource that is challenging to acquire in radiology due to privacy considerations and labelling complexities. Data augmentation provides a practical solution to overcome the issue of data scarcity; however, most augmentation methods exhibit a limited focus, prioritising either image or text augmentation exclusively. Acknowledging this limitation, our objective is to devise a framework capable of concurrently augmenting medical image and text data. We design a Pairwise Augmentation (PairAug) approach that contains an Inter-patient Augmentation (InterAug) branch and an Intra-patient Augmentation (IntraAug) branch. Specifically, the InterAug branch of our approach generates radiology images using synthesised yet plausible reports derived from an LLM. The generated pairs can be considered a collection of new patient cases since they are artificially created and may not exist in the original dataset. In contrast, the IntraAug branch uses newly generated reports to manipulate images. This process allows us to create new paired data for each individual with diverse medical conditions. Our extensive experiments on various downstream tasks, covering zero-shot and fine-tuning analysis for medical image classification, demonstrate that our PairAug, concurrently expanding both image and text data, substantially outperforms image-/text-only expansion baselines and advanced medical VLP baselines. Our code is released at https://github.com/YtongXie/PairAug.
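The abstract describes a two-branch augmentation pipeline. The sketch below is a minimal illustration of that structure only; all helper names (generate_report, rewrite_report, text_to_image, edit_image) are hypothetical placeholders rather than the authors' API, and the real implementation is in the linked repository.

```python
# Minimal sketch of the PairAug idea described in the abstract.
# All helpers are hypothetical placeholders, not the authors' code.

def inter_aug(llm, generator, n_new_cases):
    """InterAug: synthesise plausible reports with an LLM, then render
    matching radiographs, yielding entirely new 'patient' pairs."""
    pairs = []
    for _ in range(n_new_cases):
        report = llm.generate_report()           # synthetic, plausible report
        image = generator.text_to_image(report)  # report-conditioned image
        pairs.append((image, report))
    return pairs

def intra_aug(llm, editor, dataset):
    """IntraAug: rewrite each patient's report to describe different findings,
    then edit that patient's own image to match, keeping the patient fixed."""
    pairs = []
    for image, report in dataset:
        new_report = llm.rewrite_report(report)           # altered findings
        new_image = editor.edit_image(image, new_report)  # text-guided edit
        pairs.append((new_image, new_report))
    return pairs

def pair_aug(llm, generator, editor, dataset, n_new_cases):
    """Combine the original pairs with both augmentation branches
    before vision-language pre-training."""
    return list(dataset) + inter_aug(llm, generator, n_new_cases) \
                         + intra_aug(llm, editor, dataset)
```

The key design point, as stated in the abstract, is that InterAug adds new synthetic patients while IntraAug adds new condition variants for existing patients, so image and text data grow together rather than one modality at a time.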

Authors (10)
  1. Yutong Xie (68 papers)
  2. Qi Chen (194 papers)
  3. Sinuo Wang (3 papers)
  4. Minh-Son To (20 papers)
  5. Iris Lee (1 paper)
  6. Ee Win Khoo (1 paper)
  7. Kerolos Hendy (1 paper)
  8. Daniel Koh (2 papers)
  9. Yong Xia (141 papers)
  10. Qi Wu (323 papers)
Citations (5)