IQAGPT: Image Quality Assessment with Vision-language and ChatGPT Models (2312.15663v1)
Abstract: Large language models (LLMs) such as ChatGPT have demonstrated impressive capabilities across diverse tasks and attracted increasing interest as natural language interfaces in many domains. Recently, large vision-language models (VLMs) such as BLIP-2 and GPT-4, which learn rich vision-language correlations from image-text pairs, have been intensively investigated. Despite these developments, the application of LLMs and VLMs to image quality assessment (IQA), particularly in medical imaging, remains largely unexplored; such applications would be valuable for objective performance evaluation and could supplement, or even replace, radiologists' opinions. To this end, this paper introduces IQAGPT, an innovative image quality assessment system that integrates an image quality captioning VLM with ChatGPT to generate quality scores and textual reports. First, we build a CT-IQA dataset for training and evaluation, comprising 1,000 CT slices of diverse quality levels with professional annotations. To better leverage the capabilities of LLMs, we convert the annotated quality scores into semantically rich text descriptions using a prompt template. Second, we fine-tune the image quality captioning VLM on the CT-IQA dataset to generate quality descriptions; the captioning model fuses image and text features through cross-modal attention. Third, based on the quality descriptions, users can converse with ChatGPT to rate image quality or produce a radiological quality report. Our preliminary results demonstrate the feasibility of assessing image quality with large models. Remarkably, IQAGPT outperforms GPT-4 and CLIP-IQA, as well as multi-task classification and regression models that rely solely on images.
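The first step of the pipeline converts annotated quality scores into text descriptions via a prompt template. A minimal sketch of that conversion is shown below; the five-level scale, the `QUALITY_WORDS` wording, and the template string are illustrative assumptions, not the paper's actual template.

```python
# Sketch of Step 1: mapping an annotated CT quality score to a semantically
# rich text description via a prompt template. The level definitions and
# template wording are hypothetical, for illustration only.

QUALITY_WORDS = {
    0: "unacceptable, with severe noise and artifacts",
    1: "poor, with strong noise obscuring fine structures",
    2: "fair, with moderate noise but recognizable anatomy",
    3: "good, with mild noise and clear anatomical detail",
    4: "excellent, with minimal noise and sharp detail",
}

PROMPT_TEMPLATE = "The quality of this CT image is {word}."

def score_to_description(score: int) -> str:
    """Convert a discrete quality score into a caption-style description."""
    try:
        word = QUALITY_WORDS[score]
    except KeyError:
        raise ValueError(f"score must be one of {sorted(QUALITY_WORDS)}, got {score}")
    return PROMPT_TEMPLATE.format(word=word)
```

The resulting descriptions serve as caption targets when fine-tuning the VLM, and later as the text passed to ChatGPT for scoring or report generation.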
- PaLM: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311, 2022.
- LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
- Improving language understanding by generative pre-training. OpenAI blog, 2018.
- Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
- Language models are few-shot learners. NeurIPS, 33:1877–1901, 2020.
- Training language models to follow instructions with human feedback. NeurIPS, 35:27730–27744, 2022.
- Deep reinforcement learning from human preferences. NeurIPS, 30, 2017.
- Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374, 2021.
- Image as a foreign language: BEiT pretraining for vision and vision-language tasks. In CVPR, pages 19175–19186, 2023.
- BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv preprint arXiv:2301.12597, 2023.
- PaLM-E: An embodied multimodal language model. arXiv preprint arXiv:2303.03378, 2023.
- Visual ChatGPT: Talking, drawing and editing with visual foundation models. arXiv preprint arXiv:2303.04671, 2023.
- Self-supervised multi-modal training from uncurated image and reports enables zero-shot oversight artificial intelligence in radiology. arXiv preprint arXiv:2208.05140, 2023.
- Chuang Niu and Ge Wang. CT multi-task learning with a large image-text (LIT) model. arXiv preprint arXiv:2304.02649, 2023.
- Translating radiology reports into plain language using ChatGPT and GPT-4 with prompt learning: results, limitations, and potential. Vis. Comput. Ind. Biomed. Art., 6(1):9, 2023.
- OpenAI. GPT-4 technical report, 2023.
- MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592, 2023.
- Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality. https://vicuna.lmsys.org (accessed 14 April 2023), 2023.
- Alexey Dosovitskiy et al. An image is worth 16x16 words: Transformers for image recognition at scale. In ICLR, 2021.
- Survey of methods and principles in three-dimensional reconstruction from two-dimensional medical images. Vis. Comput. Ind. Biomed. Art., 6(1):15, 2023.
- Cardiac CT blooming artifacts: clinical significance, root causes and potential solutions. Vis. Comput. Ind. Biomed. Art., 5(1):1–13, 2022.
- CT image denoising and deblurring with deep learning: Current status and perspectives. IEEE Trans. Radiat. Plasma Med. Sci., 2023.
- Chuang Niu and Ge Wang. Advances in deep learning techniques for biomedical imaging. Vis. Comput. Ind. Biomed. Art., 6(1):1–2, 2023.
- Vision transformer architecture and applications in digital health: a tutorial and survey. Vis. Comput. Ind. Biomed. Art., 6(1):14, 2023.
- Low-dose CT with a residual encoder-decoder convolutional neural network. IEEE Trans. Med. Imaging, 36(12):2524–2535, 2017.
- Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss. IEEE Trans. Med. Imaging, 37(6):1348–1357, 2018.
- 3-D convolutional encoder-decoder network for low-dose CT via transfer learning from a 2-D trained network. IEEE Trans. Med. Imaging, 37(6):1522–1534, 2018.
- Deep learning tomographic reconstruction through hierarchical decomposition of domain transforms. Vis. Comput. Ind. Biomed. Art., 5(1):1–13, 2022.
- CoreDiff: Contextual error-modulated generalized diffusion model for low-dose CT denoising and generalization. IEEE Trans. Med. Imaging, 2023.
- ASCON: Anatomy-aware supervised contrastive learning framework for low-dose CT denoising. In MICCAI, pages 355–365. Springer, 2023.
- LIT-Former: Linking in-plane and through-plane transformers for simultaneous CT image denoising and deblurring. arXiv preprint arXiv:2302.10630, 2023.
- Abdominal CT: comparison of adaptive statistical iterative and filtered back projection reconstruction techniques. Radiology, 257(2):373–383, 2010.
- Image quality assessment using synthetic images. In WACV, pages 93–102, 2022.
- CONVIQT: Contrastive video quality estimator. IEEE Trans. Image Process., 2023.
- Blind CT image quality assessment via deep learning framework. In NSS/MIC, pages 1–4. IEEE, 2019.
- Exploring CLIP for assessing the look and feel of images. In AAAI, volume 37, pages 2555–2563, 2023.
- Competitive performance of a modularized deep neural network compared to commercial algorithms for low-dose CT image reconstruction. Nat. Mach. Intell., 1(6):269–276, 2019.
- Low-dose CT for the detection and classification of metastatic liver lesions: results of the 2016 low dose CT grand challenge. Med. Phys., 44(10):e339–e352, 2017.
- BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL, pages 4171–4186, June 2019.
- Decoupled weight decay regularization. In ICLR, 2019.
- Priya Goyal et al. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
- SGDR: Stochastic gradient descent with warm restarts. In ICLR, 2017.
- BLEU: a method for automatic evaluation of machine translation. In ACL, pages 311–318, 2002.
- Chin-Yew Lin. ROUGE: A package for automatic evaluation of summaries. In Text Summ. Branch. Out, pages 74–81, 2004.
- METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In ACL Workshop, 2005.
- CIDEr: Consensus-based image description evaluation. In CVPR, pages 4566–4575, 2015.
- Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. J. Mach. Learn. Res., 9(11), 2008.
- Zhihao Chen
- Bin Hu
- Chuang Niu
- Tao Chen
- Yuxin Li
- Hongming Shan
- Ge Wang