Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare (2405.19298v1)
Abstract: While recent advancements in large multimodal models (LMMs) have significantly improved their abilities in image quality assessment (IQA) relying on absolute quality rating, how to transfer reliable relative quality comparison outputs to continuous perceptual quality scores remains largely unexplored. To address this gap, we introduce Compare2Score, an all-around LMM-based no-reference IQA (NR-IQA) model that produces qualitative comparative responses and translates these discrete comparative levels into a continuous quality score. Specifically, during training, we generate scaled-up comparative instructions by comparing images from the same IQA dataset, which allows more flexible integration of diverse IQA datasets. Using the resulting large-scale training corpus, we develop a human-like visual quality comparator. During inference, moving beyond binary choices, we propose a soft comparison method that calculates the likelihood of the test image being preferred over multiple predefined anchor images. The quality score is then optimized by maximum a posteriori estimation using the resulting probability matrix. Extensive experiments on nine IQA datasets validate that Compare2Score effectively bridges text-defined comparative levels during training with converted single-image quality scores for inference, surpassing state-of-the-art IQA models across diverse scenarios. Moreover, we verify that the probability-matrix-based inference conversion improves the rating accuracy not only of Compare2Score but also of zero-shot general-purpose LMMs, suggesting its intrinsic effectiveness.
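The inference step described in the abstract (soft comparison against anchor images, followed by maximum a posteriori estimation) can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything specific here is an assumption made for illustration: the five level names and their weights, the Thurstone-style Gaussian comparison model, the Gaussian prior, the treatment of anchor scores as known constants, and all function names (`soft_preference`, `log_posterior`, `map_score`). It is a simplified stand-in, not the authors' implementation.

```python
import math

# Hypothetical comparative levels and weights (assumed, not taken from the abstract).
LEVEL_WEIGHTS = {
    "inferior": 0.0, "worse": 0.25, "similar": 0.5, "better": 0.75, "superior": 1.0,
}

def soft_preference(level_probs):
    """Collapse a distribution over comparative levels into a single
    probability that the test image is preferred over one anchor."""
    total = sum(level_probs.values())
    return sum(LEVEL_WEIGHTS[k] * p for k, p in level_probs.items()) / total

def log_posterior(q, anchor_scores, prefs, sigma=1.0, prior_mean=0.0, prior_std=10.0):
    """Log posterior of score q under an assumed Thurstone-style model:
    P(test preferred over anchor a) = Phi((q - a) / (sqrt(2) * sigma)),
    with an assumed Gaussian prior on q."""
    lp = -0.5 * ((q - prior_mean) / prior_std) ** 2
    for a, p in zip(anchor_scores, prefs):
        z = (q - a) / (math.sqrt(2.0) * sigma)
        # erfc gives both tails without cancellation; the floor guards log(0).
        p_win = max(0.5 * math.erfc(-z / math.sqrt(2.0)), 1e-300)
        p_lose = max(0.5 * math.erfc(z / math.sqrt(2.0)), 1e-300)
        lp += p * math.log(p_win) + (1.0 - p) * math.log(p_lose)
    return lp

def map_score(anchor_scores, prefs, sigma=1.0, iters=200, **prior):
    """Maximize the log posterior with a ternary search over a range
    spanning the anchors plus a margin of five sigma on each side."""
    lo = min(anchor_scores) - 5.0 * sigma
    hi = max(anchor_scores) + 5.0 * sigma
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        f1 = log_posterior(m1, anchor_scores, prefs, sigma, **prior)
        f2 = log_posterior(m2, anchor_scores, prefs, sigma, **prior)
        if f1 < f2:
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

# Usage: five anchors with assumed known scores, and one soft preference per anchor.
anchors = [1.0, 2.0, 3.0, 4.0, 5.0]
prefs = [0.9, 0.7, 0.5, 0.3, 0.1]
score = map_score(anchors, prefs, prior_mean=3.0, prior_std=10.0)
```

In practice the level probabilities passed to `soft_preference` would come from the comparator's output distribution over the comparative-level tokens. The sketch fixes anchor scores for brevity, whereas the abstract describes estimation over a full probability matrix.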
Authors: Hanwei Zhu, Haoning Wu, Yixuan Li, Zicheng Zhang, Baoliang Chen, Lingyu Zhu, Yuming Fang, Guangtao Zhai, Weisi Lin, Shiqi Wang