Towards Open-ended Visual Quality Comparison (2402.16641v2)

Published 26 Feb 2024 in cs.CV

Abstract: Comparative settings (e.g. pairwise choice, listwise ranking) have been adopted by a wide range of subjective studies for image quality assessment (IQA), as they inherently standardize the evaluation criteria across different observers and offer more clear-cut responses. In this work, we extend the edge of emerging large multi-modality models (LMMs) to further advance visual quality comparison into open-ended settings, which 1) can respond to open-range questions on quality comparison; 2) can provide detailed reasoning beyond direct answers. To this end, we propose Co-Instruct. To train this first-of-its-kind open-source open-ended visual quality comparer, we collect the Co-Instruct-562K dataset from two sources: (a) LLM-merged single-image quality descriptions, and (b) GPT-4V "teacher" responses on unlabeled data. Furthermore, to better evaluate this setting, we propose MICBench, the first benchmark on multi-image comparison for LMMs. We demonstrate that Co-Instruct not only achieves on average 30% higher accuracy than state-of-the-art open-source LMMs, but also outperforms GPT-4V (its teacher) on both existing related benchmarks and the proposed MICBench. Our model is published at https://huggingface.co/q-future/co-instruct.

Advancing Open-ended Visual Quality Comparison with Co-Instruct and the MICBench Benchmark

Introduction to Co-Instruct

Image quality assessment (IQA) is pivotal within visual computing, guiding content enhancement and the curation of high-quality content. Traditional approaches often rely on subjective pairwise comparisons which, although effective, scale poorly as the number of images to compare grows. Recognizing these challenges, this paper introduces Co-Instruct, a model fine-tuned on a novel dataset, Co-Instruct-562K, designed specifically for open-ended visual quality comparison. Co-Instruct extends the capabilities of large multi-modality models (LMMs) to provide not just comparisons but well-reasoned responses to a broad range of quality-related queries across multiple images.
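
Since the weights are public, a minimal loading sketch is given below. It assumes the standard Hugging Face transformers loading path with trust_remote_code enabled; the multi-image chat interface is defined by the repository's custom code, so the generation call shown in the final comment is only a hypothetical shape, and the model card should be consulted for actual usage.

```python
# Minimal sketch: loading the released Co-Instruct weights via Hugging Face
# transformers. trust_remote_code is assumed because the repository ships
# custom model code; the chat call in the final comment is hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "q-future/co-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Hypothetical usage shape (check the model card for the real interface):
# answer = model.chat("Which image has better clarity?", [image_a, image_b])
```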

The Co-Instruct-562K Dataset and Model Training

The Co-Instruct-562K dataset is assembled from two complementary strategies: Merge2Compare and Teach2Compare. Merge2Compare takes human quality descriptions of single images and merges them, under LLM guidance, into comparative text, yielding comparison data without explicit pairwise annotation. Teach2Compare, in turn, collects GPT-4V responses on unlabeled image groups to expand the dataset with pseudo-labeled general comparisons and question-answer pairs, strengthening the model's ability to answer open-ended questions about image quality.
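
To make the Merge2Compare idea concrete, the sketch below assembles a merging prompt from per-image quality descriptions and hands it to an LLM. The prompt wording and the call_llm helper are illustrative assumptions, not the paper's exact pipeline.

```python
# Illustrative sketch of Merge2Compare: single-image quality descriptions are
# merged by an LLM into one comparative paragraph. The prompt text and the
# call_llm() helper are hypothetical placeholders, not the paper's pipeline.
from typing import Callable, Sequence


def merge_to_compare(descriptions: Sequence[str],
                     call_llm: Callable[[str], str]) -> str:
    """Merge per-image quality descriptions into one comparative description."""
    numbered = "\n".join(
        f"Image {i + 1}: {desc}" for i, desc in enumerate(descriptions)
    )
    prompt = (
        "Below are independent quality descriptions of several images.\n"
        f"{numbered}\n"
        "Write one paragraph comparing the images' visual quality, referring "
        "to each image by its number and explaining which looks better and "
        "why. Do not invent details absent from the descriptions."
    )
    return call_llm(prompt)


# Example usage with a stubbed LLM call:
def stub_llm(prompt: str) -> str:
    return "Image 1 is sharper and less noisy than Image 2, ..."


print(merge_to_compare(
    ["Sharp and well-exposed, with minor noise in the shadows.",
     "Heavily blurred, with visible compression artifacts."],
    stub_llm,
))
```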

This approach yields a substantial performance boost: Co-Instruct reaches roughly 30% higher accuracy on visual comparisons than existing open-source LMMs and even surpasses its teacher, GPT-4V, on several metrics.

MICBench: A New Benchmark for Multi-Image Comparison

Alongside Co-Instruct, this paper introduces MICBench, a pioneering benchmark specifically designed to assess multi-image quality comparison. Comprising 2,000 multiple-choice questions (MCQs) of varied question types about groups of three or four images, MICBench fills a gap in evaluation settings for multi-image comparison and provides a comprehensive tool for future IQA research.
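
Scoring such MCQ benchmarks typically reduces to matching the option a model names against the ground-truth letter. The sketch below shows one plausible way to do this; the answer-extraction heuristic is an assumption, not MICBench's official evaluation protocol.

```python
# Sketch of MCQ-style scoring for benchmarks like MICBench. The extraction
# heuristic below is an illustrative assumption, not the official protocol.
import re
from typing import Iterable, Mapping, Optional


def extract_choice(response: str, options: Mapping[str, str]) -> Optional[str]:
    """Return the option letter the model chose, or None if undecidable."""
    # Prefer an explicit letter such as "A", "(B)" or "C.".
    match = re.search(r"\b([A-D])\b", response)
    if match and match.group(1) in options:
        return match.group(1)
    # Fall back to matching the option text itself.
    for letter, text in options.items():
        if text.lower() in response.lower():
            return letter
    return None


def mcq_accuracy(samples: Iterable[dict]) -> float:
    """Each sample has 'response', 'options' (letter -> text), and 'answer'."""
    samples = list(samples)
    correct = sum(
        extract_choice(s["response"], s["options"]) == s["answer"]
        for s in samples
    )
    return correct / max(len(samples), 1)
```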

Empirical Evaluations and Findings

The evaluation of Co-Instruct across multiple benchmarks, including the newly proposed MICBench, demonstrates its superior performance. Noteworthy findings include:

  • Co-Instruct achieves higher accuracy than state-of-the-art LMMs across all benchmarks, with particularly large gains in detailed reasoning and in handling open-range questions.
  • On established quality-comparison benchmarks, Co-Instruct not only stands out among open-source models but also challenges proprietary ones, marking clear progress in LMM-based image quality assessment.
  • The paper also substantiates the advantage of specialized training on the Co-Instruct-562K dataset and highlights the effectiveness of the proposed image-text interleaved input structure for multi-image scenarios (a minimal sketch of such an interleaved prompt follows this list).
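
As an illustration of an image-text interleaved input, the sketch below assembles a multi-image prompt in which per-image placeholders alternate with text before the question is appended. The "<|image|>" placeholder and the template wording are assumptions for illustration and need not match Co-Instruct's actual prompt format.

```python
# Sketch of an image-text interleaved prompt for multi-image comparison.
# The "<|image|>" placeholder and the template wording are illustrative
# assumptions; the real Co-Instruct format is defined by its released code.
IMAGE_TOKEN = "<|image|>"  # assumed placeholder later replaced by image features


def ordinal(n: int) -> str:
    words = {1: "first", 2: "second", 3: "third", 4: "fourth"}
    return words.get(n, f"{n}th")


def build_interleaved_prompt(num_images: int, question: str) -> str:
    """Interleave per-image placeholders with text, then append the question."""
    parts = [
        f"The {ordinal(i + 1)} image: {IMAGE_TOKEN}" for i in range(num_images)
    ]
    return "\n".join(parts) + f"\nUser: {question}\nAssistant:"


print(build_interleaved_prompt(3, "Which of the three images is the clearest?"))
```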

Towards Future Developments in Visual Quality Comparison

This research marks a significant step towards enhancing the capabilities of LMMs in the domain of IQA, particularly in open-ended visual quality comparison scenarios. By successfully training a model to surpass human-level performance in related tasks and establishing a dedicated benchmark for evaluating such models, this paper lays the groundwork for further exploration and innovation in the field.

The introduction of Co-Instruct and MICBench addresses fundamental challenges in visual quality assessment, offering new avenues for research and application. As the demand for sophisticated image analysis tools continues to grow, developments like these are pivotal in advancing our understanding and capabilities in assessing visual content quality at scale. It is anticipated that future work will continue to expand upon these initial achievements, exploring new methodologies, data augmentation strategies, and evaluation frameworks to further refine and enhance the accuracy and applicability of LMMs in the field of IQA and beyond.

Authors (14)
  1. Haoning Wu (68 papers)
  2. Hanwei Zhu (18 papers)
  3. Zicheng Zhang (124 papers)
  4. Erli Zhang (11 papers)
  5. Chaofeng Chen (41 papers)
  6. Liang Liao (36 papers)
  7. Chunyi Li (66 papers)
  8. Annan Wang (12 papers)
  9. Wenxiu Sun (59 papers)
  10. Qiong Yan (39 papers)
  11. Xiaohong Liu (117 papers)
  12. Guangtao Zhai (230 papers)
  13. Shiqi Wang (162 papers)
  14. Weisi Lin (118 papers)
Citations (31)