
Towards Open-ended Visual Quality Comparison

Published 26 Feb 2024 in cs.CV (arXiv:2402.16641v2)

Abstract: Comparative settings (e.g. pairwise choice, listwise ranking) have been adopted by a wide range of subjective studies for image quality assessment (IQA), as it inherently standardizes the evaluation criteria across different observers and offers more clear-cut responses. In this work, we extend the edge of emerging large multi-modality models (LMMs) to further advance visual quality comparison into open-ended settings, that 1) can respond to open-range questions on quality comparison; 2) can provide detailed reasonings beyond direct answers. To this end, we propose the Co-Instruct. To train this first-of-its-kind open-source open-ended visual quality comparer, we collect the Co-Instruct-562K dataset, from two sources: (a) LLM-merged single image quality description, (b) GPT-4V "teacher" responses on unlabeled data. Furthermore, to better evaluate this setting, we propose the MICBench, the first benchmark on multi-image comparison for LMMs. We demonstrate that Co-Instruct not only achieves on average 30% higher accuracy than state-of-the-art open-source LMMs, but also outperforms GPT-4V (its teacher), on both existing related benchmarks and the proposed MICBench. Our model is published at https://huggingface.co/q-future/co-instruct.

Citations (31)

Summary

  • The paper demonstrates how Co-Instruct, fine-tuned on the Co-Instruct-562K dataset, achieves 30% higher accuracy in open-ended visual quality comparisons than state-of-the-art models.
  • It introduces MICBench, a dedicated benchmark with 2,000 MCQs designed to evaluate multi-image quality comparisons in diverse scenarios.
  • The study employs innovative strategies like Merge2Compare and Teach2Compare to generate rich image quality data, advancing the capabilities of large multi-modality models.

Advancing Open-ended Visual Quality Comparison with Co-Instruct and the MICBench Benchmark

Introduction to Co-Instruct

Image quality assessment (IQA) is pivotal within visual computing, guiding content enhancement and the curation of high-quality content recommendations. Traditional approaches often rely on subjective pairwise comparisons which, although effective, scale poorly as the number of required comparisons grows. Recognizing these challenges, this study introduces Co-Instruct, a model fine-tuned on a novel dataset, Co-Instruct-562K, designed specifically for open-ended visual quality comparison. Co-Instruct extends the capabilities of large multi-modality models (LMMs), providing not just comparisons but well-reasoned responses to a broad range of quality-related queries across multiple images.

The Co-Instruct-562K Dataset and Model Training

The Co-Instruct-562K dataset is an amalgamation of data collected from two innovative strategies: Merge2Compare and Teach2Compare. Merge2Compare leverages human annotations on single images, which are then combined through LLM guidance to generate comparative data without explicit pairwise annotations. On the other hand, Teach2Compare utilizes responses from GPT-4V to expand the dataset with pseudo-labeled comparisons and question-answer pairs for training, focusing on general comparisons and improving model capabilities in answering open-ended questions regarding image quality.
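
To make the data-construction idea concrete, below is a minimal, hypothetical sketch of the Merge2Compare strategy: independently written single-image quality descriptions are merged by an LLM into one comparative description, without any explicit pairwise annotation. The function names, the prompt wording, and the call_llm helper are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch (not the authors' code) of the Merge2Compare idea:
# combine human-written single-image quality descriptions into one
# comparative description via an LLM prompt. `call_llm` is a hypothetical
# placeholder for whatever LLM endpoint is actually used.

from typing import List

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM completion call; not a real API."""
    raise NotImplementedError

def merge_to_compare(descriptions: List[str]) -> str:
    """Merge independent per-image quality descriptions into one comparison text."""
    numbered = "\n".join(
        f"Image {i + 1}: {d}" for i, d in enumerate(descriptions)
    )
    prompt = (
        "Below are quality descriptions of several images, each written "
        "independently by a human annotator.\n"
        f"{numbered}\n"
        "Merge them into one paragraph that directly compares the images' "
        "quality (e.g. sharpness, noise, exposure, overall), without "
        "inventing details that are absent from the descriptions."
    )
    return call_llm(prompt)

# Example (hypothetical): two single-image descriptions become one comparison.
# merge_to_compare([
#     "Sharp overall, with slight overexposure in the sky.",
#     "Noticeable motion blur and heavy noise in the shadows.",
# ])
```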

This approach yields a significant boost in performance: Co-Instruct achieves, on average, roughly 30% higher accuracy in visual quality comparisons than existing open-source LMMs and even surpasses its teacher, GPT-4V, on several metrics.

MICBench: A New Benchmark for Multi-Image Comparison

Alongside Co-Instruct, this study introduces MICBench, a benchmark specifically designed to assess multi-image quality comparison. Comprising 2,000 multiple-choice questions (MCQs) that span different question types and compare groups of three or four images, MICBench fills a gap in evaluation settings for multi-image comparison and provides a comprehensive tool for future IQA research.
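
For intuition, the following is a rough sketch of what a multi-image MCQ item and a simple accuracy computation might look like. The field names and the answer_mcq model interface are assumptions; the actual MICBench format and evaluation protocol may differ.

```python
# Hypothetical sketch of a multi-image MCQ item and a simple accuracy loop.
# Field names and the `answer_mcq` interface are assumptions, not MICBench's
# actual schema.

from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class MICItem:
    image_paths: List[str]   # three or four images per question
    question: str            # e.g. "Which image has the least noise?"
    options: List[str]       # answer options, e.g. "The first image", ...
    answer_index: int        # index of the correct option

def mcq_accuracy(
    items: Sequence[MICItem],
    answer_mcq: Callable[[List[str], str, List[str]], int],
) -> float:
    """Fraction of items for which the model selects the correct option."""
    correct = sum(
        answer_mcq(item.image_paths, item.question, item.options) == item.answer_index
        for item in items
    )
    return correct / max(len(items), 1)
```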

Empirical Evaluations and Findings

The evaluation of Co-Instruct across multiple benchmarks, including the newly proposed MICBench, demonstrates its superior performance. Noteworthy findings include:

  • Co-Instruct achieves higher accuracy than state-of-the-art LMMs across all benchmarks, with especially large gains in detailed reasoning and in handling open-range questions.
  • When compared against established benchmarks for quality comparison, Co-Instruct not only stands out among open-source models but also challenges proprietary models, showcasing advancements in the field of LMM-based image quality assessment.
  • The study also substantiates the advantage of the specialized training recipe built on the Co-Instruct-562K dataset and highlights the effectiveness of the proposed image-text interleaved input structure for improving performance in multi-image scenarios (a minimal sketch of such an interleaved prompt follows this list).
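
As referenced above, here is a minimal sketch of an image-text interleaved prompt for multi-image quality questions, in the spirit of the paper's input structure. The exact image placeholder token and wording are assumptions; the released model's chat template should be consulted for the real format.

```python
# Minimal sketch of an image-text interleaved prompt for multi-image
# quality questions. The "<image>" placeholder and the phrasing are
# assumptions about the format, not the model's actual template.

from typing import List

ORDINALS: List[str] = ["first", "second", "third", "fourth"]

def build_interleaved_prompt(num_images: int, question: str,
                             image_token: str = "<image>") -> str:
    """Name each image in order, insert its placeholder token, then ask the question."""
    parts = [
        f"The {ORDINALS[i]} image: {image_token}" for i in range(num_images)
    ]
    return "\n".join(parts) + "\n" + question

# e.g. build_interleaved_prompt(3, "Which image has the best clarity?")
```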

Towards Future Developments in Visual Quality Comparison

This research marks a significant step towards enhancing the capabilities of LMMs in the domain of IQA, particularly in open-ended visual quality comparison scenarios. By training a model that surpasses its GPT-4V teacher on related tasks and establishing a dedicated benchmark for evaluating such models, this study lays the groundwork for further exploration and innovation in the field.

The introduction of Co-Instruct and MICBench addresses fundamental challenges in visual quality assessment, offering new avenues for research and application. As the demand for sophisticated image analysis tools continues to grow, developments like these are pivotal in advancing our understanding and capabilities in assessing visual content quality at scale. It is anticipated that future work will continue to expand upon these initial achievements, exploring new methodologies, data augmentation strategies, and evaluation frameworks to further refine and enhance the accuracy and applicability of LMMs in the field of IQA and beyond.
