Advancing Open-ended Visual Quality Comparison with Co-Instruct and the MICBench Benchmark
Introduction to Co-Instruct
Image quality assessment (IQA) is a pivotal task in visual computing, guiding content improvement and the recommendation of high-quality content. Traditional approaches often rely on subjective pairwise comparisons, which are effective but scale poorly: the number of required comparisons grows quadratically with the number of images. Recognizing these challenges, this paper introduces Co-Instruct, a model fine-tuned on a novel dataset, Co-Instruct-562K, designed specifically for open-ended visual quality comparison. Co-Instruct extends large multi-modality models (LMMs) beyond simple comparisons, enabling well-reasoned responses to a broad range of quality-related queries across multiple images.
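As a back-of-the-envelope illustration of this scaling issue (not taken from the paper), the snippet below counts the pairwise comparisons an exhaustive subjective study would need as the image set grows:

```python
from math import comb

# An exhaustive pairwise subjective study needs n*(n-1)/2 comparisons,
# so annotation cost grows quadratically with the number of images.
for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6} images -> {comb(n, 2):>12,} pairwise comparisons")
```

Ten thousand images already require roughly fifty million comparisons, which is why the paper turns to instruction-tuned LMMs rather than exhaustive human studies.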
The Co-Instruct-562K Dataset and Model Training
The Co-Instruct-562K dataset is an amalgamation of data collected through two strategies: Merge2Compare and Teach2Compare. Merge2Compare leverages existing human quality descriptions of single images, which an LLM then merges into comparative texts, yielding comparison data without any explicit pairwise annotation. Teach2Compare, in turn, uses GPT-4V responses to expand the dataset with pseudo-labeled general comparisons and question-answer pairs, improving the model's ability to answer open-ended questions about image quality.
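A minimal sketch of how a Merge2Compare-style merge could be set up is shown below; the prompt wording, the `build_merge_prompt` helper, and the `query_llm` callable are illustrative assumptions, not the paper's actual pipeline:

```python
# Hypothetical sketch of a Merge2Compare-style merge: per-image human quality
# descriptions are stitched into a single instruction that asks a text-only
# LLM to produce a comparison. `query_llm` is a placeholder for that LLM.
ORDINALS = ["first", "second", "third", "fourth"]

def build_merge_prompt(descriptions: list[str]) -> str:
    lines = [f"The {ORDINALS[i]} image: {d}" for i, d in enumerate(descriptions)]
    task = ("Based only on the descriptions above, compare the visual quality "
            "of these images and explain which is better and why.")
    return "\n".join(lines) + "\n" + task

def merge2compare(descriptions: list[str], query_llm) -> str:
    """Turn single-image quality descriptions into a pseudo comparison label."""
    return query_llm(build_merge_prompt(descriptions))
```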
This training approach yields a significant boost in performance: Co-Instruct demonstrates roughly 30% higher accuracy on visual quality comparison than the best existing open-source LMM, and even surpasses its teacher, GPT-4V, on several metrics.
MICBench: A New Benchmark for Multi-Image Comparison
Alongside Co-Instruct, this paper introduces MICBench, a pioneering benchmark specifically designed to assess multi-image quality comparison. It comprises 2,000 multiple-choice questions (MCQs) spanning several question types and focusing on comparisons among groups of three or four images. MICBench fills the gap in evaluation settings for multi-image comparison and provides a comprehensive tool for future IQA research.
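To make the evaluation setting concrete, the sketch below shows one plausible shape for a multi-image MCQ item and a simple accuracy loop; the field names and the `ask_model` callable are assumptions, not MICBench's actual schema:

```python
from dataclasses import dataclass

# Illustrative shape of a multi-image MCQ item; field names are assumed.
@dataclass
class MCQItem:
    images: list[str]    # paths to the three or four images being compared
    question: str        # e.g. "Which image has the least noise?"
    options: list[str]   # e.g. ["The first image", "The second image", ...]
    answer: int          # index of the correct option

def accuracy(items: list[MCQItem], ask_model) -> float:
    """ask_model(images, question, options) should return a predicted option index."""
    correct = sum(
        ask_model(it.images, it.question, it.options) == it.answer
        for it in items
    )
    return correct / len(items)
```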
Empirical Evaluations and Findings
The evaluation of Co-Instruct across multiple benchmarks, including the newly proposed MICBench, demonstrates its superior performance. Noteworthy findings include:
- Co-Instruct achieves higher accuracy than state-of-the-art LMMs across all benchmarks, with particularly notable gains in detailed reasoning and in handling open-range questions.
- When compared against established benchmarks for quality comparison, Co-Instruct not only stands out among open-source models but also challenges proprietary models, showcasing advancements in the field of LMM-based image quality assessment.
- The paper also substantiates the advantage of the specialized Co-Instruct-562K training data and highlights the effectiveness of the proposed image-text interleaved input structure for improving model performance in multi-image scenarios (see the sketch after this list).
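On the last point, an interleaved format tags each image with a short textual ordinal before appending the user query. The sketch below shows one way such a prompt could be assembled; the `<img>` placeholder token and the exact wording are assumptions about the format rather than the model's verbatim template:

```python
# Sketch of an image-text interleaved prompt for multi-image comparison.
# The "<img>" placeholder and ordinal wording are illustrative; the real
# token format depends on the underlying LMM's tokenizer and visual adapter.
ORDINALS = ["first", "second", "third", "fourth"]

def interleaved_prompt(num_images: int, query: str) -> str:
    parts = [f"The {ORDINALS[i]} image: <img>" for i in range(num_images)]
    return "\n".join(parts) + "\n" + query

print(interleaved_prompt(3, "Which of the three images has the best clarity, and why?"))
```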
Towards Future Developments in Visual Quality Comparison
This research marks a significant step towards enhancing the capabilities of LMMs in the domain of IQA, particularly in open-ended visual quality comparison scenarios. By successfully training a model to surpass human-level performance in related tasks and establishing a dedicated benchmark for evaluating such models, this paper lays the groundwork for further exploration and innovation in the field.
The introduction of Co-Instruct and MICBench addresses fundamental challenges in visual quality assessment and opens new avenues for research and application. As demand for sophisticated image-analysis tools grows, such developments are pivotal for assessing visual content quality at scale. Future work is expected to build on these results with new methodologies, data augmentation strategies, and evaluation frameworks that further improve the accuracy and applicability of LMMs in IQA and beyond.