
IQA-Select: Clustering-Driven Data Selection for IQA

Updated 7 October 2025
  • IQA-Select is a data selection framework that uses a three-stage clustering pipeline to maximize data diversity while reducing redundancy in explainable IQA.
  • The approach strategically selects representative instruction subsets, significantly lowering computational cost while maintaining or improving model performance.
  • Empirical results show that using only 10% of carefully curated data can outperform full-data tuning on benchmarks like Q-Bench and AesBench.

IQA-Select is a systematic data selection framework designed to improve the efficiency and effectiveness of instruction tuning for explainable image quality assessment (IQA) in multimodal LLMs (MLLMs). Rather than relying on the conventional scaling law—which asserts that model performance monotonically increases with more instruction tuning data—IQA-Select demonstrates that careful, diversity-oriented data curation can yield both superior explainability and quantitative gains using only a small fraction of the total data. The approach is motivated by the observed redundancy and even detrimental oversaturation present in large-scale explainable IQA datasets. IQA-Select introduces a clustering-driven three-stage pipeline for constructing representative, high-value instruction subsets, and it achieves state-of-the-art results with steep reductions in computational cost and data volume (Li et al., 4 Oct 2025).

1. Motivation and Theoretical Premises

The motivation for IQA-Select originates from two empirical observations in explainable IQA with modern MLLMs. First, pre-trained vision-language models such as InternVL3-Instruct already achieve strong performance before any instruction tuning, indicating intrinsic capacity for quality-related reasoning. Second, full-dataset fine-tuning often incurs redundant processing and elevated computational cost, and can degrade performance through information oversaturation or overfitting. Empirical studies cited in the work show that random sub-selection, even without careful engineering, can occasionally outperform full-data fine-tuning, revealing substantial irrelevance within popular instruction datasets such as Q-Instruct (≈200k samples) and AesExpert (≈409k samples).

IQA-Select addresses this problem by constructing a high-quality "coreset" that maximizes informativeness and diversity, challenging the prevailing paradigm that "more data is always better."

2. Data Redundancy and Computational Cost in Explainable IQA

Contemporary instruction tuning datasets assembled for explainable IQA tasks suffer from both data redundancy and inefficiency. Redundant data samples often lack additional value for learning and can, due to excessive repetition and homogeneity, lead to catastrophic forgetting—eroding fine-grained reasoning skills acquired during pre-training.

The computational cost for full-dataset fine-tuning is substantial, particularly with large MLLMs or resource-constrained settings. The redundancy problem means that a significant portion of the dataset can be omitted without negative impact (and in many cases with measurable performance gains), so identifying the most diverse and informative samples becomes critical.

3. Clustering-Based Data Selection Methodology

IQA-Select employs a three-stage clustering-based data selection algorithm designed to maximize the coverage of relevant instruction space while minimizing redundant training exposure. The pipeline comprises:

A. Clustering Feature Extraction

Each sample is encoded into a feature vector using either model-related or model-independent representations:

  • Model-related features: Extracted from internal representations of the fine-tuning MLLM (e.g., last pooling, last token, or multi-layer concatenation strategies such as "LMM Feat.").
  • Model-independent features: Derived from pre-existing IQA models or from general-purpose visual/textual embeddings (e.g., DINO-V3 or E5 encoders).

Combining LMM features and vision–text embeddings results in highly discriminative clusters that reflect both semantic and visual diversity.
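As a concrete sketch, the combined model-independent feature can be formed by L2-normalizing each modality's embedding and concatenating them before clustering. The encoder choice and normalization scheme here are illustrative assumptions, not details specified by the paper:

```python
import numpy as np

def build_features(vis_emb: np.ndarray, txt_emb: np.ndarray) -> np.ndarray:
    """Concatenate L2-normalized visual and textual embeddings per sample.

    vis_emb: (N, Dv) image embeddings (e.g., from a DINO-style encoder).
    txt_emb: (N, Dt) instruction embeddings (e.g., from an E5-style encoder).
    Returns an (N, Dv + Dt) feature matrix used for clustering.
    """
    vis = vis_emb / (np.linalg.norm(vis_emb, axis=1, keepdims=True) + 1e-12)
    txt = txt_emb / (np.linalg.norm(txt_emb, axis=1, keepdims=True) + 1e-12)
    return np.concatenate([vis, txt], axis=1)
```

Normalizing per modality before concatenation keeps either embedding from dominating the distance metric used by the clustering step.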

B. Cluster Quota Allocation

After clustering, each cluster receives a sample quota using one or several diversity and informativeness metrics:

  • Density:

$$D_i = \frac{\sum_{m \ne n,\; m,n \in C_i} d(m,n)}{|C_i|\,(|C_i| - 1)}$$

where $d(m,n)$ is a Gaussian kernel distance; quota allocation favors low-density clusters, which represent high internal diversity.
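A minimal sketch of the density computation, assuming $d(m,n)$ is the Gaussian (RBF) kernel value $\exp(-\lVert x_m - x_n \rVert^2 / 2\sigma^2)$; the paper's exact kernel and bandwidth are not reproduced here:

```python
import numpy as np

def cluster_density(X: np.ndarray, sigma: float = 1.0) -> float:
    """Average pairwise Gaussian-kernel value within one cluster.

    X: (|C_i|, D) feature matrix of the samples in cluster C_i.
    Sums d(m, n) = exp(-||x_m - x_n||^2 / (2 * sigma^2)) over all
    ordered pairs m != n, then divides by |C_i| * (|C_i| - 1).
    High values indicate a tightly packed, homogeneous cluster;
    low-density clusters are preferred when allocating quotas.
    """
    n = X.shape[0]
    if n < 2:
        return 0.0
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    k = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(k, 0.0)  # exclude m == n pairs
    return float(k.sum() / (n * (n - 1)))
```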

  • Instruction Relevance Score (IRS):

Quantifies the impact of the input question on the answer, based on cross-entropy loss differential.
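One plausible reading of this differential is the drop in answer cross-entropy when the question is present in the prompt versus masked out. The sketch below assumes that interpretation and takes the MLLM's per-token answer log-probabilities as external inputs:

```python
import numpy as np

def instruction_relevance_score(logp_with_q: np.ndarray,
                                logp_without_q: np.ndarray) -> float:
    """IRS sketch: cross-entropy differential on the answer tokens.

    logp_with_q / logp_without_q: per-token log-probabilities of the
    ground-truth answer, with and without the question in the prompt
    (both shape (T,); the scoring model itself is outside this sketch).
    CE = -mean(logp); IRS = CE_without - CE_with, so a large positive
    value means the question genuinely shapes the answer.
    """
    ce_with = -float(np.mean(logp_with_q))
    ce_without = -float(np.mean(logp_without_q))
    return ce_without - ce_with
```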

  • Transferability:

Measures generalization capability between clusters using centroid similarities.

Allocation strategies such as "Transferability + Density" have been found empirically effective.
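A hedged sketch of quota allocation: per-cluster scores (e.g., transferability divided by density, mirroring the "Transferability + Density" strategy) are converted into integer quotas by largest-remainder rounding. The rounding rule is an illustrative choice, not taken from the paper:

```python
import numpy as np

def allocate_quotas(scores: np.ndarray, budget: int) -> np.ndarray:
    """Distribute a sample budget across clusters in proportion to a score.

    scores: (K,) per-cluster importance; higher means more slots.
    budget: total number of samples to select (e.g., 10% of the dataset).
    Returns integer quotas summing exactly to `budget`
    (largest-remainder rounding).
    """
    w = scores / scores.sum()
    raw = w * budget
    quotas = np.floor(raw).astype(int)
    remainder = budget - quotas.sum()
    order = np.argsort(-(raw - quotas))  # largest fractional parts first
    quotas[order[:remainder]] += 1
    return quotas
```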

C. Cluster Sampling Strategy

Within each cluster, representative samples are selected to maximize coverage while minimizing redundancy:

  • Greedy Maximum Mean Discrepancy (MMD): Iteratively selects samples minimizing distributional divergence (squared MMD).
  • SVD-Based Sampling: Employs leverage scores from the SVD of the feature matrix, $l_i = \lVert U_k(i,:) \rVert_2^2$, to identify samples that contribute high variance.
  • PCA-Based Sampling: Uses projection energies from principal component analysis to prioritize diverse samples.

The default IQA-Select implementation uses SVD-based sampling and outperforms random or uniform sampling strategies.
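The default SVD-based strategy can be sketched as follows: compute leverage scores $l_i = \lVert U_k(i,:) \rVert_2^2$ from the SVD of the cluster's feature matrix and keep the highest-leverage rows. Mean-centering and the choice of $k$ here are assumptions of the sketch:

```python
import numpy as np

def svd_leverage_sampling(X: np.ndarray, quota: int, k: int = 2) -> np.ndarray:
    """Select `quota` samples from one cluster via SVD leverage scores.

    X: (n, D) feature matrix of the cluster; k: number of leading
    singular vectors. Leverage l_i = ||U_k(i, :)||_2^2 measures how much
    sample i contributes to the cluster's top-k variance directions;
    the highest-leverage rows are kept as representatives.
    Returns the indices of the selected samples.
    """
    U, _, _ = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    lev = np.sum(U[:, :k] ** 2, axis=1)
    return np.argsort(-lev)[:quota]
```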

4. Experimental Evaluation and Performance

IQA-Select is experimentally validated on both low-level (Q-Instruct/Q-Bench) and aesthetic (AesExpert/AesBench) explainable IQA benchmarks. In all tests, fine-tuning using only 10% of instruction samples selected with IQA-Select achieves or exceeds full-data performance:

  • On Q-Bench, using 10% IQA-Select samples yielded a score of 79.4% versus 77.73% for full-data training (random subsampling at 10% achieved 78.13%).
  • For AesBench, IQA-Select at 10% delivers 103.7% of the full-data performance (61.11% vs. 58.89%) and surpasses both random and full-data results.

These improvements are accompanied by reductions in training time and GPU usage, confirming that careful data selection effectively concentrates model capacity on the most informative and diverse parts of the instruction space.

Visualizations (e.g., t-SNE plots) show that IQA-Select chooses a representative subset spanning both the high-density core and the outlier regions of the data manifold, achieving comprehensive coverage.

5. Broader Implications and Future Research Directions

IQA-Select introduces a data-centric paradigm in explainable image quality assessment, shifting emphasis from brute-force dataset expansion to strategic selection of high-quality data points. This approach encourages the community to reconsider the value of scaling laws and to explore clustering and diversity as primary levers for performance improvement.

Given the generality of its clustering-based quota allocation and sampling framework, IQA-Select is well-positioned to be adapted to other multimodal instruction tuning tasks—such as video quality assessment or audio–visual reasoning—where data redundancy similarly hampers efficiency. Future work may extend IQA-Select by refining feature extraction or incorporating task-specific data quality metrics for even more effective coreset construction.

Further research can also focus on characterizing the optimal cluster size, the trade-off between cluster granularity and selection diversity, and on developing automated metrics for instruction informativeness.

6. Summary Table: IQA-Select Pipeline Stages

| Stage | Key Operation | Technical Details / Metrics |
|---|---|---|
| Feature Extraction | Encode samples as features | Model-related (LMM), model-independent (DINO-V3, E5) |
| Cluster Quota Allocation | Assign selection quotas to clusters | Density, IRS, transferability metrics |
| Cluster Sampling | Select representative samples from clusters | MMD-, SVD-, and PCA-based strategies |

7. Impact on Explainable IQA and Data-Centric Research

IQA-Select establishes that model performance in explainable IQA is not monotonically tied to dataset size. By systematically quantifying and leveraging data diversity and informativeness, IQA-Select enables both improved quality of explanations and reduced computational overhead. This work is central to the ongoing shift towards data-centric AI research and provides a broadly generalizable procedure for efficient, effective instruction tuning in multimodal perceptual reasoning tasks (Li et al., 4 Oct 2025).
