Q-Boost: On Visual Quality Assessment Ability of Low-level Multi-Modality Foundation Models (2312.15300v1)
Abstract: Recent advancements in Multi-modality Large Language Models (MLLMs) have demonstrated remarkable capabilities in complex high-level vision tasks. However, the potential of MLLMs in visual quality assessment, a vital aspect of low-level vision, remains underexplored. To address this gap, we introduce Q-Boost, a novel strategy designed to enhance low-level MLLMs on image quality assessment (IQA) and video quality assessment (VQA) tasks, structured around two pivotal components: 1) Triadic-Tone Integration: Ordinary prompt design simply oscillates between the binary extremes of $positive$ and $negative$. Q-Boost innovates by incorporating a `middle ground' through $neutral$ prompts, allowing for a more balanced and detailed assessment. 2) Multi-Prompt Ensemble: Multiple quality-centric prompts are used to mitigate bias and obtain more accurate evaluations. Experimental results show that, equipped with the Q-Boost strategy, low-level MLLMs exhibit outstanding zero-shot performance on IQA/VQA tasks.
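The two components described above can be sketched in a minimal form. This is an illustrative assumption, not the paper's implementation: it assumes the MLLM exposes next-token logits for three quality-level tokens (`good`, `medium`, `poor` standing in for the positive, neutral, and negative tones), maps their softmax probabilities to a weighted score, and averages that score across several quality-centric prompts.

```python
import math

# Assumed triadic quality levels and their score weights
# (positive / neutral / negative); the exact tokens and weights
# are illustrative, not taken from the paper.
LEVEL_WEIGHTS = {"good": 1.0, "medium": 0.5, "poor": 0.0}


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def triadic_score(level_logits):
    """Triadic-Tone Integration: map the model's logits for the three
    level tokens to a scalar quality score in [0, 1]."""
    levels = list(LEVEL_WEIGHTS)
    probs = softmax([level_logits[lvl] for lvl in levels])
    return sum(p * LEVEL_WEIGHTS[lvl] for p, lvl in zip(probs, levels))


def boosted_score(per_prompt_logits):
    """Multi-Prompt Ensemble: average the triadic score over the
    logits produced for each quality-centric prompt."""
    scores = [triadic_score(logits) for logits in per_prompt_logits]
    return sum(scores) / len(scores)


# Toy example with made-up logits from two hypothetical prompts.
prompts_logits = [
    {"good": 2.0, "medium": 1.0, "poor": 0.0},
    {"good": 1.5, "medium": 1.5, "poor": 0.5},
]
score = boosted_score(prompts_logits)
```

The neutral token gives the score head a middle anchor instead of forcing a binary positive/negative split, and averaging over prompts damps the bias any single prompt wording introduces.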
- Zicheng Zhang (124 papers)
- Haoning Wu (68 papers)
- Zhongpeng Ji (6 papers)
- Chunyi Li (66 papers)
- Erli Zhang (11 papers)
- Wei Sun (373 papers)
- Xiaohong Liu (117 papers)
- Xiongkuo Min (138 papers)
- Fengyu Sun (15 papers)
- Shangling Jui (36 papers)
- Weisi Lin (118 papers)
- Guangtao Zhai (230 papers)