Silkie: Preference Distillation for Large Visual Language Models (2312.10665v1)

Published 17 Dec 2023 in cs.CV and cs.CL

Abstract: This paper explores preference distillation for large vision-language models (LVLMs), improving their ability to generate helpful and faithful responses anchored to the visual context. We first build a vision-language feedback (VLFeedback) dataset using AI annotation: responses are generated by models sampled from 12 LVLMs, conditioned on multi-modal instructions sourced from various datasets, and GPT-4V assesses the generated outputs for helpfulness, visual faithfulness, and ethical considerations. The preference supervision is then distilled into Qwen-VL-Chat through direct preference optimization (DPO). The resulting model, Silkie, achieves 6.9% and 9.5% relative improvements on the MME benchmark's perception and cognition capabilities, respectively. Silkie also demonstrates reduced hallucination, setting a new state-of-the-art score of 3.02 on the MMHal-Bench benchmark. Further analysis shows that DPO with our VLFeedback dataset mainly boosts the fine-grained perception and complex cognition abilities of LVLMs, yielding more comprehensive improvements than human-annotated preference datasets.
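The DPO objective the abstract refers to can be sketched in a few lines. Below is a minimal, illustrative Python implementation of the per-pair DPO loss (the function name, inputs, and numbers are my own; the paper applies this objective to Qwen-VL-Chat with GPT-4V-ranked preference pairs from VLFeedback):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    logp_* are total log-probabilities of the preferred/rejected responses
    under the policy being trained; ref_logp_* are the same quantities under
    the frozen reference model (here, the initial LVLM before DPO).
    """
    # Implicit reward margin: how much more the policy favours the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin: the loss shrinks as the policy
    # increasingly prefers the annotator-chosen response.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy matches the reference, the margin is 0 and the loss is
# -log(0.5) = log 2 ≈ 0.693, regardless of the absolute log-probabilities.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

In practice this is computed over batches of token-level log-probabilities from two model forward passes (policy and frozen reference), but the scalar form above captures the whole objective.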

Authors (9)
  1. Lei Li (1293 papers)
  2. Zhihui Xie (17 papers)
  3. Mukai Li (17 papers)
  4. Shunian Chen (15 papers)
  5. Peiyi Wang (48 papers)
  6. Liang Chen (360 papers)
  7. Yazheng Yang (16 papers)
  8. Benyou Wang (109 papers)
  9. Lingpeng Kong (134 papers)
Citations (50)