Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics (2309.07120v1)

Published 13 Sep 2023 in cs.CL, cs.AI, cs.CV, cs.CY, and cs.LG

Abstract: Multi-modal LLMs (MLLMs) are trained based on LLMs (LLM), with an enhanced capability to comprehend multi-modal inputs and generate textual responses. While they excel in multi-modal tasks, the pure NLP abilities of MLLMs are often underestimated and left untested. In this study, we get out of the box and unveil an intriguing characteristic of MLLMs -- our preliminary results suggest that visual instruction tuning, a prevailing strategy for transitioning LLMs into MLLMs, unexpectedly and interestingly helps models attain both improved truthfulness and ethical alignment in the pure NLP context. For example, a visual-instruction-tuned LLaMA2 7B model surpasses the performance of the LLaMA2-chat 7B model, fine-tuned with over one million human annotations, on TruthfulQA-mc and Ethics benchmarks. Further analysis reveals that the improved alignment can be attributed to the superior instruction quality inherent to visual-text data. In releasing our code at github.com/UCSC-VLAA/Sight-Beyond-Text, we aspire to foster further exploration into the intrinsic value of visual-text synergies and, in a broader scope, multi-modal interactions in alignment research.

PDF Abstract

Summarize PDF Markdown Bookmark Chat (Pro)

Authors (4)

Haoqin Tu (25 papers)
Bingchen Zhao (46 papers)
Chen Wei (72 papers)
Cihang Xie (91 papers)

Citations (12)

View on Semantic Scholar

GitHub

GitHub - UCSC-VLAA/Sight-Beyond-Text: This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics" (19 stars)

Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics (2309.07120v1)

Related Papers

GitHub