Multimodal Large Language Models to Support Real-World Fact-Checking (2403.03627v2)
Abstract: Multimodal LLMs (MLLMs) carry the potential to support humans in processing vast amounts of information. While MLLMs are already being used as a fact-checking tool, their abilities and limitations in this regard are understudied. Here we aim to bridge this gap. In particular, we propose a framework for systematically assessing the capacity of current multimodal models to facilitate real-world fact-checking. Our methodology is evidence-free, leveraging only these models' intrinsic knowledge and reasoning capabilities. By designing prompts that extract models' predictions, explanations, and confidence levels, we delve into research questions concerning model accuracy, robustness, and reasons for failure. We empirically find that (1) GPT-4V exhibits superior performance in identifying malicious and misleading multimodal claims, and can explain the unreasonable aspects and underlying motives, and (2) existing open-source models exhibit strong biases and are highly sensitive to the prompt. Our study offers insights into combating false multimodal information and building secure, trustworthy multimodal models. To the best of our knowledge, we are the first to evaluate MLLMs for real-world fact-checking.
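The evidence-free setup described in the abstract can be approximated with a single prompt that asks an MLLM for a verdict, an explanation, and a self-reported confidence score for an image-claim pair. The sketch below is a minimal illustration assuming the OpenAI Python SDK and a GPT-4V-class chat model; the prompt wording, the `fact_check` helper, and the output schema are illustrative assumptions, not the paper's verbatim protocol.

```python
# Minimal sketch of evidence-free multimodal fact-checking via prompting.
# Assumes the OpenAI Python SDK (v1) and a vision-capable chat model; the
# prompt text and three-field output format are illustrative, not the
# paper's exact prompts.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "You are a fact-checker. Given the image and the claim below, and using "
    "only your own knowledge (no external evidence), answer with:\n"
    "1. Verdict: True or False\n"
    "2. Explanation: why the claim is (un)reasonable, and a possible motive "
    "if it is misleading\n"
    "3. Confidence: a number from 0 to 100\n\n"
    "Claim: {claim}"
)

def fact_check(image_path: str, claim: str) -> str:
    """Elicit a verdict, explanation, and confidence for one image-claim pair."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in for the GPT-4V-class endpoint; an assumption
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT.format(claim=claim)},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        temperature=0,  # deterministic decoding aids prompt-robustness comparisons
    )
    return response.choices[0].message.content

# Example: fact_check("flood.jpg", "This photo shows the 2023 California floods.")
```

Keeping the verdict, explanation, and confidence in one structured response makes it straightforward to score accuracy, inspect failure reasons, and study calibration across prompt variants, which mirrors the research questions the framework targets.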