Fake News Detection and Manipulation Reasoning via Large Vision-Language Models (2407.02042v1)

Published 2 Jul 2024 in cs.CL and cs.AI

Abstract: Fake news has become a growing threat to information security and public opinion with the rapid spread of media manipulation, so fake news detection attracts widespread attention from the academic community. Traditional fake news detection models demonstrate remarkable performance on binary authenticity classification, but their ability to reason about the detailed traces of manipulation in the news content remains under-explored. Furthermore, because they lack external knowledge, the performance of existing methods on fact-related news is questionable, leaving their practical applicability unclear. In this paper, we propose a new multi-media research topic, namely manipulation reasoning, which aims to reason about manipulations based on news content. To support this research, we introduce a benchmark for fake news detection and manipulation reasoning, referred to as Human-centric and Fact-related Fake News (HFFN). The benchmark highlights the centrality of humans and high factual relevance, with detailed manual annotations. HFFN encompasses four realistic domains with fake news samples generated through three manipulation approaches. Moreover, a Multi-modal news Detection and Reasoning LLM (M-DRUM) is presented that not only judges the authenticity of multi-modal news but also provides analytical reasoning about potential manipulations. At the feature-extraction level, a cross-attention mechanism is employed to extract fine-grained fusion features from the multi-modal inputs. At the reasoning level, a large vision-language model (LVLM) serves as the backbone to facilitate fact-related reasoning. A two-stage training framework is deployed to better activate the model's capacity for identification and reasoning. Comprehensive experiments demonstrate that our model outperforms state-of-the-art (SOTA) fake news detection models and powerful LVLMs such as GPT-4 and LLaVA.
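
The abstract only outlines the feature-extraction level, so the minimal PyTorch sketch below illustrates one way a cross-attention fusion of image and text features could look. The module name, hidden size, bidirectional attention, residual connections, and concatenation step are illustrative assumptions, not the authors' released implementation of M-DRUM.

# Hypothetical sketch of cross-attention fusion between modalities.
# All names and dimensions are assumptions for illustration only.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=768, num_heads=8):
        super().__init__()
        # text tokens attend to image patches, and vice versa
        self.txt_to_img = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.img_to_txt = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_t = nn.LayerNorm(dim)
        self.norm_i = nn.LayerNorm(dim)

    def forward(self, text_feats, image_feats):
        # text_feats:  (batch, n_text_tokens, dim), e.g. from a text encoder
        # image_feats: (batch, n_patches, dim),     e.g. from a ViT image encoder
        t, _ = self.txt_to_img(query=text_feats, key=image_feats, value=image_feats)
        i, _ = self.img_to_txt(query=image_feats, key=text_feats, value=text_feats)
        # residual connections keep the original unimodal information
        text_fused = self.norm_t(text_feats + t)
        image_fused = self.norm_i(image_feats + i)
        # concatenate along the sequence axis to form fine-grained fusion features
        return torch.cat([text_fused, image_fused], dim=1)

fusion = CrossModalFusion()
fused = fusion(torch.randn(2, 32, 768), torch.randn(2, 196, 768))
print(fused.shape)  # torch.Size([2, 228, 768])

In a pipeline like the one the abstract describes, the fused sequence would then be passed to the LVLM backbone as conditioning for authenticity judgment and manipulation reasoning; the exact interface is not specified in the abstract.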

References (40)
  1. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774 (2023).
  2. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877–1901.
  3. Cross-modal ambiguity learning for multimodal fake news detection. In Proceedings of the ACM Web Conference 2022. 2897–2905.
  4. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality. https://lmsys.org/blog/2023-03-30-vicuna/
  5. Scaling instruction-finetuned language models. arXiv preprint arXiv:2210.11416 (2022).
  6. Machine-generated text: A comprehensive survey of threat models and detection models. IEEE Access (2023).
  7. Template-based named entity recognition using BART. arXiv preprint arXiv:2106.01760 (2021).
  8. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
  9. Information bottleneck disentanglement for identity swapping. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 3404–3413.
  10. Imagebind: One embedding space to bind them all. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 15180–15190.
  11. Anomalygpt: Detecting industrial anomalies using large vision-language models. arXiv preprint arXiv:2308.15366 (2023).
  12. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770–778.
  13. Bad Actor, Good Advisor: Exploring the Role of Large Language Models in Fake News Detection. arXiv preprint arXiv:2309.12247 (2023).
  14. Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196 (2017).
  15. Soroush Abbasi Koohpayegani and Hamed Pirsiavash. 2024. Sima: Simple softmax-free attention for vision transformers. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2607–2617.
  16. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv preprint arXiv:2301.12597 (2023).
  17. Improved baselines with visual instruction tuning. arXiv preprint arXiv:2310.03744 (2023).
  18. Visual instruction tuning. arXiv preprint arXiv:2304.08485 (2023).
  19. Describe what to change: A text-guided unsupervised image-to-image translation approach. In Proceedings of the 28th ACM International Conference on Multimedia. 1357–1365.
  20. Divide-and-Conquer: Post-User Interaction Network for Fake News Detection on Social Media. In Proceedings of the ACM Web Conference 2022. 1148–1158.
  21. Text-adaptive generative adversarial networks: manipulating images with natural language. Advances in neural information processing systems 31 (2018).
  22. FANG: Leveraging Social Context for Fake News Detection Using Graph Representation. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM ’20). Association for Computing Machinery, New York, NY, USA, 1165–1174. https://doi.org/10.1145/3340531.3412046
  23. OpenAI. 2022. ChatGPT: Optimizing Language Models for Dialogue. OpenAI Blog (2022). https://openai.com/blog/chatgpt
  24. Styleclip: Text-driven manipulation of stylegan imagery. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 2085–2094.
  25. Piotr Przybyla. 2020. Capturing the style of fake news. In Proceedings of the AAAI conference on artificial intelligence, Vol. 34. 490–497.
  26. Improving fake news detection by using an entity-enhanced framework to fuse diverse multimodal clues. In Proceedings of the 29th ACM International Conference on Multimedia. 1212–1220.
  27. Learning transferable visual models from natural language supervision. In International conference on machine learning. PMLR, 8748–8763.
  28. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research 21, 1 (2020), 5485–5551.
  29. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 658–666.
  30. Encoding in style: a stylegan encoder for image-to-image translation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2287–2296.
  31. Faceforensics++: Learning to detect manipulated facial images. In Proceedings of the IEEE/CVF international conference on computer vision. 1–11.
  32. Detecting and grounding multi-modal media manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6904–6913.
  33. Fadi Abu Sheikha and Diana Inkpen. 2011. Generation of formal and informal sentences. In Proceedings of the 13th European Workshop on Natural Language Generation. 187–193.
  34. Pandagpt: One model to instruction-follow them all. arXiv preprint arXiv:2305.16355 (2023).
  35. Transforming delete, retrieve, generate approach for controlled text style transfer. In EMNLP-IJCNLP.
  36. Designing an encoder for stylegan image manipulation. ACM Transactions on Graphics (TOG) 40, 4 (2021), 1–14.
  37. High-fidelity gan inversion for image attribute editing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11379–11388.
  38. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (2022), 24824–24837.
  39. Multimodal fusion with co-attention networks for fake news detection. In Findings of the association for computational linguistics: ACL-IJCNLP 2021. 2560–2569.
  40. Multi-modal Fake News Detection on Social Media via Multi-grained Information Fusion. In Proceedings of the 2023 ACM International Conference on Multimedia Retrieval. 343–352.
Authors (6)
  1. Ruihan Jin (6 papers)
  2. Ruibo Fu (54 papers)
  3. Zhengqi Wen (69 papers)
  4. Shuai Zhang (319 papers)
  5. Yukun Liu (45 papers)
  6. Jianhua Tao (139 papers)
Citations (1)
