End-to-End Multimodal Fact-Checking and Explanation Generation: A Challenging Dataset and Models (2205.12487v2)
Abstract: We propose end-to-end multimodal fact-checking and explanation generation. The input is a claim and a large collection of web sources, including articles, images, videos, and tweets. The goal is to assess the truthfulness of the claim by retrieving relevant evidence and predicting a truthfulness label (e.g., support, refute, or not enough information), and to generate a statement that summarizes and explains the reasoning and ruling process. To support this research, we construct Mocheg, a large-scale dataset of 15,601 claims, each annotated with a truthfulness label and a ruling statement, together with 33,880 textual paragraphs and 12,112 images as evidence. To establish baseline performance on Mocheg, we experiment with several state-of-the-art neural architectures on three pipelined subtasks: multimodal evidence retrieval, claim verification, and explanation generation. The results show that current state-of-the-art end-to-end multimodal fact-checking does not achieve satisfactory performance. To the best of our knowledge, we are the first to build a benchmark dataset and solutions for end-to-end multimodal fact-checking and explanation generation. The dataset, source code, and model checkpoints are available at https://github.com/VT-NLP/Mocheg.
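To make the three pipelined subtasks concrete, the following is a minimal sketch of how their inputs and outputs could chain together. It is not the paper's method: the token-overlap retriever, the negation-mismatch verifier, and the template explanation are toy stand-ins for the neural models, and every name here (`Evidence`, `Verdict`, `retrieve`, `verify`, `explain`, `fact_check`) is hypothetical. Images are represented by captions purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Set

# Label names follow the abstract; everything else is a hypothetical toy.
SUPPORT, REFUTE, NEI = "support", "refute", "not enough information"

STOPWORDS = {"the", "a", "an", "on", "of", "in", "to", "was", "is"}
NEGATIONS = {"not", "no", "never"}


@dataclass
class Evidence:
    kind: str     # "text" or "image"
    content: str  # paragraph text, or a caption standing in for an image


@dataclass
class Verdict:
    label: str
    evidence: List[Evidence]
    explanation: str


def tokenize(text: str) -> Set[str]:
    """Lowercase, strip surrounding punctuation, drop stopwords."""
    tokens = {t.strip(".,!?\"'").lower() for t in text.split()}
    return tokens - STOPWORDS - {""}


def retrieve(claim: str, corpus: List[Evidence], k: int = 2) -> List[Evidence]:
    """Stage 1 (toy): rank evidence by token overlap with the claim."""
    claim_tokens = tokenize(claim)
    scored = [(len(claim_tokens & tokenize(e.content)), e) for e in corpus]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in scored[:k]]


def verify(claim: str, evidence: List[Evidence]) -> str:
    """Stage 2 (toy): a negation mismatch with any evidence means refute."""
    if not evidence:
        return NEI
    claim_negated = bool(tokenize(claim) & NEGATIONS)
    for e in evidence:
        if bool(tokenize(e.content) & NEGATIONS) != claim_negated:
            return REFUTE
    return SUPPORT


def explain(claim: str, label: str, evidence: List[Evidence]) -> str:
    """Stage 3 (toy): fill a template with the label and cited evidence."""
    if not evidence:
        return f"No relevant evidence was retrieved for the claim '{claim}'."
    cited = "; ".join(f"[{e.kind}] {e.content}" for e in evidence)
    return f"The claim '{claim}' is rated '{label}' based on: {cited}"


def fact_check(claim: str, corpus: List[Evidence], k: int = 2) -> Verdict:
    """Run retrieval, verification, and explanation in sequence."""
    evidence = retrieve(claim, corpus, k)
    label = verify(claim, evidence)
    return Verdict(label, evidence, explain(claim, label, evidence))
```

Because each stage consumes the previous stage's output, a retrieval miss propagates downstream: with no evidence retrieved, this sketch can only return "not enough information".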
Authors:
- Barry Menglong Yao
- Aditya Shah
- Lichao Sun
- Jin-Hee Cho
- Lifu Huang