ClaimVer: Explainable Claim-Level Verification and Evidence Attribution of Text Through Knowledge Graphs (2403.09724v4)
Abstract: Amid widespread misinformation and disinformation spread through social media and the proliferation of AI-generated text, it has become increasingly difficult for people to validate and trust the information they encounter. Many fact-checking approaches and tools have been developed, but they often lack the explainability or granularity needed to be useful in varied contexts. A text-validation method that is easy to use and accessible, and that can perform fine-grained evidence attribution, has become crucial. More importantly, building user trust in such a method requires presenting the rationale behind each prediction, as research shows that this significantly influences people's belief in automated systems. Localizing the specific problematic content and bringing it to users' attention is also paramount, rather than providing blanket labels. In this paper, we present ClaimVer, a human-centric framework tailored to meet users' informational and verification needs by generating rich annotations and thereby reducing cognitive load. Designed to deliver comprehensive evaluations of text, it highlights each claim, verifies it against a trusted knowledge graph (KG), presents the evidence, and provides a succinct, clear explanation for each claim prediction. Finally, our framework introduces an attribution score, enhancing its applicability across a wide range of downstream tasks.
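The paper does not include code here, but the abstract outlines a concrete pipeline: extract claims, check each against a trusted KG, attach evidence and an explanation, and aggregate into an attribution score. The following minimal, self-contained Python sketch illustrates that shape under stated assumptions; the toy triple store, the `verify` logic, and the `attribution_score` definition (fraction of supported claims) are all hypothetical stand-ins, not the authors' implementation, which uses a large KG such as Wikidata and learned models.

```python
from dataclasses import dataclass
from typing import Optional

Triple = tuple[str, str, str]  # (subject, relation, object)

# Toy triple store standing in for a trusted KG such as Wikidata.
KG: set[Triple] = {
    ("Paris", "capital of", "France"),
    ("Wikidata", "instance of", "knowledge base"),
}

@dataclass
class Verdict:
    claim: Triple
    label: str                  # "SUPPORTED" | "REFUTED" | "NOT ENOUGH INFO"
    evidence: Optional[Triple]  # the KG triple shown to the user, if any
    explanation: str            # short human-readable rationale

def verify(claim: Triple) -> Verdict:
    """Check one claim triple against the KG (exact-match stand-in)."""
    if claim in KG:
        return Verdict(claim, "SUPPORTED", claim,
                       f"The KG contains the matching triple {claim}.")
    # If the KG asserts a different object for the same subject and
    # relation, treat the claim as contradicted; otherwise no evidence.
    for s, r, o in KG:
        if (s, r) == claim[:2] and o != claim[2]:
            return Verdict(claim, "REFUTED", (s, r, o),
                           f"The KG instead asserts {(s, r, o)}.")
    return Verdict(claim, "NOT ENOUGH INFO", None,
                   "No relevant triple was found in the KG.")

def attribution_score(verdicts: list[Verdict]) -> float:
    """One plausible aggregate: the fraction of supported claims."""
    return sum(v.label == "SUPPORTED" for v in verdicts) / max(len(verdicts), 1)

if __name__ == "__main__":
    claims = [("Paris", "capital of", "France"),
              ("Paris", "capital of", "Germany"),
              ("Mount Everest", "located in", "Nepal")]
    verdicts = [verify(c) for c in claims]
    for v in verdicts:
        print(v.label, v.claim, "-", v.explanation)
    print("attribution score:", attribution_score(verdicts))
```

In a full system, claim extraction from running text (omitted above) and fuzzy matching against KG entities would replace the exact-match lookup, and the per-claim verdicts would be rendered as inline highlights rather than printed lines.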
Authors: Preetam Prabhu Srikar Dammu, Himanshu Naidu, Mouly Dewan, Tanya Roosta, Aman Chadha, Chirag Shah, Youngmin Kim