QuestGen: Effectiveness of Question Generation Methods for Fact-Checking Applications (2407.21441v2)

Published 31 Jul 2024 in cs.CL

Abstract: Verifying fact-checking claims poses a significant challenge, even for humans. Recent approaches have demonstrated that decomposing claims into relevant questions to gather evidence enhances the efficiency of the fact-checking process. In this paper, we provide empirical evidence showing that this question decomposition can be effectively automated. We demonstrate that smaller generative models, fine-tuned for the question generation task using data augmentation from various datasets, outperform LLMs by up to 8%. Surprisingly, in some cases, the evidence retrieved using machine-generated questions proves to be significantly more effective for fact-checking than that obtained from human-written questions. We also perform a manual evaluation of the decomposed questions to assess their quality.
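
For a concrete picture of the decompose-retrieve-verify pipeline the abstract describes, the sketch below shows one way claim-to-question decomposition could be automated with a small seq2seq model via Hugging Face transformers. The checkpoint (google/flan-t5-base), prompt wording, and sampling settings are illustrative assumptions, not the paper's released fine-tuned models or training setup.

```python
# Minimal sketch of claim-to-question decomposition for fact-checking.
# The model name and prompt format below are illustrative placeholders,
# not the authors' artifacts.
from transformers import pipeline

# Stand-in for a smaller generative model fine-tuned for question generation.
question_generator = pipeline("text2text-generation", model="google/flan-t5-base")

claim = "The Eiffel Tower was completed in 1889 and is 330 metres tall."
prompt = f"Generate a question whose answer would help verify this claim: {claim}"

# Sample several candidates so one claim yields multiple distinct questions.
outputs = question_generator(
    prompt, do_sample=True, num_return_sequences=3, max_new_tokens=40
)
for i, out in enumerate(outputs, start=1):
    print(f"Q{i}: {out['generated_text']}")
```

Each generated question would then drive evidence retrieval (e.g., a web search), and the retrieved passages would feed a verdict classifier, completing the fact-checking loop.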

Authors (2)
  1. Vinay Setty (22 papers)
  2. Ritvik Setty (1 paper)