
Do We Need Language-Specific Fact-Checking Models? The Case of Chinese (2401.15498v3)

Published 27 Jan 2024 in cs.CL

Abstract: This paper investigates the potential benefits of language-specific fact-checking models, focusing on the case of Chinese. We first demonstrate the limitations of translation-based methods and multilingual LLMs (e.g., GPT-4), highlighting the need for language-specific systems. We further propose a Chinese fact-checking system that can better retrieve evidence from a document by incorporating context information. To better analyze token-level biases in different systems, we construct an adversarial dataset based on the CHEF dataset, where each instance has large word overlap with its original counterpart but holds the opposite veracity label. Experimental results on the CHEF dataset and our adversarial dataset show that our proposed method outperforms translation-based methods and multilingual LLMs and is more robust to such biases. There is still large room for improvement, however, which emphasizes the importance of language-specific fact-checking systems.

Authors (3)
  1. Caiqi Zhang (15 papers)
  2. Zhijiang Guo (55 papers)
  3. Andreas Vlachos (70 papers)
Citations (6)