
An Analysis of Multilingual FActScore (2406.19415v1)

Published 20 Jun 2024 in cs.CL

Abstract: FActScore has gained popularity as a metric to estimate the factuality of long-form texts generated by LLMs in English. However, no prior work has studied the behavior of FActScore in other languages. This paper studies the limitations of each component in the four-component pipeline of FActScore in the multilingual setting. We introduce a new dataset for FActScore on texts generated by strong multilingual LLMs. Our evaluation shows that LLMs exhibit distinct behaviors in both fact extraction and fact scoring tasks. No LLM produces consistent and reliable FActScore across languages with varying levels of resources. We also find that the knowledge source plays an important role in the quality of the estimated FActScore. Using Wikipedia as the knowledge source may hinder the true FActScore of long-form text due to its limited coverage in medium- and low-resource languages. We also incorporate three mitigations to our knowledge source that ultimately improve FActScore estimation across all languages.

Authors (5)
  1. Kim Trong Vu (1 paper)
  2. Michael Krumdick (10 papers)
  3. Varshini Reddy (12 papers)
  4. Franck Dernoncourt (161 papers)
  5. Viet Dac Lai (25 papers)