
Low-Resource Fact-Checking: Methods & Challenges

Updated 3 February 2026
  • Low-resource fact-checking is the process of verifying claims in settings with limited data and computational resources using multilingual datasets and efficient model tuning.
  • The methodologies employ synthetic data generation, cross-lingual transfer, modular architectures, and crowdsourced annotation to improve evidence retrieval and label accuracy.
  • Practical challenges like domain shifts, evidence scarcity, and model overconfidence are addressed through innovative calibration techniques and human-in-the-loop oversight strategies.

Low-resource fact-checking refers to the development and deployment of automated or semi-automated fact verification systems in contexts where annotated data, computational resources, or language technology infrastructure are severely limited. This includes the vast majority of languages and regions outside English and a handful of other high-resource environments. The central objective is to ensure scalable, accurate, and equitable verification of factual claims in settings with minimal labeled corpora, little language-specific pretraining, and limited infrastructure.

1. Core Challenges in Low-Resource Fact-Checking

Low-resource fact-checking is hampered simultaneously by the scarcity of annotated data and limited computational or language modeling capacity. Key difficulties include:

  • Annotated Data Scarcity: Manual annotation for fact-checking (e.g., assigning SUPPORTS, REFUTES, NOT ENOUGH INFORMATION) is prohibitively expensive and logistically taxing, especially where native expert annotators are rare (Chung et al., 21 Feb 2025).
  • Linguistic Coverage and Domain Shift: Most existing resources, models, and benchmarks (FEVER, VitaminC, etc.) are English-centric; naive translation to low-resource languages fails to account for linguistic and cultural context, introducing domain mismatch, translation noise, and degraded model performance (Chung et al., 21 Feb 2025, Cekinel et al., 2024).
  • Evidence Retrieval Complexity: Claims in low-resource languages often reference local knowledge or use linguistic structures with little lexical overlap with available evidence corpora. Retrieval and verification of such claims require systems robust to paraphrase, cross-domain variation, and high “novelty” in n-grams or dependencies (Le et al., 2024, Hoa et al., 2024).
  • Model Confidence and Calibration: Small, computationally cheap models typical for low-resource contexts display overconfidence and low accuracy, which risks amplifying misinformation—a phenomenon documented as the “Dunning–Kruger” confidence paradox (Qazi et al., 10 Sep 2025); a minimal calibration check is sketched after this list.
  • Cross-lingual and Multimodal Complexity: Evidence to support or refute claims may be available only in other languages (cross-lingual retrieval) or other modalities (images, tables), demanding sophisticated retrieval and fusion strategies (Huang et al., 2022, Singhal et al., 2021).
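
The overconfidence issue above can be made measurable before deployment. The following is a minimal sketch of expected calibration error (ECE) over a verifier's predicted-label confidences; the binning scheme and the toy `confidences`/`correct` arrays are illustrative assumptions, not taken from the cited work.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: weight each confidence bin by its share of
    predictions and sum the |average confidence - empirical accuracy| gaps."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(confidences[in_bin].mean() - correct[in_bin].mean())
    return ece

# Illustrative overconfident verifier: ~93% average confidence, 60% accuracy.
conf = [0.95, 0.92, 0.97, 0.91, 0.93]
hit = [1, 0, 1, 0, 1]
print(f"ECE = {expected_calibration_error(conf, hit):.3f}")
```

A large ECE on a held-out set is a signal that verdict confidences should not be surfaced to users without recalibration or human review.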

2. Dataset Design and Construction

Sophisticated dataset design is fundamental to advancing fact-checking in low-resource settings. Notable strategies include:

  • Domain Diversification and Structured Annotation: ViFactCheck for Vietnamese crawled nine government-licensed news websites across 12 domains, ensuring claims are sampled across a broad topical range (Hoa et al., 2024). Annotation procedures require pilot phases, rigorous guidelines, and explicit multi-evidence labeling. The resultant dataset (7,232 claim–evidence pairs) exhibited high reliability (Fleiss’ κ = 0.83).
  • Synthetic Multilingual Data Generation: MultiSynFact introduced a scalable LLM-driven pipeline that extracts knowledge sentences from Wikipedia, prompts LLMs to generate three claims per sentence (SUPPORTS, REFUTES, NOT-INFO), and combines LLM self-evaluation with MNLI filtering for quality control, producing 2.2M multilingual claim-source pairs (Chung et al., 21 Feb 2025); a sketch of this NLI-based filtering idea follows the dataset table below.
  • Evidence Diversity and Novelty: ViWikiFC constructed a 20K+ claim–evidence corpus from Vietnamese Wikipedia, explicitly measuring new-word, new-dependency, and new n-gram rates between claims and evidence, revealing retrieval difficulties for NOT ENOUGH INFORMATION claims (NEI new word rate 50.44%) (Le et al., 2024).
  • Crowd-Driven and Distant Supervision: Systems such as CrowdChecked automatically mined hundreds of thousands of tweet–fact-check pairs by matching links shared in social media, with noisy labels refined by self-adaptive training and weak supervision protocols (Hardalov et al., 2022).
| Dataset | Language(s) | Claim–Evidence Pairs | Domains | IAA (κ or equivalent) |
|---|---|---|---|---|
| ViFactCheck | Vietnamese | 7,232 | 12 news topics | 0.83 (Fleiss’ κ) |
| ViWikiFC | Vietnamese | 20,916 | Wikipedia | 0.9587 (Fleiss’ κ) |
| MultiSynFact | en, es, de (+ext.) | 2.2M | Wikipedia | LLM + NLI filtering / spot checks |
| CrowdChecked | English (+tweets) | 332,660 | Social media | – |
| FactDRIL | 13 Indian languages | 22,435 | Multi-domain | 0.76–1.00 |
| FCTR | Turkish | 3,238 | Multi-domain | – |
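
As noted in the MultiSynFact entry above, a practical quality-control step is to keep a generated claim only when an off-the-shelf NLI model agrees with the label the claim was generated for. The sketch below assumes a publicly available multilingual XNLI-style checkpoint and a simple confidence threshold; it illustrates the idea rather than reproducing the exact MultiSynFact pipeline.

```python
from transformers import pipeline

# Assumed multilingual NLI checkpoint; any XNLI-style model that outputs
# entailment / neutral / contradiction labels would work similarly.
nli = pipeline(
    "text-classification",
    model="MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7",
)

# Map each intended fact-checking label to the NLI relation that the
# (source sentence, generated claim) pair should exhibit.
EXPECTED = {"SUPPORTS": "entailment", "REFUTES": "contradiction", "NOT-INFO": "neutral"}

def keep_claim(source_sentence, claim, intended_label, threshold=0.7):
    """Keep the claim only if the NLI prediction matches the intended label
    with confidence above the (assumed) threshold."""
    pred = nli({"text": source_sentence, "text_pair": claim})
    if isinstance(pred, list):  # pipeline versions differ in return shape
        pred = pred[0]
    return pred["label"].lower() == EXPECTED[intended_label] and pred["score"] >= threshold

# Spanish example: accept only if the model judges entailment.
print(keep_claim(
    "El río Amazonas es el más caudaloso del mundo.",
    "El Amazonas lleva más agua que cualquier otro río.",
    "SUPPORTS",
))
```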

3. Model Architectures and Transfer Learning

Model selection and adaptation in low-resource settings leverage a mixture of pretrained multilingual language models (PLMs), large language models (LLMs), and parameter-efficient fine-tuning:

  • Multilingual Pretrained Models: Systems such as ViFactCheck and ViWikiFC employ PhoBERT, XLM-R, and InfoXLM, leveraging pretrained multilingual representations and fine-tuning on language-specific supervisions (Hoa et al., 2024, Le et al., 2024).
  • LLMs and LoRA Fine-Tuning: Large open-source models (Llama 2/3, Mistral-7B, Gemma-7B) are fine-tuned with LoRA adapters (r=16, α=16) over 5 epochs, enabling competitive macro-F1 (e.g., Gemma-7B: 89.90% macro-F1 for Vietnamese) on modest hardware (Hoa et al., 2024). QLoRA parameter-efficient tuning is similarly applied to Llama-2 for Turkish (Cekinel et al., 2024). A minimal LoRA configuration is sketched after this list.
  • Prompt Engineering vs. Fine-Tuning: Zero-shot and few-shot prompting of LLMs underperforms task-specific finetuning—e.g., Gemma-7B in zero-shot scored 39.47% F1 vs. 89.90% with finetuning (Hoa et al., 2024). In Turkish, Llama-2-13B fine-tuned on 500 local examples yielded F1-macro=0.890 (Cekinel et al., 2024).
  • Cross-Lingual Transfer and Self-Supervised Objectives: CONCRETE introduces a cross-lingual bi-encoder trained on the Inverse Cloze Task (X-ICT), optimizing dot-product similarity of claim and passage embeddings across languages and showing +2.23 pp macro-F1 improvement over previous systems in zero-shot settings (Huang et al., 2022).
  • Modular and Plug-and-Play Frameworks: Self-Checker assembles LLM-driven modules (claim decomposition, query generation, evidence selection, verdict prediction) as a fully prompt-driven pipeline, demanding no in-domain training but with substantial trade-offs in accuracy and latency (2305.14623).
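
The LoRA recipe referenced above (r = 16, α = 16) reduces to a short configuration with the Hugging Face peft library. The sketch below is a minimal setup for a causal-LM verdict model; the checkpoint name, target modules, and dropout value are assumptions for illustration, not the exact ViFactCheck or FCTR recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "google/gemma-7b"  # assumed checkpoint; Llama-2/3 or Mistral-7B are set up identically
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# LoRA hyperparameters reported in the text: rank 16, alpha 16.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,                    # assumed value, not stated in the papers
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of base parameters

# The verdict task is then cast as instruction-style generation, e.g.
# "Claim: ... Evidence: ... Verdict:" -> "SUPPORTS" / "REFUTES" / "NEI".
```

Only the adapter weights are trained and stored, which is what makes multi-epoch fine-tuning of 7B-parameter models feasible on modest hardware.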

4. Retrieval, Claim Matching, and Evidence Aggregation

Claim verification is often bottlenecked by the retrieval of relevant evidence, especially when lexical overlap is low or evidence is available only cross-lingually:

  • Sparse vs. Dense Retrieval: BM25 achieves high accuracy for SUPPORTS and REFUTES (88.30%/86.93%) in Vietnamese Wikipedia, but only 56.67% for NEI due to low lexical overlap. Hybrid pipelines (BM25 + SBERT, dense dual-encoders) are recommended for improved performance (Le et al., 2024); a minimal hybrid pipeline is sketched after this list.
  • Claim Matching in Messaging Platforms: Cross-lingual claim-matching models (student XLM-R distilled from English SBERT) outperform LASER and LaBSE on WhatsApp data in Bengali, Malayalam, Tamil (MRR=0.528 Bengali for I-XLM-R) (Kazemi et al., 2021).
  • Noisy Distant Supervision: In CrowdChecked, bi-encoder SBERT models are trained with modified Multiple Negatives Ranking (MNR) loss and self-adaptive label weighting to address large-scale noisy tweet–fact-check matches, with MAP@5 gains >11 points over NLytics (Hardalov et al., 2022).
  • NER-Based Query Expansion: WikiCheck demonstrates that entity extraction from claims and issuing separate Wikipedia queries per entity increases average recall from 0.628 (raw query) to 0.879 (NER-flair-fast, N=3), critical for supporting retrieval in CPU or low-memory deployments (Trokhymovych et al., 2021).
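
The hybrid sparse-plus-dense recommendation above can be prototyped with rank-bm25 for first-stage retrieval and a Sentence-Transformers bi-encoder for reranking. The checkpoint name, whitespace tokenization, and toy corpus below are illustrative assumptions, not the ViWikiFC pipeline itself.

```python
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

# Toy Vietnamese evidence corpus; real pipelines use proper word segmentation.
corpus = [
    "Hà Nội là thủ đô của Việt Nam.",
    "Đồng bằng sông Cửu Long sản xuất phần lớn lúa gạo của Việt Nam.",
    "Phở là một món ăn truyền thống của Việt Nam.",
]
claim = "Thủ đô của Việt Nam là Hà Nội."

# Stage 1: sparse BM25 retrieval narrows the corpus to a small candidate set.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
sparse_scores = bm25.get_scores(claim.lower().split())
candidates = sorted(range(len(corpus)), key=lambda i: -sparse_scores[i])[:2]

# Stage 2: a multilingual bi-encoder reranks candidates by semantic similarity
# (checkpoint name is an assumed example).
encoder = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
claim_emb = encoder.encode(claim, convert_to_tensor=True)
cand_embs = encoder.encode([corpus[i] for i in candidates], convert_to_tensor=True)
best = candidates[int(util.cos_sim(claim_emb, cand_embs).argmax())]
print("Top evidence:", corpus[best])
```

Dense reranking is what recovers paraphrased or low-overlap evidence that BM25 alone ranks poorly, which matters most for NEI-style claims with high lexical novelty.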

5. Evaluation Protocols and Error Analysis

Systematic assessment of low-resource fact-checking systems employs macro-F1, precision, recall, retrieval accuracy (Accuracy@k), and stricter pipeline metrics such as strict accuracy, which requires both correct evidence and a correct label; a minimal implementation of the headline metrics follows the table below:

| Model/System | Language | Macro-F1 / Strict Acc. | Notable Baseline/Method |
|---|---|---|---|
| Gemma-7B (LoRA) | Vietnamese | 89.90% (gold evidence) | Fine-tuned, gold evidence, ViFactCheck (Hoa et al., 2024) |
| InfoXLM (Large) | Vietnamese | 86.51% (VP task) | ViWikiFC, monolingual (Le et al., 2024) |
| mDeBERTa-v3-base | Multilingual | up to +0.203 macro-F1 gain | MultiSynFact augmentation (Chung et al., 21 Feb 2025) |
| CONCRETE (mBERT + X-ICT) | X-Fact | +2.2 pp macro-F1 (zero-shot) | Cross-lingual claim-style retrieval (Huang et al., 2022) |
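
To make the table's metrics concrete: macro-F1 is the unweighted mean of per-class F1 over the three verdict labels, while strict accuracy credits a prediction only when both the predicted label and the retrieved evidence are correct. The snippet below is a minimal sketch with toy predictions; exact evidence-matching rules vary slightly across datasets.

```python
from sklearn.metrics import f1_score

LABELS = ["SUPPORTS", "REFUTES", "NEI"]

gold_labels = ["SUPPORTS", "REFUTES", "NEI", "SUPPORTS"]
pred_labels = ["SUPPORTS", "NEI", "NEI", "REFUTES"]

# Macro-F1: unweighted mean of per-class F1 scores.
macro_f1 = f1_score(gold_labels, pred_labels, labels=LABELS, average="macro")

# Strict accuracy: label must match AND retrieved evidence must cover the
# gold evidence set (NEI claims carry no gold evidence).
gold_evidence = [{"e1"}, {"e7"}, set(), {"e3"}]
pred_evidence = [{"e1", "e2"}, {"e7"}, set(), {"e3"}]

strict_hits = [
    g_lab == p_lab and g_ev.issubset(p_ev)
    for g_lab, p_lab, g_ev, p_ev in zip(
        gold_labels, pred_labels, gold_evidence, pred_evidence
    )
]
strict_acc = sum(strict_hits) / len(strict_hits)
print(f"macro-F1 = {macro_f1:.3f}, strict accuracy = {strict_acc:.3f}")
```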

Dominant error modes include evidence retrieval failure, semantic ambiguity, multi-step inferential chains, and hallucinated inference despite correct evidence (Hoa et al., 2024). For NEI claims, high novelty at the lexical and syntactic level severely degrades both retrieval and label prediction (Le et al., 2024).

6. Human-in-the-Loop, Crowdsourcing, and Policy Considerations

Human input, crowdsourcing, and governance are crucial in low-resource and high-risk settings:

  • Crowdsourcing Fact-Verification: Twitter Birdwatch demonstrates that a volunteer-driven note and rating mechanism achieves 83.2% agreement with experts, with verification latency an order of magnitude faster and nearly zero direct cost (Saeed et al., 2022).
  • Hybrid Human/Model Oversight: Overconfident small LLMs (e.g., Llama-7B, Mistral-7B) risk error amplification; policies must ensure human review of all verdicts below critical confidence thresholds, along with transparent disclosure of known model limitations and biases (Qazi et al., 10 Sep 2025). A minimal routing policy is sketched after this list.
  • Scalability via Plug-and-Play and Modular Design: Frameworks such as Self-Checker enable rapid adaptation in new languages or domains without large-scale annotation but can only supplement, not yet replace, more deeply fine-tuned models (2305.14623).
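
The review policy above amounts to a thin routing layer over any verifier. The sketch below is illustrative: the `verify` callable, the `Verdict` structure, and the 0.85 threshold are assumptions, not a published specification.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Verdict:
    label: str         # e.g. "SUPPORTS", "REFUTES", "NEI"
    confidence: float  # model probability of the predicted label

def route(claim: str,
          verify: Callable[[str], Verdict],
          review_queue: List[Tuple[str, Verdict]],
          threshold: float = 0.85) -> Optional[Verdict]:
    """Auto-publish only high-confidence verdicts; defer everything else
    to human fact-checkers (threshold value is an assumption)."""
    verdict = verify(claim)
    if verdict.confidence >= threshold:
        return verdict                      # publish with model attribution
    review_queue.append((claim, verdict))   # send to human review
    return None

# Example with a stub verifier standing in for a fine-tuned model.
queue: List[Tuple[str, Verdict]] = []
stub = lambda c: Verdict("REFUTES", 0.62)
print(route("The moon is made of cheese.", stub, queue))  # None -> queued
print(queue)
```

A reasonable refinement is to calibrate the threshold on held-out data rather than trusting raw softmax scores, given the overconfidence issues noted above.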

7. Future Directions and Best Practices

Leading efforts identify several priorities for further advances in low-resource fact-checking: high-quality, domain-diverse data annotation; scalable synthetic data generation; cross-lingual and parameter-efficient model adaptation; robust retrieval pipelines; and hybrid human–AI oversight. Together, these constitute the conditions for accurate and equitable verification of information in global low-resource settings (Hoa et al., 2024, Chung et al., 21 Feb 2025, Le et al., 2024, Huang et al., 2022, Qazi et al., 10 Sep 2025, Saeed et al., 2022).
