Cross-Lingual Learning vs. Low-Resource Fine-Tuning: A Case Study with Fact-Checking in Turkish (2403.00411v2)

Published 1 Mar 2024 in cs.CL

Abstract: The rapid spread of misinformation through social media platforms has raised concerns about its impact on public opinion. Although misinformation circulates in many languages, most research in this field has concentrated on English, so datasets for other languages, including Turkish, remain scarce. To address this gap, we introduce the FCTR dataset, consisting of 3238 real-world claims. The dataset spans multiple domains and incorporates evidence collected from three Turkish fact-checking organizations. We also assess the effectiveness of cross-lingual transfer learning for low-resource languages, with a particular focus on Turkish, and evaluate the in-context learning (zero-shot and few-shot) performance of LLMs on this task. The experimental results indicate that the dataset has the potential to advance research in the Turkish language.
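The abstract distinguishes zero-shot from few-shot in-context learning. The sketch below illustrates that difference under stated assumptions: the two-way label set (`true`/`false`), the prompt wording, and the helper names `Example`, `build_prompt` and `parse_label` are all hypothetical and are not the prompts or labels used for FCTR. A zero-shot prompt contains only the instruction and the target claim; a k-shot prompt prepends k labelled demonstrations in the same format.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical label set for illustration; FCTR's actual labels may differ.
LABELS = ("true", "false")


@dataclass
class Example:
    """A labelled demonstration used in a few-shot prompt (hypothetical)."""
    claim: str
    label: str


def build_prompt(claim: str, demonstrations: Optional[Sequence[Example]] = None) -> str:
    """Return a zero-shot prompt (no demonstrations) or a k-shot prompt."""
    lines = ["Classify the following claim as one of: " + ", ".join(LABELS) + ".", ""]
    # Each demonstration uses the same Claim/Label layout as the target claim.
    for ex in demonstrations or []:
        lines.append(f"Claim: {ex.claim}")
        lines.append(f"Label: {ex.label}")
        lines.append("")
    # The target claim ends with an empty "Label:" slot for the model to complete.
    lines.append(f"Claim: {claim}")
    lines.append("Label:")
    return "\n".join(lines)


def parse_label(completion: str) -> Optional[str]:
    """Map a raw model completion to a label, or None if its first word is not a label."""
    words = completion.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,:;!\"'")
    return first if first in LABELS else None


if __name__ == "__main__":
    demos = [Example("Placeholder claim A", "false"), Example("Placeholder claim B", "true")]
    print(build_prompt("Placeholder target claim"))         # zero-shot
    print(build_prompt("Placeholder target claim", demos))  # 2-shot
```

Omitting `demonstrations` yields a zero-shot prompt; passing k examples yields a k-shot prompt. Keeping the demonstrations and the target claim in one shared `Claim:`/`Label:` layout is a common choice so that the model's completion of the final `Label:` slot can be mapped back to a label with a simple parser.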

Authors (3)
  1. Recep Firat Cekinel
  2. Pinar Karagoz
  3. Cagri Coltekin