
Unsupervised Pretraining for Fact Verification by Language Model Distillation (2309.16540v3)

Published 28 Sep 2023 in cs.CL, cs.LG, and stat.ML

Abstract: Fact verification aims to verify a claim using evidence from a trustworthy knowledge base. To address this challenge, algorithms must produce features for every claim that are both semantically meaningful and compact enough to find a semantic alignment with the source information. In contrast to previous work, which tackled the alignment problem by learning over annotated corpora of claims and their corresponding labels, we propose SFAVEL (Self-supervised Fact Verification via Language Model Distillation), a novel unsupervised pretraining framework that leverages pre-trained language models to distil self-supervised features into high-quality claim-fact alignments without the need for annotations. This is enabled by a novel contrastive loss function that encourages features to attain high-quality claim and evidence alignments whilst preserving the semantic relationships across the corpora. Notably, we present results that achieve a new state-of-the-art on FB15k-237 (+5.3% Hits@1) and FEVER (+8% accuracy) with linear evaluation.
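
At a high level, the framework distils features from a pre-trained language model so that each claim embedding aligns with the embedding of its supporting fact under a contrastive objective. As an illustrative sketch only (not the authors' actual SFAVEL loss; the in-batch InfoNCE formulation, function name, and temperature value are assumptions), such a claim-evidence alignment loss could look like this in PyTorch:

```python
# Hypothetical sketch of an in-batch contrastive claim-evidence alignment loss.
# Not the authors' code: names, the InfoNCE form, and the temperature are assumed.
import torch
import torch.nn.functional as F

def claim_evidence_contrastive_loss(claim_feats: torch.Tensor,
                                     evidence_feats: torch.Tensor,
                                     temperature: float = 0.07) -> torch.Tensor:
    """Pull each claim towards its paired evidence embedding and push it away
    from the other evidence embeddings in the batch (in-batch negatives)."""
    claim_feats = F.normalize(claim_feats, dim=-1)        # (B, D)
    evidence_feats = F.normalize(evidence_feats, dim=-1)  # (B, D)
    # Pairwise cosine similarities between every claim and every evidence item.
    logits = claim_feats @ evidence_feats.t() / temperature  # (B, B)
    targets = torch.arange(claim_feats.size(0), device=claim_feats.device)
    # Symmetric cross-entropy: claims -> evidence and evidence -> claims.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

Per the abstract, the positive claim-fact pairs in SFAVEL are obtained without annotations, by distilling from the pre-trained language model's features, rather than from labelled claim-evidence data.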

Authors (3)
  1. Adrián Bazaga (10 papers)
  2. Pietro Liò (270 papers)
  3. Gos Micklem (7 papers)
Citations (1)