LLM Factoscope: Uncovering LLMs' Factual Discernment through Inner States Analysis (2312.16374v3)

Published 27 Dec 2023 in cs.CL and cs.AI

Abstract: LLMs have revolutionized various domains with extensive knowledge and creative capabilities. However, a critical issue with LLMs is their tendency to produce outputs that diverge from factual reality. This phenomenon is particularly concerning in sensitive applications such as medical consultation and legal advice, where accuracy is paramount. In this paper, we introduce the LLM factoscope, a novel Siamese network-based model that leverages the inner states of LLMs for factual detection. Our investigation reveals distinguishable patterns in LLMs' inner states when generating factual versus non-factual content. We demonstrate the LLM factoscope's effectiveness across various architectures, achieving over 96% accuracy in factual detection. Our work opens a new avenue for utilizing LLMs' inner states for factual detection and encourages further exploration into LLMs' inner workings for enhanced reliability and transparency.
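The abstract describes pairing an LLM's layer-wise inner states with a Siamese network trained to separate factual from non-factual generations. Below is a minimal, self-contained sketch of that idea in PyTorch; the encoder architecture, dimensions, contrastive loss, and the use of final-token activations are illustrative assumptions for this page, not the paper's released implementation.

```python
# Minimal sketch of a Siamese factual-detection probe over LLM hidden states.
# All names, dimensions, and design choices here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    """Maps a stack of per-layer hidden states to a compact embedding."""
    def __init__(self, n_layers: int, hidden_dim: int, embed_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),  # (B, n_layers, hidden_dim) -> (B, n_layers * hidden_dim)
            nn.Linear(n_layers * hidden_dim, 256),
            nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(x), dim=-1)  # unit-norm embeddings

def contrastive_loss(z1, z2, same_label, margin: float = 1.0):
    """Pull same-label pairs together, push different-label pairs apart."""
    d = (z1 - z2).norm(dim=-1)
    return (same_label * d.pow(2)
            + (1 - same_label) * (margin - d).clamp(min=0).pow(2)).mean()

# Toy training loop on synthetic "inner states". In practice these would come
# from running the target LLM with output_hidden_states=True and stacking the
# final-token activation of each layer.
n_layers, hidden_dim = 12, 768
enc = SiameseEncoder(n_layers, hidden_dim)
opt = torch.optim.Adam(enc.parameters(), lr=1e-3)
for step in range(100):
    a = torch.randn(32, n_layers, hidden_dim)
    b = torch.randn(32, n_layers, hidden_dim)
    same = torch.randint(0, 2, (32,)).float()  # 1 = same factuality label
    loss = contrastive_loss(enc(a), enc(b), same)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At inference time, a query's embedding would be compared against a small labeled support set of known factual and non-factual examples, taking the label of the nearest neighbors, in the spirit of one-shot Siamese matching.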

Authors (6)
  1. Jinwen He (7 papers)
  2. Yujia Gong (2 papers)
  3. Kai Chen (512 papers)
  4. Zijin Lin (5 papers)
  5. Chengan Wei (1 paper)
  6. Yue Zhao (394 papers)
Citations (1)