Strong hallucinations from negation and how to fix them (2402.10543v2)
Abstract: Despite great performance on many tasks, large language models (LMs) still struggle with reasoning, sometimes providing responses that cannot possibly be true because they stem from logical incoherence. We call such responses \textit{strong hallucinations} and prove that they follow from an LM's computation of its internal representations for logical operators and outputs from those representations. Focusing on negation, we provide a novel solution in which negation is treated not as another element of a latent representation, but as \textit{an operation over an LM's latent representations that constrains how they may evolve}. We show that our approach improves model performance in cloze prompting and natural language inference tasks with negation without requiring training on sparse negative data.
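The abstract's central idea, treating negation as an operation over the model's representations rather than as just another input token, can be illustrated with a small sketch. The code below is an assumption-laden illustration, not the paper's actual operator: it intervenes on a masked LM's cloze distribution, and the model name (roberta-base) and the specific complement-and-renormalize rule are chosen purely for illustration.

\begin{verbatim}
# Illustrative sketch only (not the authors' method): negation applied as an
# operation over the model's predicted cloze distribution, instead of being
# encoded as one more token in the latent representation of the input.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")
model.eval()

def cloze_distribution(text: str) -> torch.Tensor:
    """Probability distribution over the vocabulary for the <mask> slot."""
    inputs = tok(text, return_tensors="pt")
    mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero(as_tuple=True)[1]
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits[0, mask_pos[0]].softmax(dim=-1)

def negate(p_affirmative: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Hypothetical negation operator: shift probability mass away from the
    affirmative completions by renormalizing their complement."""
    q = (1.0 - p_affirmative).clamp(min=0.0) ** (1.0 / temperature)
    return q / q.sum()

p_aff = cloze_distribution(f"Birds can {tok.mask_token}.")
p_neg = negate(p_aff)
print("affirmative top-5:", tok.convert_ids_to_tokens(p_aff.topk(5).indices.tolist()))
print("negated top-5:   ", tok.convert_ids_to_tokens(p_neg.topk(5).indices.tolist()))
\end{verbatim}

The design point the sketch makes is the one the abstract argues for: the negated prediction is derived by operating on the affirmative distribution, so no training on negative data is required; only the form of the operator would differ in the paper's actual approach.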