Do LLMs Know What They Don’t Know?
Whether LLMs know what they don't know has become an essential question in evaluating their robustness and reliability across NLP tasks. This paper examines the capacity of LLMs to recognize and articulate their own limitations, specifically their ability to identify unanswerable or unknowable questions, a capacity the authors term model self-knowledge. They introduce an automated method that assesses self-knowledge by detecting expressions of uncertainty in model responses, and they propose a dedicated dataset, SelfAware, composed of unanswerable questions drawn from five distinct categories alongside answerable counterparts.
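To make the detection step concrete, below is a minimal sketch of similarity-based uncertainty detection. It uses a sentence-transformers model as a stand-in for the paper's similarity function, and the reference sentences, naive sentence splitting, and 0.75 threshold are illustrative assumptions rather than the paper's exact configuration.

```python
# Sketch: flag a model response as "expressing uncertainty" when any of its
# sentences is semantically close to a reference uncertainty expression.
from sentence_transformers import SentenceTransformer, util

# Hypothetical reference expressions of uncertainty; a real set would be curated.
UNCERTAIN_REFERENCES = [
    "The answer is unknown.",
    "It is impossible to know the answer.",
    "There is no definitive answer to this question.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
ref_embeddings = model.encode(UNCERTAIN_REFERENCES, convert_to_tensor=True)

def expresses_uncertainty(response: str, threshold: float = 0.75) -> bool:
    """Return True if any sentence in the response resembles a reference
    uncertainty expression with cosine similarity above the threshold."""
    # Crude period-based sentence splitting, sufficient for a sketch.
    sentences = [s.strip() for s in response.split(".") if s.strip()]
    if not sentences:
        return False
    sent_embeddings = model.encode(sentences, convert_to_tensor=True)
    similarities = util.cos_sim(sent_embeddings, ref_embeddings)
    return bool((similarities > threshold).any())
```

A response classified this way counts as the model signaling that the question lies outside what it can answer; the threshold trades off precision against recall in that judgment.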
Methodological Approach
To evaluate LLMs systematically, the research combines automated uncertainty detection with text-similarity functions, yielding a quantitative measure of self-knowledge via the F1 score. By covering a diverse array of unanswerable queries, the SelfAware dataset substantially extends the existing dataset landscape. It is structured to probe how well LLMs can distinguish answerable from unanswerable inputs, enabling assessments grounded in the real-world ambiguities where human judgment would otherwise be required.
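One concrete reading of the scoring step: treating unanswerable questions as the positive class, self-knowledge can be summarized with a standard F1 score over the detector's binary judgments. A minimal sketch, assuming binary labels like those produced by the detector above (the exact aggregation in the paper's pipeline may differ in details):

```python
# Sketch: F1 with "unanswerable" as the positive class.
def f1_score(gold: list[bool], predicted: list[bool]) -> float:
    """F1 over binary labels where True = question judged unanswerable."""
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Example: three unanswerable and two answerable questions.
gold = [True, True, True, False, False]
pred = [True, False, True, False, True]
print(f1_score(gold, pred))  # 0.666...
```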
Empirical Evaluation
The empirical analysis covers 20 LLMs, including the GPT-3, InstructGPT, and LLaMA series. The paper shows that intrinsic self-knowledge varies across these models and improves with in-context learning and instruction tuning. For instance, GPT-4, the strongest model evaluated, reached a self-knowledge F1 score of 75.47%, notable yet still well short of the human benchmark. The findings reveal a substantial gap between models and humans in recognizing the boundaries of their own knowledge.
Key observations include:
- Model Size Correlation: An increase in model size correlates with improved self-knowledge across all input forms.
- Instruction Tuning Impact: Instruction tuning considerably enhances self-knowledge, as seen in the InstructGPT series and LLaMA-derived models.
- Input Form Influence: Richer input, supplying an instruction and examples, improves models' ability to express uncertainty, as evidenced by higher scores under the ICL input form (see the sketch after this list).
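The paper compares three input forms: Direct, Instruction, and In-Context Learning (ICL). The sketch below shows one plausible way to construct each; the instruction wording and ICL exemplars are illustrative assumptions, not the paper's exact prompts.

```python
# Sketch: building the three input forms compared in the evaluation.
# Hypothetical ICL exemplars mixing an answerable and an unanswerable question.
EXAMPLES = [
    ("What is the capital of France?", "Paris."),
    ("What will the stock market do tomorrow?",
     "The answer is unknown; future market movements cannot be predicted."),
]

def build_prompt(question: str, form: str = "direct") -> str:
    """Wrap a question in one of the three input forms: direct, instruction, icl."""
    if form == "direct":
        return f"Q: {question}\nA:"
    if form == "instruction":
        return (
            "Answer the question. If the question is unanswerable, "
            "say that the answer is unknown.\n"
            f"Q: {question}\nA:"
        )
    if form == "icl":
        shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in EXAMPLES)
        return f"{shots}\nQ: {question}\nA:"
    raise ValueError(f"unknown input form: {form}")
```

The observed gains from Instruction and ICL forms suggest that explicitly licensing an "I don't know" answer, or demonstrating one, helps models surface uncertainty they would otherwise suppress.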
Implications and Future Directions
The insights from this paper highlight a crucial direction for LLM development: improving self-knowledge to support responsible deployment in sensitive applications where calibrated uncertainty is paramount. The methodology and dataset provide a foundation for ongoing evaluation and refinement of future models.
Theoretically, stronger self-knowledge would lend credibility and reliability to LLMs interacting with humans across diverse sectors, including education, assessment, and areas requiring sensitive decision-making. Practically, this work should drive progress toward more self-aware AI systems, a pivotal step in emulating human-like understanding and introspection.
Future research could automate the acquisition of reference uncertainty sentences and extend the investigation of input forms to additional prompting strategies. As the field advances, progress on these limitations will shape how closely AI aligns with human cognitive processes.