- The paper introduces the Consistent Reasoning Paradox (CRP), showing that AI emulating human-like reasoning inevitably produces errors.
- It demonstrates that consistent reasoning leads to hallucinations, and that detecting those errors is itself computationally harder than solving the original problems.
- The work advocates for embedding an 'I don't know' response to manage uncertainty and improve the trustworthiness of AI systems.
On the Consistent Reasoning Paradox of Intelligence and Optimal Trust in AI: The Power of 'I Don't Know'
Introduction
The paper "On the Consistent Reasoning Paradox of Intelligence and Optimal Trust in AI: The Power of 'I Don't Know'" presents the Consistent Reasoning Paradox (CRP) within the context of AGI. It addresses a foundational issue in AI: emulating human-like consistent reasoning inevitably leads to fallibility akin to human errors. This paradox has significant implications for the design and trustworthiness of AI systems, particularly when striving for AGI. The central argument posits that trustworthy AI must acknowledge its limitations through the ability to express uncertainty, encapsulated by the 'I don't know' function.
Consistent Reasoning Paradox (CRP)
The CRP rests on the assertion that consistent reasoning, a core component of human intelligence, inherently leads to fallibility in AI. More precisely, when an AI attempts to emulate human-like reasoning by solving the same problem stated in various equivalent forms, it becomes prone to hallucinations: plausible yet incorrect answers. The paper identifies several components of the CRP, summarized below (an illustrative sketch follows the list):
- CRP I: The Non-Hallucinating AI Exists
  - There exists an AI capable of solving a set of problems without hallucinating, provided it is restricted to one specific formulation of each problem.
- CRP II: Attempting Consistent Reasoning Yields Hallucinations
  - An AI that also answers the equivalent reformulations of those problems, reasoning consistently across them, will inevitably produce plausible but incorrect answers on some of them.
- CRP III: The Impossibility of Hallucination Detection
  - Detecting when these hallucinations occur is computationally harder than solving the original problems, even when randomness is introduced.
- CRP IV: Explaining Correct Answers Is Not Always Possible
  - Even when an AI provides the correct answer, it cannot always logically explain the rationale behind it, pointing to inherent limits of explainability.
- CRP V: The Necessity of 'I Don't Know'
  - A trustworthy AI must be able to indicate uncertainty ('I don't know'), especially on complex problems that can have more than one valid answer (multi-valued problems).
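To make CRP I, CRP II, and CRP V concrete, here is a minimal Python sketch. It is purely illustrative and not taken from the paper: the toy solver, the paraphrased question, and the verification step are all assumed for the example. It contrasts a system that answers every phrasing of a question (and so hallucinates on a paraphrase) with one that abstains whenever it cannot verify its answer.

```python
# Illustrative sketch only; the solver, the paraphrase, and the verifier
# below are hypothetical stand-ins, not the paper's construction.

IDK = "I don't know"

def toy_solver(question: str) -> str:
    """A toy 'AI' whose answer depends on the exact wording of the question.

    CRP I: it is correct on the canonical phrasing.
    CRP II: forced to answer an equivalent rephrasing as well, it hallucinates.
    """
    answers = {
        "What is 7 + 5?": "12",                 # correct on this formulation
        "What is five more than seven?": "13",  # hallucination on a paraphrase
    }
    return answers.get(question, IDK)

def verify(answer: str) -> bool:
    """Toy verification: in this arithmetic example the answer can be checked."""
    return answer == str(7 + 5)

def answer_with_abstention(question: str) -> str:
    """CRP V: commit to an answer only when it can be verified; otherwise say IDK."""
    candidate = toy_solver(question)
    return candidate if verify(candidate) else IDK

if __name__ == "__main__":
    for q in ("What is 7 + 5?", "What is five more than seven?"):
        print(f"{q!r}: always-answer -> {toy_solver(q)}, with abstention -> {answer_with_abstention(q)}")
```

Real hallucinations, of course, concern problems for which no such simple verifier exists; the sketch mirrors the shape of the paradox, not its proof.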
Implications for AGI
The CRP suggests that AGI, or any AI emulating human intelligence through consistent reasoning, is inherently subject to human-like imperfections. This paradox challenges the notion of an infallible AGI and highlights the need for AI systems to include mechanisms that acknowledge uncertainty. It underscores that any universally trustworthy AGI must incorporate the 'I don't know' function, much as humans defer judgment when they are unsure.
Figure 2: An illustration of Claude's consistent reasoning leading to failures, emphasizing the practical implications of the CRP in AI development.
Construction of Trustworthy AI
Addressing the CRP means building the 'I don't know' function into the system from the start. The ability to 'give up', that is, to express uncertainty rather than guess, is crucial, mirroring human cognition, where confidence in a solution varies from problem to problem. Central to the paper's construction is computability within the Σ1 class: problems whose solutions can be approached by computations that converge from below, so that a correct answer can eventually be confirmed. This class delineates when an AI can justifiably commit to an answer and when it should defer with 'I don't know', and thereby provides a basis for trustworthiness.
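As a loose illustration of the Σ1 idea above, the following Python sketch answers only when it finds a finite, checkable certificate by searching upward 'from below', and otherwise defers with 'I don't know'. It is an assumption-laden toy, not the paper's construction: the example problem (is n a sum of two squares?), the search budget, and the function names are invented for illustration.

```python
# Toy Σ1-style procedure: a 'yes' answer is backed by a finite witness found by
# searching upward; failing to find one within the budget does not prove 'no',
# so the system defers with 'I don't know' instead of guessing.
# The problem and the budget are illustrative choices, not from the paper.

IDK = "I don't know"

def sum_of_two_squares(n: int, budget: int) -> str:
    """Search for a certificate (a, b) with a^2 + b^2 == n."""
    for a in range(budget + 1):
        for b in range(a, budget + 1):
            if a * a + b * b == n:
                # A witness both answers the question and explains the answer.
                return f"yes: {n} = {a}^2 + {b}^2"
    # No witness found within the budget: committing to 'no' would be a guess.
    return IDK

if __name__ == "__main__":
    print(sum_of_two_squares(3125, budget=60))  # finds 3125 = 25^2 + 50^2
    print(sum_of_two_squares(3, budget=60))     # no certificate -> 'I don't know'
```

The design choice mirrors the paragraph above: certainty is expressed exactly when a computation converging from below has produced a verifiable answer, and uncertainty is declared otherwise.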
Conclusion
The paper's discussion of the CRP offers pivotal insight into AI's capabilities and limitations when mimicking human intelligence. Its emphasis on trust and on explaining an AI's behavior provides foundational guidance for developing systems that are both advanced and reliable. By embracing uncertainty, AI can achieve a balanced, realistic emulation of human reasoning and foster trust in its operations and findings.
In summary, the CRP serves as a critical framework for understanding the boundaries of consistent reasoning in AI, and it promotes the 'I don't know' function as an essential component of trustworthy, reliable AI systems. Such considerations will be crucial as the field progresses towards AGI, affecting not only theoretical constructs but also practical implementations across diverse domains.