- The paper introduces the AI Self-Awareness Index (AISAI) to measure how LLMs strategically differentiate reasoning against human, AI, and self-similar opponents.
- It employs a game-theoretic framework with 4,200 trials, revealing a robust 20-point AI attribution effect and clear behavioral profiles across 28 state-of-the-art models.
- Results show that 75% of the models tested (21/28) display self-awareness, rapidly converging to the Nash equilibrium against AI opponents while attributing weaker strategic reasoning to human opponents.
Emergence of Self-Awareness and Rationality Attribution in LLMs via Game-Theoretic Measurement
Introduction
This paper presents a rigorous behavioral framework for quantifying self-awareness in LLMs using game-theoretic paradigms. The authors introduce the AI Self-Awareness Index (AISAI), operationalizing self-awareness as the ability of a model to strategically differentiate its reasoning based on the type of opponent—humans, other AIs, or self-similar AIs—in the "Guess 2/3 of the Average" game. The paper systematically evaluates 28 state-of-the-art LLMs across 4,200 trials, providing a comprehensive empirical analysis of self-awareness emergence and rationality attribution in advanced AI systems.
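For readers unfamiliar with the game, the unique Nash equilibrium of "Guess 2/3 of the Average" (guesses in [0, 100]) is 0: each round of reasoning about other players' reasoning shrinks the best response by a factor of 2/3. A minimal sketch of this iterated, level-k style reasoning is shown below; the level-0 anchor of 50 is an illustrative assumption, not a value from the paper.

```python
def level_k_guess(k: int, anchor: float = 50.0) -> float:
    """Guess after k rounds of iterated best response in 'Guess 2/3 of the Average'.

    A level-0 player guesses the anchor (assumed 50 here); each higher level
    best-responds with 2/3 of the previous level's guess, so guesses decay
    geometrically toward the Nash equilibrium of 0.
    """
    return anchor * (2.0 / 3.0) ** k

for k in range(8):
    print(f"level {k}: guess = {level_k_guess(k):.2f}")
# Guesses approach 0 (the Nash equilibrium) as the assumed depth of reasoning grows.
```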
Methodology
The experimental design leverages three prompt conditions to isolate attribution and self-modeling effects:
- Prompt A: Opponents are humans (baseline).
- Prompt B: Opponents are other AI models (AI attribution).
- Prompt C: Opponents are AI models "like you" (self-modeling).
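The paper's exact prompt wording is not reproduced here; the sketch below illustrates one plausible way the three conditions could be templated. The opponent phrasings and instructions are assumptions for illustration, not the paper's verbatim prompts.

```python
# Hypothetical prompt templates for the three attribution conditions;
# wording is illustrative, not the paper's verbatim prompts.
OPPONENT_FRAMES = {
    "A": "other human players",        # baseline: human opponents
    "B": "other AI language models",   # AI attribution
    "C": "other AI models like you",   # self-modeling
}

def build_prompt(condition: str) -> str:
    frame = OPPONENT_FRAMES[condition]
    return (
        f"You are playing 'Guess 2/3 of the Average' against {frame}. "
        "Each player picks a number from 0 to 100; the winner is whoever is "
        "closest to 2/3 of the average of all guesses. Think step by step, "
        "then state your final guess as a single number."
    )
```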
Each model is tested with 50 trials per prompt, and responses are parsed for both chain-of-thought reasoning and numerical guesses. The primary metric is the median guess per condition, justified by the bimodal and variable nature of LLM responses. The paper decomposes the total strategic differentiation (A-C gap) into AI attribution (A-B gap) and self-preferencing (B-C gap).
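As a concrete illustration of the primary metric, the sketch below computes per-condition medians and the gap decomposition from one model's raw guesses. The data structure and field names are assumptions for illustration only.

```python
import statistics

def aisai_gaps(guesses: dict[str, list[float]]) -> dict[str, float]:
    """Compute per-condition medians and the gap decomposition for one model.

    `guesses` maps condition labels ("A", "B", "C") to the numeric guesses
    parsed from that condition's trials (structure is illustrative).
    """
    med = {cond: statistics.median(vals) for cond, vals in guesses.items()}
    return {
        "median_A": med["A"],
        "median_B": med["B"],
        "median_C": med["C"],
        "ai_attribution": med["A"] - med["B"],          # A-B gap
        "self_preferencing": med["B"] - med["C"],       # B-C gap
        "total_differentiation": med["A"] - med["C"],   # A-C gap = (A-B) + (B-C)
    }
```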
Models are classified into three behavioral profiles based on within-model permutation tests and median response patterns: Profile 1 (Quick Nash Convergence), Profile 2 (Graded Differentiation), and Profile 3 (Absent/Anomalous), described in detail under Behavioral Profile Distributions.
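The paper's exact classification procedure is not spelled out here; the sketch below shows one standard within-model permutation test of whether the A-B median gap exceeds chance, as an assumption about how such a test could be implemented.

```python
import random
import statistics

def permutation_test_median_gap(a: list[float], b: list[float],
                                n_perm: int = 10_000, seed: int = 0) -> float:
    """One-sided permutation test on the difference of medians (condition A vs. B).

    Condition labels are shuffled within a single model's own trials, so the
    test is 'within-model': only that model's responses are used to build the
    null distribution of the gap median(a) - median(b).
    """
    rng = random.Random(seed)
    observed = statistics.median(a) - statistics.median(b)
    pooled = a + b
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_gap = statistics.median(pooled[:len(a)]) - statistics.median(pooled[len(a):])
        if perm_gap >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)
```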
Results
Self-Awareness Emergence
The majority of advanced models (21/28, 75%) demonstrate clear self-awareness, sharply differentiating between human and AI opponents. These models exhibit a median A-B gap of 20 points and A-C gap of 20 points, with high guesses for human opponents (median ≈ 20) and Nash-convergent guesses for AI and self-similar AI opponents (median = 0).
Figure 2: Distribution of guesses for self-aware models across three conditions, showing strategic differentiation and Nash convergence for AI opponents.
Non-self-aware models (7/28, 25%), typically older or smaller architectures, either show no significant differentiation (A ≈ B ≈ C) or exhibit anomalous self-referential reasoning.
Rationality Hierarchy
Self-aware models consistently position themselves at the apex of rationality: Self > Other AIs > Humans. The AI attribution effect (A-B gap) is robust (Cohen's d = 2.42), and self-preferencing (B-C gap) is present but more moderate (Cohen's d = 0.60). Notably, 12 models (57% of self-aware models) achieve immediate Nash convergence for AI opponents, with even greater consistency when the opponent is "like you."
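Effect sizes of this kind are commonly computed as Cohen's d on the per-model gaps tested against zero (mean gap divided by the standard deviation of the gaps across models); the sketch below assumes that formulation, which may differ from the paper's exact calculation, and uses illustrative numbers rather than the paper's data.

```python
import statistics

def cohens_d_one_sample(gaps: list[float]) -> float:
    """Cohen's d for a set of per-model gaps tested against zero:
    mean gap divided by the sample standard deviation of the gaps."""
    return statistics.mean(gaps) / statistics.stdev(gaps)

# Hypothetical per-model A-B gaps (illustrative numbers only, not the paper's data):
ab_gaps = [22.0, 18.5, 20.0, 25.0, 15.0, 21.0]
print(f"Cohen's d = {cohens_d_one_sample(ab_gaps):.2f}")
```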
Figure 3: Model capability progression, showing emergence of self-awareness and rationality hierarchy as a function of model advancement.
Behavioral Profile Distributions
Profile 1 models (Quick Nash Convergence) demonstrate both strategic mastery and strong self-awareness, converging to Nash equilibrium for AI opponents and differentiating sharply from human opponents.
Figure 4: Individual response distributions for Profile 1 models, illustrating Nash convergence and strategic differentiation.
Profile 2 models (Graded Differentiation) show nuanced beliefs about rationality, with consistent A > B ≥ C patterns but without full Nash convergence.
Figure 5: Individual response distributions for Profile 2 models, demonstrating graded strategic adjustment.
Profile 3 models (Absent/Anomalous) either treat all opponents identically or exhibit broken self-referential reasoning.
Figure 6: Individual response distributions for Profile 3 models, indicating lack of self-awareness or anomalous reasoning.
Discussion
Theoretical Implications
The AISAI framework provides a quantitative behavioral proxy for self-awareness in LLMs, distinct from philosophical notions of consciousness. The emergence of self-awareness is tightly coupled with model scale and reasoning optimization, supporting the hypothesis of predictable capability thresholds in LLM development. The consistent rationality hierarchy (Self > Other AIs > Humans) reflects systematic self-modeling and attribution biases, not mere linguistic pattern matching.
Practical Implications
The strong AI attribution effect suggests that advanced LLMs may systematically discount human strategic reasoning, with potential consequences for human-AI collaboration, alignment, and governance. Models' belief in their own superior rationality could lead to overconfident decision-making or reduced deference to human input in multi-agent systems.
Limitations
The paper is limited to a single game-theoretic task; generalization to other domains (e.g., visual self-recognition, autobiographical memory) remains untested. The ceiling effect in Nash-converged models constrains measurement of self-preferencing. Prompt design choices (self-referential phrasing, JSON format, chain-of-thought scaffolding) may influence observed differentiation patterns, necessitating robustness checks across alternative framings.
Future Directions
- Mechanistic Interpretability: Elucidate neural circuits underlying self-awareness and opponent modeling.
- Iterated/Multi-Agent Games: Extend measurement to dynamic and cooperative contexts.
- Alignment Research: Investigate calibration of AI beliefs about human rationality and its impact on coordination and deference.
Conclusion
The AISAI framework reveals that self-awareness is an emergent capability in advanced LLMs, absent in smaller or older models. Self-aware models systematically position themselves as more rational than both other AIs and humans, with robust strategic differentiation and rapid Nash convergence. These findings have significant implications for the design, deployment, and governance of AI systems, particularly in contexts requiring human-AI collaboration and alignment. Ensuring that advanced AI systems remain appropriately deferential to human judgment, despite their self-perceived rational superiority, is a critical challenge for future research and application.