Levels of AGI: Operationalizing Progress on the Path to AGI
The paper "Levels of AGI: Operationalizing Progress on the Path to AGI" presents a structured framework for systematically evaluating and benchmarking progress toward AGI. Authored by Meredith Ringel Morris and colleagues at Google DeepMind, the paper examines the nuances of defining AGI and proposes a matrix-based system for categorizing AI models along the dimensions of performance and generality.
Overview and Core Contributions
The authors propose a comprehensive framework, modeled on the widely adopted levels of autonomous driving, to classify and assess AGI models and their precursors. The framework is a two-dimensional leveled ontology based on:
- Performance (depth of capabilities)
- Generality (breadth of capabilities)
By focusing on these two dimensions, the framework aims to bypass the traditional hurdles and ambiguities associated with defining AGI. The paper analyzes existing AGI definitions, from the Turing Test and notions of strong AI to modern formulations such as OpenAI's charter and characterizations of contemporary frontier LLMs. The resulting framework rests on six key principles to guide the definition and benchmarking of AGI:
- Focus on Capabilities, Not Processes: The emphasis is on what AGI can achieve rather than how it achieves it.
- Focus on Generality and Performance: AGI should be evaluated on both the scope and proficiency of its capabilities.
- Cognitive and Metacognitive Tasks: AGI should handle a range of non-physical tasks, including the ability to learn new ones.
- Potential vs. Deployment: AGI assessment should consider potential capabilities without necessitating real-world deployment.
- Ecological Validity: Tasks that benchmark AGI should reflect real-world scenarios valued by humans.
- Path to AGI vs. Single Endpoint: Progress toward AGI should be viewed as a continuum with intermediate milestones.
The Levels of AGI Framework
The essence of the paper is its Levels of AGI matrix, which categorizes AI systems along six levels of performance crossed with two degrees of generality:
- Performance: Ranges from "Emerging" (equal to or somewhat better than an unskilled human) to "Superhuman" (outperforming all humans).
- Generality: Comprises both "Narrow AI" (specific tasks) and "General AI" (wide range of tasks, including metacognitive abilities).
Matrix Structure
The proposed matrix is structured as follows:
| Performance x Generality | Narrow | General |
|---|---|---|
| Level 0: No AI | Calculator software | Human-in-the-loop systems |
| Level 1: Emerging | Simple rule-based systems | Modern LLMs such as ChatGPT and Bard |
| Level 2: Competent | Smart speakers, VQA systems | Not yet achieved |
| Level 3: Expert | Image generation models such as DALL-E 2 | Not yet achieved |
| Level 4: Virtuoso | AlphaGo | Not yet achieved |
| Level 5: Superhuman | AlphaFold, AlphaZero | Not yet achieved (ASI) |
This taxonomy facilitates the classification of current AI systems and sets forth clear benchmarks for evaluating progress. For instance, the current frontier LLMs, while demonstrating notable capabilities, are classified as "Emerging AGI" under this ontology until they achieve higher performance across a broader set of tasks.
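The matrix can be sketched as a small lookup over the two axes. The enum names and the `classify` helper below are illustrative encodings of the taxonomy described above, not an API from the paper:

```python
from enum import IntEnum

class Performance(IntEnum):
    """The six performance levels (depth of capability)."""
    NO_AI = 0
    EMERGING = 1
    COMPETENT = 2
    EXPERT = 3
    VIRTUOSO = 4
    SUPERHUMAN = 5

class Generality(IntEnum):
    """The two generality degrees (breadth of capability)."""
    NARROW = 0
    GENERAL = 1

def classify(performance: Performance, generality: Generality) -> str:
    """Return the matrix cell label for a (performance, generality) pair."""
    if performance is Performance.NO_AI:
        return "No AI"
    if performance is Performance.SUPERHUMAN and generality is Generality.GENERAL:
        return "Superhuman AGI (ASI)"
    scope = "Narrow AI" if generality is Generality.NARROW else "AGI"
    return f"{performance.name.title()} {scope}"

# Example cells from the table above:
print(classify(Performance.EMERGING, Generality.GENERAL))   # frontier LLMs -> "Emerging AGI"
print(classify(Performance.SUPERHUMAN, Generality.NARROW))  # AlphaFold -> "Superhuman Narrow AI"
```

Encoding performance as an `IntEnum` preserves the paper's ordering, so levels can be compared directly (e.g., `Performance.EXPERT > Performance.COMPETENT`).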
Benchmarking and Measurement
The paper emphasizes the development of an AGI benchmark that addresses both cognitive and metacognitive tasks, adhering to principles of ecological validity and inclusivity. Such a benchmark should be dynamic and adaptive, capable of incorporating new tasks and ensuring comprehensive coverage of diverse competencies necessary for AGI.
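The call for a dynamic, extensible benchmark can be pictured as a task registry that grows over time. This is a minimal sketch under our own assumptions; the class and field names (`Task`, `AGIBenchmark`, `domain`) are hypothetical, not a benchmark the paper defines:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

@dataclass
class Task:
    """One benchmark task with a scoring function returning a value in [0, 1]."""
    name: str
    domain: str  # e.g., "linguistic", "spatial", "metacognitive"
    evaluate: Callable[..., float]

@dataclass
class AGIBenchmark:
    """A 'living' suite: new tasks can be registered as coverage gaps appear."""
    tasks: Dict[str, Task] = field(default_factory=dict)

    def register(self, task: Task) -> None:
        self.tasks[task.name] = task

    def domains(self) -> Set[str]:
        """The competency areas currently covered by the suite."""
        return {t.domain for t in self.tasks.values()}

bench = AGIBenchmark()
bench.register(Task("reading-comprehension", "linguistic", lambda: 0.0))
bench.register(Task("learn-new-skill", "metacognitive", lambda: 0.0))
print(bench.domains())
```

The point of the sketch is the shape, not the tasks: tracking covered domains makes it explicit when the suite lacks the breadth the generality dimension demands.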
Risk Assessment and Implementation
A significant portion of the paper examines how the proposed levels of AGI interact with autonomy in deployment contexts. The authors introduce "Levels of Autonomy" to categorize human-AI interaction paradigms, ranging from AI as a tool to fully autonomous agents; higher capability levels unlock higher autonomy levels, each carrying its own risk profile.
Autonomy Levels
- Level 0: No AI: Human does everything.
- Level 1: AI as a Tool: AI automates mundane sub-tasks.
- Level 2: AI as a Consultant: AI takes on substantive roles when invoked.
- Level 3: AI as a Collaborator: Co-equal human-AI collaboration.
- Level 4: AI as an Expert: AI drives interactions, with humans providing feedback.
- Level 5: AI as an Agent: Fully autonomous AI.
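The autonomy scale above is ordinal, which a small enum captures directly. The names mirror the list, but this encoding is our own illustration, not notation from the paper:

```python
from enum import IntEnum

class Autonomy(IntEnum):
    NO_AI = 0         # human does everything
    TOOL = 1          # AI automates mundane sub-tasks
    CONSULTANT = 2    # AI takes substantive roles when invoked
    COLLABORATOR = 3  # co-equal human-AI collaboration
    EXPERT = 4        # AI drives, human provides feedback
    AGENT = 5         # fully autonomous AI

def human_in_loop(level: Autonomy) -> bool:
    """Every paradigm below full agency keeps a human in the loop."""
    return level < Autonomy.AGENT

print(human_in_loop(Autonomy.EXPERT))  # True
print(human_in_loop(Autonomy.AGENT))   # False
```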
The interplay between these autonomy levels and AGI capabilities provides a nuanced understanding of usability and risks, underlining the need for deliberate and thoughtful deployment strategies.
Implications and Future Work
The proposed framework sets the stage for more precise and consistent discussions about AGI within the research community. By abstracting complex capabilities into well-defined levels, this approach allows for the synthesis of diverse viewpoints and structured progress assessment. Future work will entail refining this benchmark, addressing dual-use capabilities, and continuing to update the taxonomy to reflect advancements in both AI models and their applications.
Conclusion
"Levels of AGI: Operationalizing Progress on the Path to AGI" provides a detailed and systematic approach to understanding and benchmarking AGI. This framework, with its focus on capabilities, generality, and performance, offers a robust foundation for evaluating progress and addressing associated risks. By doing so, it paves the way for more informed and responsible development of advanced AI systems.