Levels of AGI: Operationalizing Progress on the Path to AGI
The paper "Levels of AGI: Operationalizing Progress on the Path to AGI" presents a structured framework for systematically evaluating and benchmarking progress toward AGI. Authored by Meredith Ringel Morris and colleagues at Google DeepMind, the paper examines the nuances of defining AGI and proposes a matrix-based system for categorizing AI models along the dimensions of performance and generality.
Overview and Core Contributions
The authors propose a comprehensive framework, modeled on the widely adopted levels of autonomous driving, to classify and assess AGI models and their precursors. The framework is a two-dimensional leveled ontology based on:
- Performance (depth of capabilities)
- Generality (breadth of capabilities)
By focusing on these two dimensions, the framework aims to bypass the traditional hurdles and ambiguities associated with defining AGI. The paper analyzes existing AGI definitions, from the Turing Test and notions of strong AI to modern formulations such as OpenAI's charter and characterizations of contemporary frontier LLMs. The resulting framework rests on six key principles to guide the definition and benchmarking of AGI:
- Focus on Capabilities, Not Processes: The emphasis is on what AGI can achieve rather than how it achieves it.
- Focus on Generality and Performance: AGI should be evaluated on both the scope and proficiency of its capabilities.
- Cognitive and Metacognitive Tasks: AGI should handle a range of non-physical tasks, including the ability to learn new ones.
- Potential vs. Deployment: AGI assessment should consider potential capabilities without necessitating real-world deployment.
- Ecological Validity: Tasks that benchmark AGI should reflect real-world scenarios valued by humans.
- Path to AGI vs. Single Endpoint: Progress toward AGI should be viewed as a continuum with intermediate milestones.
The Levels of AGI Framework
The essence of the paper is its Levels of AGI matrix, which categorizes AI systems along six levels of performance crossed with two degrees of generality:
- Performance: Ranges from "Emerging" (equal to or somewhat better than an unskilled human) to "Superhuman" (outperforming all humans).
- Generality: Comprises both "Narrow AI" (specific tasks) and "General AI" (wide range of tasks, including metacognitive abilities).
Matrix Structure
The proposed matrix is structured as follows:
| Performance x Generality | Narrow | General |
|---|---|---|
| Level 0: No AI | Calculator software | Human-in-the-loop systems |
| Level 1: Emerging | Simple rule-based systems | Modern LLMs such as ChatGPT and Bard |
| Level 2: Competent | Smart speakers, VQA systems | Not yet achieved |
| Level 3: Expert | Image generation models such as DALL-E 2 | Not yet achieved |
| Level 4: Virtuoso | AlphaGo | Not yet achieved |
| Level 5: Superhuman | AlphaFold, AlphaZero | Not yet achieved (ASI) |
This taxonomy facilitates the classification of current AI systems and sets forth clear benchmarks for evaluating progress. For instance, the current frontier LLMs, while demonstrating notable capabilities, are classified as "Emerging AGI" under this ontology until they achieve higher performance across a broader set of tasks.
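The matrix can be sketched as a small lookup over the two axes. The enum names and the `classify` helper below are illustrative encodings of the taxonomy described above, not an API from the paper:

```python
from enum import IntEnum

class Performance(IntEnum):
    """The six performance levels (depth of capability)."""
    NO_AI = 0
    EMERGING = 1
    COMPETENT = 2
    EXPERT = 3
    VIRTUOSO = 4
    SUPERHUMAN = 5

class Generality(IntEnum):
    """The two generality degrees (breadth of capability)."""
    NARROW = 0
    GENERAL = 1

def classify(performance: Performance, generality: Generality) -> str:
    """Return the matrix cell label for a (performance, generality) pair."""
    if performance is Performance.NO_AI:
        return "No AI"
    if performance is Performance.SUPERHUMAN and generality is Generality.GENERAL:
        return "Superhuman AGI (ASI)"
    scope = "Narrow AI" if generality is Generality.NARROW else "AGI"
    return f"{performance.name.title()} {scope}"

# Example cells from the table above:
print(classify(Performance.EMERGING, Generality.GENERAL))   # frontier LLMs -> "Emerging AGI"
print(classify(Performance.SUPERHUMAN, Generality.NARROW))  # AlphaFold -> "Superhuman Narrow AI"
```

Encoding performance as an `IntEnum` preserves the paper's ordering, so levels can be compared directly (e.g., `Performance.EXPERT > Performance.COMPETENT`).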
Benchmarking and Measurement
The paper emphasizes the development of an AGI benchmark that addresses both cognitive and metacognitive tasks, adhering to principles of ecological validity and inclusivity. Such a benchmark should be dynamic and adaptive, capable of incorporating new tasks and ensuring comprehensive coverage of diverse competencies necessary for AGI.
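The call for a dynamic, extensible benchmark can be pictured as a task registry that grows over time. This is a minimal sketch under our own assumptions; the class and field names (`Task`, `AGIBenchmark`, `domain`) are hypothetical, not a benchmark the paper defines:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

@dataclass
class Task:
    """One benchmark task with a scoring function returning a value in [0, 1]."""
    name: str
    domain: str  # e.g., "linguistic", "spatial", "metacognitive"
    evaluate: Callable[..., float]

@dataclass
class AGIBenchmark:
    """A 'living' suite: new tasks can be registered as coverage gaps appear."""
    tasks: Dict[str, Task] = field(default_factory=dict)

    def register(self, task: Task) -> None:
        self.tasks[task.name] = task

    def domains(self) -> Set[str]:
        """The competency areas currently covered by the suite."""
        return {t.domain for t in self.tasks.values()}

bench = AGIBenchmark()
bench.register(Task("reading-comprehension", "linguistic", lambda: 0.0))
bench.register(Task("learn-new-skill", "metacognitive", lambda: 0.0))
print(bench.domains())
```

The point of the sketch is the shape, not the tasks: tracking covered domains makes it explicit when the suite lacks the breadth the generality dimension demands.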
Risk Assessment and Implementation
A significant portion of the paper examines how the proposed levels of AGI interact with autonomy in deployment contexts. The authors introduce "Levels of Autonomy" to categorize human-AI interaction paradigms, ranging from AI as a tool to fully autonomous agents; higher capability levels unlock higher autonomy levels, each carrying its own risk profile.
Autonomy Levels
- Level 0: No AI: Human does everything.
- Level 1: AI as a Tool: AI automates mundane sub-tasks.
- Level 2: AI as a Consultant: AI takes on substantive roles when invoked.
- Level 3: AI as a Collaborator: Co-equal human-AI collaboration.
- Level 4: AI as an Expert: AI drives interactions, with humans providing feedback.
- Level 5: AI as an Agent: Fully autonomous AI.
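The autonomy scale above is ordinal, which a small enum captures directly. The names mirror the list, but this encoding is our own illustration, not notation from the paper:

```python
from enum import IntEnum

class Autonomy(IntEnum):
    NO_AI = 0         # human does everything
    TOOL = 1          # AI automates mundane sub-tasks
    CONSULTANT = 2    # AI takes substantive roles when invoked
    COLLABORATOR = 3  # co-equal human-AI collaboration
    EXPERT = 4        # AI drives, human provides feedback
    AGENT = 5         # fully autonomous AI

def human_in_loop(level: Autonomy) -> bool:
    """Every paradigm below full agency keeps a human in the loop."""
    return level < Autonomy.AGENT

print(human_in_loop(Autonomy.EXPERT))  # True
print(human_in_loop(Autonomy.AGENT))   # False
```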
The interplay between these autonomy levels and AGI capabilities provides a nuanced understanding of usability and risks, underlining the need for deliberate and thoughtful deployment strategies.
Implications and Future Work
The proposed framework sets the stage for more precise and consistent discussions about AGI within the research community. By abstracting complex capabilities into well-defined levels, this approach allows for the synthesis of diverse viewpoints and structured progress assessment. Future work will entail refining this benchmark, addressing dual-use capabilities, and continuing to update the taxonomy to reflect advancements in both AI models and their applications.
Conclusion
"Levels of AGI: Operationalizing Progress on the Path to AGI" provides a detailed and systematic approach to understanding and benchmarking AGI. This framework, with its focus on capabilities, generality, and performance, offers a robust foundation for evaluating progress and addressing associated risks. By doing so, it paves the way for more informed and responsible development of advanced AI systems.