
Levels of AGI Framework

Updated 29 July 2025
  • The Levels of AGI framework is a taxonomy that classifies AI systems based on performance, generality, and trustworthiness.
  • It employs matrixed ontologies, causal diagrams, and hierarchical layering to evaluate tasks from narrow automation to superhuman proficiency.
  • These frameworks drive policy and safety strategies by linking technical benchmarks with risk assessment and economic modeling.

AGI refers to artificial agents capable of achieving competence at a wide array of tasks with performance and versatility rivaling or surpassing that of humans. Frameworks for articulating “levels of AGI” have become central to operationalizing progress, benchmarking risks, analyzing safety, and guiding both research and policy. These frameworks analyze dimensions such as performance depth, breadth of generality, autonomy, capability, alignment, trustworthiness, sociotechnical risk, and deployment scenarios. They draw upon formal ontologies, hierarchical taxonomies, abstraction levels, economic models, and governance principles.

1. Conceptual and Ontological Foundations

The classification of AGI levels requires crisp definitions grounded in observable capabilities rather than implementation details or anthropomorphic features. A well-founded AGI ontology must satisfy the following requirements (Morris et al., 2023):

  • Focus on Capabilities: Benchmark observable achievements, not underlying algorithms or “consciousness.”
  • Dual Axes—Performance and Generality: Use "depth" (how well tasks are performed relative to human benchmarks) and "breadth" (the diversity of tasks an AI system can master).
  • Cognitive and Metacognitive Focus: Evaluate reasoning, learning, and the capacity to acquire new skills, rather than physical embodiment.
  • Potential, Not Deployment: Assign credit for capability-in-principle; legal or economic deployment barriers are treated separately.
  • Ecological Validity: Employ tasks reflective of real-world values and challenges, not abstract puzzles.
  • Gradualist Path: Articulate AGI as a series of levels, enabling incremental measurement and regulation.

A representative ontology is a matrix with columnar “generality” (ranging from narrow to general capabilities) and ordinal “performance” levels (from below human up to superhuman) (Morris et al., 2023); this facilitates nuanced technical, policy, and risk assessments.

2. Taxonomies and Hierarchical Level Systems

Multiple taxonomic and hierarchical frameworks have emerged to encode AGI progress:

Depth–Breadth Matrix (Ontology-Based)

One core model, as defined in (Morris et al., 2023), organizes AGI systems by two axes:

  • Breadth (columns): From “narrow” (single, well-defined tasks) to “general” (across cognitive/metacognitive domains).
  • Performance (rows):
    • Level 0: No AI (basic automation only)
    • Level 1: Emerging (unskilled/specialized human level)
    • Level 2: Competent (median skilled adult, e.g., modern voice assistants)
    • Level 3: Expert (≥90th percentile human, e.g., advanced grammar checkers)
    • Level 4: Virtuoso (≈99th percentile, e.g., AlphaGo)
    • Level 5: Superhuman (exceeding the performance of all humans, e.g., AlphaFold in protein-structure prediction)
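The performance axis above can be read as a simple ordinal scale. A minimal sketch in Python, with percentile thresholds interpreted from the level descriptions (the names and exact cutoffs here are illustrative, not an official API from the paper):

```python
from enum import IntEnum

class Performance(IntEnum):
    """Ordinal performance levels of the Morris et al. (2023) matrix."""
    NO_AI = 0
    EMERGING = 1
    COMPETENT = 2
    EXPERT = 3
    VIRTUOSO = 4
    SUPERHUMAN = 5

def performance_level(human_percentile: float) -> Performance:
    """Map a skill percentile (relative to humans) to a level.
    Values above 100 denote performance no human attains."""
    if human_percentile > 100:
        return Performance.SUPERHUMAN
    if human_percentile >= 99:
        return Performance.VIRTUOSO
    if human_percentile >= 90:
        return Performance.EXPERT
    if human_percentile >= 50:
        return Performance.COMPETENT
    if human_percentile > 0:
        return Performance.EMERGING
    return Performance.NO_AI
```

Pairing such a level with a breadth label (narrow vs. general) yields a matrix cell, e.g., "Level 3 Narrow" for an expert-level single-task system.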

Embodied AGI Levels

“Toward Embodied AGI” (Wang et al., 20 May 2025) introduces a five-level taxonomy for robotic and general embodied systems:

  1. L1: Single-task completion (e.g., pick-and-place robots)
  2. L2: Composition of tasks (predefined skill libraries)
  3. L3: Conditional general-purpose task performance (adaptable, multimodal, diverse instructions)
  4. L4: High-level generalization (robust to open environments; world model integration)
  5. L5: All-purpose, lifelong-learning, fully general robots (self-awareness, continual skill acquisition, ethical self-regulation)

Multimodal Generalist Framework (General-Level)

The General-Level framework (Fei et al., 7 May 2025) for Multimodal LLMs (MLLMs) recognizes a five-level hierarchy based on both the number of tasks/modalities supported and the degree of cross-task/cross-modality “synergy.” Level 5 requires that joint multimodal training not only supports all tasks, but outperforms specialist models due to integrative learning.

Trustworthiness Causal Ladder

The AI-45° Law and the associated “Causal Ladder of Trustworthy AGI” (Yang et al., 8 Dec 2024) introduce a five-layer schema for trustworthiness:

  1. Perception trustworthiness
  2. Reasoning trustworthiness
  3. Decision-making trustworthiness
  4. Autonomy trustworthiness
  5. Collaboration trustworthiness

Successive levels correspond to increasing depth in assuring reliability, transparency, and safety for each aspect of system capability.

3. Methodologies for Stratification and Comparison

Levels-of-AGI frameworks leverage various modeling strategies:

  • Matrixed Ontologies: Capabilities are arrayed in two dimensions—task diversity (generality) versus skill percentile (performance)—enabling assignment of current systems to nuanced cells (e.g., “Level 2 General” versus “Level 3 Narrow”).
  • Causal Influence Diagrams (CID): Graphical encoding of agent-environment interactions, states, actions, and rewards to visualize and compare causal structures and incentive divergence (e.g., wireheading, reward misspecification) (Everitt et al., 2019).
  • Hierarchical Layering: Layering abstraction levels (sensor data to symbolic concepts to metacognitive goals) to explicitly model information preservation and transfer (Latapie et al., 2020, Sukhobokov et al., 11 Jan 2024).
  • Benchmark-Linked Levels: Rigorous benchmark suites (e.g., AGITB (Šprogar, 6 Apr 2025), General-Bench (Fei et al., 7 May 2025)), employing signal-level and multimodal tasks, are used to operationalize the frameworks, demanding that models pass all subtasks at a prescribed level to qualify.
  • Synergy Assessment: Explicitly requires a generalist model to outperform or match specialists across all modalities/tasks—penalizing imbalances and rewarding integrative learning effects (e.g., using harmonic mean-based formulas, S₄ = (2 S_C S_G)/(S_C + S_G)) (Fei et al., 7 May 2025).
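The harmonic-mean synergy formula cited above can be sketched directly; here S_C and S_G are taken to be two component scores (e.g., comprehension and generation), an assumption made for illustration:

```python
def synergy_score(s_c: float, s_g: float) -> float:
    """Harmonic mean 2*S_C*S_G / (S_C + S_G), per the formula above.
    Unlike an arithmetic mean, a deficit on either axis drags the
    combined score down sharply, penalizing unbalanced generalists."""
    if s_c + s_g == 0:
        return 0.0
    return 2.0 * s_c * s_g / (s_c + s_g)
```

A model scoring (0.9, 0.1) receives 0.18 rather than the arithmetic-mean 0.5, which is precisely the imbalance penalty the framework intends.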

4. Implications for Safety, Alignment, and Governance

Frameworks for AGI levels directly impact safety analysis and governance:

  • Risk Stratification: As systems progress to higher capability and generality, new risks emerge from misalignment, convergent instrumental goals, and potential catastrophic side effects (Everitt et al., 2018).
  • Trustworthy AGI: The Causal Ladder (Yang et al., 8 Dec 2024) mandates synchronization between capabilities and safety, with the AI-45° Law prescribing balanced progress and flagging existential risks when capability outpaces protective measures.
  • Game-Theoretic Policy Models: Levels of AGI map onto game-theoretic analyses of inter-actor risk (e.g., the “steering wheel problem”) (Young, 25 Jan 2025). Here, frameworks for cooperative equilibrium, network effects in safety investment, and policy mechanisms (preregistration, shared infrastructure) are devised to maintain robust development trajectories.
  • Socioeconomic Models: Levels frameworks are mapped to economic production models, predicting labor disruption and suggesting interventions (redistribution, taxation, new social contracts) as AGI ascends through “levels” of economic substitution and dominance (Stiefenhofer, 10 Feb 2025).
  • Operational Safety Cases: For every significant step upward (toward Level 3–5), safety cases comprising capability evaluations, monitoring, alignment protocols, and system-level security must be constructed (Shah et al., 2 Apr 2025).

5. Benchmarking, Evaluation, and Practical Usage

Operationalizing AGI progress requires systematic evaluation:

  • Breadth–Depth Benchmarks: Benchmarks must span diverse, ecologically valid tasks; “living” benchmarks evolve as systems improve, driving continual reassessment of level placement (Morris et al., 2023).
  • Signal-Level Testbeds: AGITB (Šprogar, 6 Apr 2025) mandates that AGI systems pass core computational invariants (determinism, sensitivity, generalization) without pretraining or symbolic shortcutting, exposing current shortfalls in adaptation and low-level processing.
  • Synergy and Balanced Progress: General-Level (Fei et al., 7 May 2025) explicitly penalizes systems with capability gaps—showing that most state-of-the-art models remain stuck below Level 4 due to lack of cross-modality integration, especially in generation tasks.
  • Deployment/Autonomy Axes: Frameworks distinguish “capability levels” from “deployment autonomy”: high-capability AGI may still be deployed very conservatively, with limited autonomy, as a risk mitigation or policy choice (Morris et al., 2023).
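The separation of capability level from deployment autonomy can be modeled as two independent fields rather than a single scale. A hypothetical sketch (field names and thresholds are illustrative, not drawn from the cited frameworks):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SystemProfile:
    """Capability is what a system could do; autonomy is what its
    deployment actually permits. The two are assessed independently."""
    capability_level: int  # 0 (No AI) .. 5 (Superhuman), performance axis
    autonomy_level: int    # 0 (tool, human-invoked) .. 5 (fully autonomous)

    def conservatively_deployed(self) -> bool:
        # High capability granted little autonomy: a deliberate
        # risk-mitigation or policy choice, not a capability limit.
        return self.capability_level >= 3 and self.autonomy_level <= 1
```

Under this sketch, an expert-level system kept behind a human-in-the-loop interface would be `SystemProfile(3, 1)` and flagged as conservatively deployed.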

6. Limitations and Ongoing Challenges

Despite robust formalism, existing level frameworks face unresolved technical, theoretical, and societal challenges:

  • Benchmarks generally reflect non-embodied, language-centric cognition; embodied AGI level taxonomies (Wang et al., 20 May 2025) show that full-spectrum integration and lifelong learning are currently aspirational.
  • Cross-modal synergy and real-time responsiveness remain open bottlenecks: frameworks consistently indicate that current MLLMs and LLMs fail to achieve the integrative, reflexive, and generative blend required at higher AGI levels.
  • Economic and policy alignment lags technical progress: while frameworks for risk, trust, and governance are maturing, the coupling of technical advances with deployment standards (AI-45° Law, game-theoretic equilibria, new social contracts) is still emergent and frequently “choked” by coordination problems and incentives (Young, 25 Jan 2025, Morris et al., 2023).

7. Outlook and Future Research

Levels-of-AGI frameworks provide a critical structure for guiding incremental progress, risk assessment, and policy design. The field anticipates continued refinement along several axes:

  • Greater taxonomic precision and formalization in both capability and trust frameworks
  • Expanding multi-modal and embodied benchmarks capturing ecological and physical-task generalization
  • Enhanced linkage between technical evaluation and safety/governance protocols (e.g., constructing verifiable, always-up-to-date safety cases per capability level)
  • Policy innovation based on coordination insights, network effects, and incentive-compatible infrastructure development

Levels-of-AGI frameworks will persist as indispensable tools for aligning scientific progress, safety engineering, and societal expectations as the field advances toward general intelligence.