- The paper proposes a novel multidimensional evaluation protocol that uses ten cognitive faculties to benchmark AI against human performance.
- It employs systematic cognitive tasks, human baseline comparisons, and integration strategies to ensure robust scoring and uncertainty management.
- The study highlights the need for evolving taxonomies and high-quality, private evaluation suites to effectively address current benchmarking gaps.
Measuring Progress Toward AGI: A Cognitive Framework
Motivation and Background
The paper "Measuring Progress Toward AGI: A Cognitive Framework" (2605.28405) addresses the persistent ambiguity in defining and measuring progress toward artificial general intelligence. Rather than relying on ad hoc or task-specific benchmarks, the authors advocate for an empirically grounded, multidimensional evaluation protocol rooted in established cognitive science, psychology, and neuroscience principles. The impetus is to provide clarity for researchers and policymakers, enabling nuanced assessment and robust governance of ever-advancing AI systems by mapping their capabilities to human cognition.
The Cognitive Taxonomy: Structure and Rationale
The authors introduce a cognitive taxonomy comprising ten faculties, each demarcating a core component of general intelligence as observed in humans:
- Perception: Extraction and processing of sensory information across modalities.
- Generation: Production of outputs, from language to motor actions.
- Attention: Allocation of cognitive resources for relevant stimuli or tasks.
- Learning: Acquisition of novel knowledge, skills, or behaviors.
- Memory: Persistent storage and retrieval of information.
- Reasoning: Inference and logical processing culminating in valid conclusions.
- Metacognition: Self-monitoring and regulation of cognitive processes.
- Executive Functions: Goal-directed organization, planning, inhibition, flexibility.
- Problem Solving (Composite): Efficient application and integration of faculties in diverse domain-specific contexts.
- Social Cognition (Composite): Processing and interpretation of social cues, norms, and interactions.
Each faculty is rigorously specified with subordinate abilities and modal variations (e.g., low/high-level perception, domain knowledge, various reasoning styles), ensuring comprehensive coverage and fine-grained diagnostic utility. Importantly, the taxonomy is implementation-agnostic, focusing on observable behavior and task competence rather than internal mechanisms or architectural traditions.
Figure 1: Overview of the 10 cognitive faculties. Faculties outlined in orange represent composite faculties.
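As a minimal sketch, the taxonomy can be represented as a simple data structure so that benchmark results can be keyed by faculty. The names `Faculty`, `COMPOSITE_FACULTIES`, and `is_composite` are hypothetical and chosen for illustration; the paper does not prescribe any particular encoding.

```python
from enum import Enum


class Faculty(Enum):
    """The ten cognitive faculties listed in the taxonomy (identifier names are illustrative)."""
    PERCEPTION = "Perception"
    GENERATION = "Generation"
    ATTENTION = "Attention"
    LEARNING = "Learning"
    MEMORY = "Memory"
    REASONING = "Reasoning"
    METACOGNITION = "Metacognition"
    EXECUTIVE_FUNCTIONS = "Executive Functions"
    PROBLEM_SOLVING = "Problem Solving"
    SOCIAL_COGNITION = "Social Cognition"


# The two faculties the taxonomy marks as composite.
COMPOSITE_FACULTIES = frozenset({Faculty.PROBLEM_SOLVING, Faculty.SOCIAL_COGNITION})


def is_composite(faculty: Faculty) -> bool:
    """Return True if the faculty is one of the composite faculties."""
    return faculty in COMPOSITE_FACULTIES
```

Keeping the composite flag separate from the enumeration makes it easy to revise if the taxonomy is later extended.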
Evaluation Protocol: Methodology for Cognitive Benchmarking
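To operationalize AGI assessment, the paper prescribes a three-stage evaluation protocol: administering systematic cognitive tasks, comparing results against human baselines, and integrating the results into robust scores with managed uncertainty.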
Analysis and Discussion
Benchmarking Gaps and Future Requirements
While partial benchmark coverage exists (notably in perception, problem solving, and world knowledge), significant lacunae remain for metacognition, attention, learning, and social cognition. Many existing datasets are public, which increasingly undermines their relevance due to contamination. The framework’s practical realization thus hinges on continued creation of high-quality, private, and independently audited evaluation suites.
Beyond Faculties: Supplementary Metrics
The authors elaborate that cognitive benchmarking is necessary but not sufficient for full AGI characterization. Complementary considerations include:
- Processing and Response Speed: Timeliness as a critical axis for real-world utility, decoupled from correctness.
- System Propensities: Behavioral tendencies (risk, alignment, strategy, communication) impacting safety and reliability.
- Creativity: Difficult to isolate directly, but related aspects such as cognitive flexibility, world knowledge, and problem solving can serve as proxy measures.
- Deployment Evaluations: End-to-end empirical studies remain essential for domain-specific utility and impact forecasting.
Model vs. System Evaluation
The framework advises evaluating entire AI systems—including tools, modules, and environmental interfaces—rather than just core models, aligning with real-world deployment realities. Modularity is recognized as intrinsic to both biological and artificial intelligence. However, this raises methodological challenges regarding tool-access parity and interpretability for cognitive task construction.
Taxonomy Iteration and Emergent Capabilities
The taxonomy is neither prescriptive nor exhaustive; the anticipated emergence of novel AI faculties calls for iterative refinement. The practical relevance of each faculty to real-world tasks is not yet fully established, which motivates future empirical validation work. The framework is posited as a foundation for a rigorous, evolving science of AGI.
Numerical Results and Claims
The paper’s methodology enables empirical differentiation of systems that:
- Score below the human median in select faculties (signaling real-world limitations).
- Exceed the human median across all faculties (potentially matching or outperforming at least half of the sampled humans on each).
- Approach or exceed the human maximum on all faculties (indicative of, but not definitive evidence for, “superhuman” cognitive generality).
These faculty-by-faculty profiles afford nuanced, multidimensional system characterization rather than binary or monolithic “AGI/not-AGI” status.
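The three bands above can be sketched as a per-faculty comparison against a sample of human scores. This is an illustration under stated assumptions, not the paper's scoring procedure: the function names, the band labels, the handling of ties (a score equal to the median is grouped with "below"), and the example numbers are all hypothetical.

```python
from statistics import median


def classify_faculty(system_score: float, human_scores: list[float]) -> str:
    """Place one faculty score into a band relative to the human sample."""
    if not human_scores:
        raise ValueError("human_scores must not be empty")
    if system_score >= max(human_scores):
        return "at_or_above_max"
    if system_score > median(human_scores):
        return "above_median"
    return "at_or_below_median"  # assumption: ties with the median land here


def build_profile(system: dict[str, float], humans: dict[str, list[float]]) -> dict[str, str]:
    """Return a faculty-by-faculty profile instead of a single AGI/not-AGI verdict."""
    return {name: classify_faculty(score, humans[name]) for name, score in system.items()}


def summarize(profile: dict[str, str]) -> str:
    """Collapse a profile into one of the three situations described in the text."""
    bands = set(profile.values())
    if bands == {"at_or_above_max"}:
        return "at or above the human maximum on every faculty"
    if "at_or_below_median" in bands:
        return "at or below the human median on at least one faculty"
    return "above the human median on every faculty"


if __name__ == "__main__":
    # Made-up scores for two faculties, purely for illustration.
    humans = {"memory": [40, 50, 60, 70, 80], "reasoning": [10, 20, 30, 40, 50]}
    system = {"memory": 75, "reasoning": 25}
    profile = build_profile(system, humans)
    print(profile)
    print(summarize(profile))
```

The profile dictionary is the primary output; the one-line summary is only a convenience and discards the jaggedness that the framework is designed to expose.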
Implications and Future Directions
Practically, the proposed framework sets the stage for transparent progress tracking and comparative evaluation of AGI candidates. It facilitates objective communication between technical stakeholders and policymakers, which is essential for responsible governance and deployment. Theoretically, mapping the jagged profile of AI cognition to the human spectrum stimulates both foundational research and ongoing taxonomy expansion.
Future developments will likely involve:
- Closing benchmark gaps, particularly for social cognition and metacognition.
- Enhanced statistical modeling for performance integration and uncertainty quantification.
- Incorporation of emergent, non-human faculties and hybrid intelligence paradigms.
- Refined benchmarks for creativity, behavioral propensities, and deployment-specific workflows.
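To make the uncertainty-quantification item concrete, one common option is a percentile bootstrap over the human baseline sample. The sketch below is an assumption for illustration only; the summary does not state which statistical method the paper uses, and the function names and defaults (`n_resamples`, `alpha`, `seed`) are hypothetical.

```python
import random


def fraction_matched(system_score: float, human_scores: list[float]) -> float:
    """Fraction of sampled humans whose score the system matches or exceeds."""
    return sum(h <= system_score for h in human_scores) / len(human_scores)


def bootstrap_interval(
    system_score: float,
    human_scores: list[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float]:
    """Percentile-bootstrap interval for fraction_matched.

    Resamples the human baseline with replacement to reflect that the
    baseline is itself only a finite sample of the human population.
    """
    rng = random.Random(seed)
    n = len(human_scores)
    stats = sorted(
        fraction_matched(system_score, rng.choices(human_scores, k=n))
        for _ in range(n_resamples)
    )
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper
```

A wide interval signals that the human baseline is too small to support a confident claim about where the system sits relative to the human median.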
Conclusion
"Measuring Progress Toward AGI: A Cognitive Framework" establishes a rigorous, empirically justified road map for AGI evaluation, grounded in human cognitive science. By decomposing intelligence into ten multidimensional faculties and prescribing a robust evaluation protocol, the framework offers actionable tools for benchmarking AI progress, contextualizing claims, and guiding responsible advancement. The approach is inherently extensible, baseline-driven, and adaptation-ready, providing a critical scaffold for scientific inquiry and technological oversight in advanced AI.