The paper “SuperARC: A Test for General and Super Intelligence Based on First Principles of Recursion Theory and Algorithmic Probability” proposes a novel testing framework for evaluating claims that AI systems have reached artificial general intelligence (AGI) or superintelligence (ASI). The framework, termed SuperARC, is grounded in recursion theory, algorithmic probability, and algorithmic information theory (AIT), and aims to assess AI models, including large language models (LLMs), against a rigorous, formal benchmark of intelligence.
The authors’ central argument concerns the inadequacy of current intelligence tests for AI. Most rely on superficial metrics of language mastery and overlook deeper cognitive capabilities, such as pattern recognition, model abstraction, synthesis, and predictive planning, that are inherent to natural intelligence. SuperARC aims to fill this gap with a test centered on fundamental intelligent behaviors that emerge from algorithmic probability and recursive model creation.
Theoretical Background and Methodology
Central to the evaluation framework is algorithmic complexity, particularly Kolmogorov complexity, used to assess the minimal description length of the sequences generated, or to be generated, by AI models. The paper posits that the hallmark of intelligence is the ability to compress information effectively, that is, to produce shorter programs that explain observed data through recursive patterns.
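For reference, the quantity in question can be written in standard AIT notation (a textbook definition, not quoted from the paper): the Kolmogorov complexity of a string x relative to a universal machine U is the length of the shortest program that outputs x.

```latex
K_U(x) = \min \{\, |p| \;:\; U(p) = x \,\}
```

Because K is uncomputable, any practical test must approximate it, which is where the estimation methods discussed next come in.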
A highly intelligent system should be able to maximize compression (abstraction) while maintaining precise predictions (planning) as complexity increases. The framework achieves this through the Block Decomposition Method (BDM) and the Coding Theorem Method (CTM), which use algorithmic probability to obtain robust complexity estimates. Intriguingly, the paper compares the effectiveness of LLMs against these methods: CTM/BDM, representing a neurosymbolic approach, is proposed as both a performance baseline and a principled pathway toward model convergence grounded in fundamental computing principles.
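To make the aggregation concrete, here is a minimal sketch of a BDM-style estimator in Python. It assumes a precomputed CTM lookup table mapping small blocks to complexity estimates; the table, the block size, and the trivial fallback used when a block is missing are illustrative placeholders rather than the paper’s actual implementation.

```python
import math
from collections import Counter

def bdm(sequence: str, block_size: int = 12, ctm_lookup: dict | None = None) -> float:
    """Illustrative Block Decomposition Method (BDM) estimate.

    Splits `sequence` into non-overlapping blocks, looks up a CTM complexity
    value for each distinct block, and aggregates them as
    sum(CTM(block) + log2(multiplicity)). In practice the lookup table is
    produced by running ensembles of small Turing machines (the Coding
    Theorem Method); here a crude fallback stands in purely for demonstration.
    """
    blocks = [sequence[i:i + block_size] for i in range(0, len(sequence), block_size)]
    counts = Counter(blocks)

    def ctm(block: str) -> float:
        if ctm_lookup and block in ctm_lookup:
            return ctm_lookup[block]
        # Placeholder estimate: distinct symbols times block length in bits.
        return len(set(block)) * math.log2(len(block) + 1)

    return sum(ctm(block) + math.log2(n) for block, n in counts.items())


if __name__ == "__main__":
    print(bdm("0101010101010101010101010101"))  # highly regular, lower estimate
    print(bdm("0110100110010110100101100110"))  # less regular, higher estimate
```

The key design choice is that repeated blocks contribute only the logarithm of their multiplicity, so regularity is rewarded much as it would be by an ideal compressor.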
Challenges for Current AI Systems
Current LLMs were put through a sequence prediction test in which they were prompted to produce models or programs that generate and continue prescribed sequences. The results reveal uneven progress: while some models handle repetitive or simple sequence constructs competently, their performance declines sharply as sequence complexity increases and surface pattern cues disappear, demanding deeper abstraction.
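A rough sketch of how such a test can be scored, assuming the task is “emit a program that prints the target sequence”; the function name, the execution harness, and the bits-per-character scoring below are illustrative assumptions, not the paper’s exact protocol.

```python
import io
import contextlib

def evaluate_candidate(program_source: str, target_sequence: list[int]) -> dict:
    """Score a model-generated program on a sequence task.

    The candidate program must print the target sequence; shorter programs
    that reproduce it exactly score better, rewarding compression
    (abstraction) alongside correct prediction.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(program_source, {})  # sandboxing omitted for brevity
    except Exception:
        return {"correct": False, "length_bits": None}

    produced = [int(tok) for tok in buffer.getvalue().split() if tok.lstrip("-").isdigit()]
    correct = produced[: len(target_sequence)] == target_sequence
    return {"correct": correct, "length_bits": 8 * len(program_source) if correct else None}


if __name__ == "__main__":
    # A hypothetical LLM answer for the sequence 1, 4, 9, 16, ... (perfect squares).
    candidate = "print(*[n * n for n in range(1, 11)])"
    print(evaluate_candidate(candidate, [1, 4, 9, 16, 25, 36]))
```

Scoring by program length as well as correctness ties the benchmark back to compression: for sufficiently long sequences, a model that merely echoes the terms verbatim earns a worse (longer) score than one that captures the generating rule.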
One notable finding from the paper’s experiments is a higher failure rate among recent LLM versions when tasked with comprehension beyond rote memorization of sequences, reflecting an inability to derive, or even maintain, a coherent internal model that would support creative generation or efficient solution synthesis. This underscores a limitation fundamental to pattern-based statistical learning when it is divorced from algorithmic-complexity principles.
Implications and Speculations for AI Research
The implications of this research are twofold. Practically, it delivers an open-ended, dynamic benchmarking tool designed to mitigate the overfitting and benchmark contamination that plague conventional LLM testing. Theoretically, it opens a discussion about whether AI must evolve toward methods inherently capable of mechanistic, causal learning akin to symbolic computation.
By focusing on algorithmic comprehension and prediction, SuperARC introduces a benchmark that not only aligns with a formal, AIT-based account of intelligence but also sketches a trajectory toward symbolically grounded AGI and ASI. Anticipated future developments include hybrid systems that employ CTM and BDM to bridge this gap, augmenting statistical, probabilistic AI approaches with symbolic processing capable of meaningful, generative intelligence.
Ultimately, SuperARC stands as an important reframing of AI evaluation standards, challenging practitioners to design intelligence tests that genuinely reflect the broader spectrum of cognitive features central to both natural and future artificial forms of intelligence.