
From AGI to ASI

Published 10 Jun 2026 in cs.AI, cs.CY, and cs.LG | (2606.12683v1)

Abstract: Over the last decade, building human-level artificial general intelligence has moved from far-fetched speculation to being a concrete next-decade target for many of the largest AI organisations. Achieving this goal would have profound and far-reaching impacts on human society, which raises many complex questions for the decade ahead. This report investigates how AI itself might continue to develop in a post-AGI world along the continuum of machine intelligence. The endpoint of this continuum, Universal AI, is theoretically well understood, which provides some formal grounding for the main focus of this report: the transition from human-level AGI to artificial general superintelligence, which, intuitively, can be understood as a system that is more intelligent and cognitively capable than large organisations of humans. After characterizing ASI, the report discusses four potential pathways from AGI to ASI: scaling AGI, AI paradigm shifts, recursive improvement, and ASI emerging from large-scale multi-agent collectives. The report then discusses possible frictions and bottlenecks along these pathways. Determining whether the impact of these frictions will be negligible or substantial raises a number of concrete open research questions. Due to large uncertainties for predicting ASI progress, it cannot be ruled out that AI progress might continue to accelerate over the next years. This could imply that the image of a single transformative step change, caused by the introduction of human-level AGI into our society, could be inaccurate. More apt might be the prospect of a series of transformative societal changes caused by AI-enabled progress and breakthroughs across many areas of science and technology. Preparing for this prospect requires a massively interdisciplinary endeavour of global scope and interest.

Summary

  • The paper establishes formal definitions for AGI and ASI, framing ASI as a qualitative leap beyond human collective intelligence.
  • It analyzes four key pathways—compute scaling, algorithmic shifts, recursive self-improvement, and collective intelligence—while highlighting bottlenecks like data limits and economic constraints.
  • By integrating the Legg-Hutter/AIXI framework with empirical observations, the report presents a comprehensive research agenda for forecasting and safely transitioning to ASI.

From Artificial General Intelligence to Artificial Superintelligence: Technological Pathways, Frictions, and Open Questions

Formalizing AGI, ASI, and the Continuum of Machine Intelligence

The report provides an in-depth analysis of potential trajectories from AGI to Artificial Superintelligence (ASI), establishing clearly demarcated working definitions for both. AGI is outlined as "competent AGI", meaning median human-level performance across a broad spectrum of cognitive tasks, whereas ASI is characterized as surpassing even large, coordinated groups of human experts across virtually all domains. The Legg-Hutter formalism is invoked as a theoretical substrate, anchoring intelligence as expected performance across all computable environments, weighted toward simpler environments, with the AIXI agent as the formal upper bound [Legg2007Universal]. Critically, ASI here is defined as a qualitative leap over even the most capable conceivable human organization, not merely superhuman performance on isolated benchmarks.
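
For reference, the Legg-Hutter measure cited above is usually written as a weighted sum in which simpler environments count more. The notation below follows the standard statement in [Legg2007Universal]; the symbols are not taken from the report's own text and are shown only as a reading aid:

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_{\mu}^{\pi}
```

Here $E$ is the set of computable, reward-bounded environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V_{\mu}^{\pi}$ is the expected total reward that agent $\pi$ obtains in $\mu$. Because $K$ is incomputable, $\Upsilon$ can only be approximated in practice.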

The transition from AGI to ASI is argued to constitute a continuum, not a step function—practical and theoretical progress must be tracked along this axis, with agent collectives and coordination mechanisms as essential considerations.

Theoretical Limits and Structural Properties of Superintelligence

A detailed discussion establishes that ASI is strictly bounded by physical, computational, and logical constraints, including the speed of light, energy dissipation (Landauer's principle), complexity-theoretic lower bounds, and Gödelian incompleteness [Hutter:24uaibook2, kim2026modelfreeuniversalai]. Because the Legg-Hutter score treats intelligence as a continuum, sharp phase transitions ("intelligence explosions") are not guaranteed; they are contingent on the empirical scaling properties of practical systems and on the realization of recursive self-improvement (RSI) dynamics.
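
To give a sense of scale for one of these limits, the sketch below evaluates Landauer's bound, E = k_B · T · ln 2 per erased bit. The 300 K temperature and the 1 GW power budget are illustrative assumptions, not figures from the report, and the function names are hypothetical.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant (exact SI value)


def landauer_limit_joules(temperature_kelvin: float) -> float:
    """Minimum energy dissipated when erasing one bit at the given temperature."""
    if temperature_kelvin <= 0:
        raise ValueError("temperature must be positive")
    return BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


def max_bit_erasures_per_second(power_watts: float, temperature_kelvin: float) -> float:
    """Upper bound on irreversible bit erasures per second for a power budget."""
    return power_watts / landauer_limit_joules(temperature_kelvin)


if __name__ == "__main__":
    # Assumed values for illustration only: room temperature and a 1 GW budget.
    per_bit = landauer_limit_joules(300.0)
    print(f"Landauer limit at 300 K: {per_bit:.3e} J per bit")
    print(f"Erasures per second at 1 GW: {max_bit_erasures_per_second(1e9, 300.0):.3e}")
```

The bound applies only to logically irreversible operations; it says nothing about how close practical hardware can get to it.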

The formal framework of Universal AI (AIXI) is reviewed in detail, emphasizing the centrality of the universal prior (Kolmogorov complexity weighting), Bayesian model averaging over computable environments, and the formal optimality results that make AIXI the theoretical endpoint in both learning and acting under uncertainty. Recent progress towards computationally practical alternatives (Monte Carlo approximations, as in [veness2011MC]; amortized predictors [Grau2024Learning, genewein2026algorithmic]; embedded agents [meulemans2025embeddeduniversalpredictiveintelligence]) is noted, but the computational gap between these and practical ASI remains vast.
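
The two ingredients named above, a complexity-weighted prior and Bayesian model averaging, can be illustrated on a toy scale. The sketch below is not AIXI or any of the cited approximations: it uses a hand-picked set of three Bernoulli hypotheses, and the `description_bits` values are hypothetical stand-ins for Kolmogorov complexity, which is incomputable.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Hypothesis:
    name: str
    p_one: float           # probability this hypothesis assigns to the next bit being 1
    description_bits: int  # hypothetical stand-in for Kolmogorov complexity


def complexity_prior(hypotheses: Sequence[Hypothesis]) -> List[float]:
    """Normalised prior proportional to 2^(-description length)."""
    raw = [2.0 ** -h.description_bits for h in hypotheses]
    total = sum(raw)
    return [r / total for r in raw]


def posterior(hypotheses: Sequence[Hypothesis], prior: Sequence[float],
              bits: Sequence[int]) -> List[float]:
    """Bayes update of the mixture weights after observing a bit sequence."""
    weights = list(prior)
    for b in bits:
        weights = [w * (h.p_one if b == 1 else 1.0 - h.p_one)
                   for w, h in zip(weights, hypotheses)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights


def predict_one(hypotheses: Sequence[Hypothesis], weights: Sequence[float]) -> float:
    """Mixture probability that the next bit is 1."""
    return sum(w * h.p_one for w, h in zip(weights, hypotheses))


if __name__ == "__main__":
    hs = [Hypothesis("fair", 0.5, 2),
          Hypothesis("mostly_ones", 0.9, 5),
          Hypothesis("mostly_zeros", 0.1, 5)]
    prior = complexity_prior(hs)
    post = posterior(hs, prior, [1] * 10)
    print("prior prediction:", predict_one(hs, prior))
    print("posterior prediction after ten 1s:", predict_one(hs, post))
```

The simpler hypothesis starts with most of the weight, and the data then shift weight toward whichever hypothesis predicts it best. The universal prior does the same over all computable environments, which is what makes it intractable.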

Core Pathways from AGI to ASI

The report systematically investigates four principal pathways:

1. Compute, Model, and Data Scaling

Extrapolation of empirical scaling laws is central to this pathway. The authors stress the sustained 10× annual growth in effective compute, a composite of hardware advances, investment scaling, and algorithmic improvements, but highlight frictions such as impending saturation of high-quality data (the "data wall") [Villalobos2024WillRunOut], economic and resource limits, and hardware interconnect bottlenecks. The uncertainty whether additional scaling will produce "emergent" (vs. smooth) capabilities is critically assessed, with emergent effects in large models subject to ongoing debate [Wei2022Emergent, Schaeffer2023Mirage].
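
One argument in the emergence debate, associated with [Schaeffer2023Mirage], is that a smoothly improving model can look like it gains an ability abruptly when it is scored with a nonlinear metric. The sketch below illustrates this with exact-match scoring, under the simplifying assumption of independent per-token errors. The answer length and accuracy values are hypothetical.

```python
def exact_match_accuracy(per_token_accuracy: float, answer_length: int) -> float:
    """Probability that every token is correct, assuming independent per-token errors."""
    if not 0.0 <= per_token_accuracy <= 1.0:
        raise ValueError("per_token_accuracy must be in [0, 1]")
    if answer_length < 1:
        raise ValueError("answer_length must be at least 1")
    return per_token_accuracy ** answer_length


if __name__ == "__main__":
    answer_length = 20  # hypothetical answer length in tokens
    for p in [0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99]:
        em = exact_match_accuracy(p, answer_length)
        print(f"per-token accuracy {p:.2f} -> exact-match {em:.4f}")
```

Per-token accuracy rises steadily across the loop, while exact-match stays near zero for most of the range and then climbs steeply. This does not settle whether real capabilities emerge sharply; it only shows why the choice of metric matters when reading scaling curves.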

2. Algorithmic Paradigm Shifts

The historical tendency for breakthroughs to accelerate progress after periods of stagnation is invoked. The report suggests that future shifts—whether via neuromorphic computing, novel optimization regimes, or architectures enabling open-ended world modeling—could decisively move beyond limitations of current pretraining + log-loss minimization paradigms. Enhancements in test-time scaling, retrieval-augmented systems, and explicit world model integration (e.g., Dreamer, MuZero) are positioned as near-term evolutionary steps, while more radical shifts (hardware, algorithmic, or cognitive) may be required to escape future capability plateaus.

3. Recursive Self-Improvement (RSI) and Automated R&D

Fully or partially automated AI R&D is analyzed as a potential trigger for hyperbolic growth (intelligence explosion) in machine intelligence [davidson2026does, macaskill2025preparingintelligenceexplosion]. This dynamic is decomposed into hardware improvements, code/architecture optimization, automated data curation/generation, and specialization within agent collectives. Bottlenecks such as the embodied bottleneck (dependence on real-world testing and data) and diminishing returns in recursive data generation are critically evaluated, with no empirical evidence to date supporting open-ended RSI in the wild.
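
The difference between exponential and hyperbolic growth can be made concrete with a standard toy model. It is offered here as an illustration only; the symbols $I$, $c$, and $\epsilon$ are assumptions of this sketch and are not taken from the report.

```latex
\frac{dI}{dt} = c\, I^{1+\epsilon}, \qquad I(0) = I_0 > 0, \quad c > 0

% Case \epsilon = 0 (constant returns): exponential growth
I(t) = I_0 \, e^{c t}

% Case \epsilon > 0 (increasing returns): hyperbolic growth
I(t) = I_0 \left( 1 - \epsilon\, c\, I_0^{\epsilon}\, t \right)^{-1/\epsilon},
\qquad t^{*} = \frac{1}{\epsilon\, c\, I_0^{\epsilon}}

% Case \epsilon < 0 (diminishing returns): polynomial growth
I(t) = \left( I_0^{|\epsilon|} + |\epsilon|\, c\, t \right)^{1/|\epsilon|}
```

For $\epsilon > 0$ the solution diverges at the finite time $t^{*}$, which is the mathematical sense of an "intelligence explosion". In this toy picture, the bottlenecks discussed in the text correspond to pushing $\epsilon$ toward or below zero, where growth is at most exponential.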

4. Collective Intelligence via Group Agency & Multi-Agent Systems

A pathway unique to digital agents is highlighted: the possibility that "ASI" will arise not from individual geniuses but from super-efficient, large-scale, highly coordinated agent collectives exhibiting group agency [list2011group, tomasev2026intelligent]. Analytical focus is given to scaling laws for multi-agent systems and the possibility of both market-driven and centrally orchestrated suprapersonal intelligence, with parallels to organizational and economic dynamics in human societies.

Key Bottlenecks and Frictions

The transition from AGI to ASI, across all pathways, is argued to be potentially impeded by a set of critical bottlenecks:

  • Data Wall: Imminent exhaustion of high-quality human-generated data; open questions about the viability and limitations of synthetic and interaction-generated data.
  • Resource & Economic Constraints: Exponential growth in compute and energy needs; possibility of unsustainable economic inputs for further scaling [Sevilla2022ComputeTrends, agrawal2025economics].
  • Paradigm Sufficiency: Inherent limitations of the neural, gradient-based paradigm; unresolved open problems in out-of-context continual learning, robust grounding, and abstraction formation [lerchner2026abstraction].
  • Research Getting Harder: Empirical observations that progress in science and engineering (including AI research itself) often requires exponentially more input resources over time [bloom2020ideas].
  • Abstraction Barrier: The conjecture that systems trained primarily on human knowledge may lack mechanisms for de novo conceptual innovation from raw data; thus, AGI may be bounded at human-level abstraction until new mechanisms are integrated.
  • Governance, Regulation, and Societal Backlash: Societal and political pressures could cap development or deployment, either through accidents and misuse followed by regulation, or through international competition and coordination failures [anderljung2023frontier, bengio2024isr].

Open Research Directions

The final section presents a comprehensive research agenda, grouping open questions according to their relevance to forecasting, benchmarking, scalability, paradigm shifts, recursive self-improvement, group agency, alignment, and governance. Key areas include:

  • Formalizing and empirically estimating the impact of the data wall, bottlenecks in synthetic and interactive data regimes, and economic limitations.
  • Developing methodologies for "ASI benchmarking" that avoid saturation at human levels and are robust to benchmark gaming.
  • Characterizing empirical and theoretical multi-agent scaling laws.
  • Investigating the practical boundaries of recursive improvement and the onset of diminishing or degenerative returns in automated R&D.
  • Advancing fundamental theory (AIXI, embedded agency, complexity-theoretic limits) to analyze the practical tractability gap between computable upper bounds and current empirical systems.
  • Clarifying the effects of alignment and safety requirements as direct capability constraints in scenarios where human supervision becomes infeasible.

Empirical Implications and Theoretical Impact

The implications of the report for AI development are multi-faceted. Practically, it underscores that digital intelligence possesses profound scaling advantages over biological intelligence (speed, bandwidth, substrate independence, and lossless memory sharing), none of which is subject to the biological constraints that shape human cognition and organizations. As compute, hardware, and software paradigms evolve, the organizational form of advanced AI, whether monolithic, modular, market-driven, or collective, may become the central challenge for both technical work and governance.

Theoretically, the existence of a well-defined continuum (Legg-Hutter/AIXI) tempers expectations of abrupt discontinuities while leaving open the possibility of rapid, multi-pathway progress toward and beyond AGI. The interplay of empirical scaling, recursive improvement, bottleneck alleviation, and the emergence of qualitatively new forms of abstraction and reasoning will likely determine whether ASI is realized as a fast transition or as a succession of manageable, interpretable transformations.

Conclusion

The report "From AGI to ASI" (2606.12683) represents a rigorous, multi-factorial landscape analysis of the post-AGI trajectory in artificial intelligence. Its structured decomposition of pathways, frictions, and open problems provides a formal foundation for future research planning, empirical tracking, and risk/opportunity assessment. The emphasis on benchmarking, empirical forecasting, and deep theoretical integration illustrates the scale and complexity of the research ecosystem required for responsible stewardship of ASI-level capabilities. The central takeaways are that neither technical inevitability nor omnipotence is guaranteed post-AGI, and that the rate and route of ASI development remain contingent on both technological innovations and societal choices.

The challenge for the field is, thus, to shift from speculative forecasting to continuous quantitative measurement, empirical validation of scaling and bottleneck models, and coordinated progress in algorithmic, systemic, and governance domains. Whether ASI emerges as an extension of current trends, through sudden paradigm shifts, by recursive intelligence improvement, or as a property of large-scale collectives, will depend crucially on research that integrates theoretical depth with empirical rigor.

Explain it Like I'm 14

Overview

This paper asks a big question: once we build AI that’s about as smart as an average person (AGI), how might it keep getting better and become superintelligent (ASI)? The authors map out possible routes from AGI to ASI, explain advantages and limits of digital intelligence, and highlight obstacles and open questions that could speed up or slow down progress.

Key Questions

  • What do we mean by AGI and ASI in plain terms?
  • What paths could take us from human-level AI to much smarter AI?
  • What advantages do digital (computer-based) brains have over human brains?
  • What real-world limits and bottlenecks could slow or stop progress?
  • How fast might AI improve, and how should society prepare?

How did the authors study it?

Instead of running lab experiments, this is a “conceptual roadmap” that pulls together:

  • Clear definitions and theory: They use ideas like the Legg-Hutter score (a formal way to measure general intelligence across many tasks) and AIXI/Universal AI (a theoretical “maximum intelligence” agent) to anchor the discussion.
  • Trend data and scaling laws: They look at how fast computing power and algorithmic efficiency have been growing, and how “more compute” tends to lead to better AI performance.
  • Scenario analysis: They lay out four pathways from AGI to ASI and consider potential bottlenecks for each.
  • Open questions: They list research areas needed to reduce uncertainty and help society plan ahead.

Technical terms in everyday language:

  • “Compute” is the total power of the computers used to train and run AI.
  • “Algorithmic efficiency” means getting the same results with less compute—like learning to study smarter, not just longer.
  • “Effective compute” combines three things that boost AI’s capabilities: better hardware, more investment in hardware, and smarter algorithms. Think of it like: faster engines + more engines + better driving = much faster progress.
  • “Scaling laws” are patterns showing that bigger models trained with more data and compute tend to perform better, up to a point.
  • “Recursive improvement” means AI systems helping to build even better AI systems—a loop that could speed up progress.
  • “Singularity” is a math idea that describes growth so fast it seems to blow up in a short time. In practice, real-world limits usually prevent infinite growth.

What did they find?

Simple meanings of AGI and ASI

  • AGI: AI that’s around the “median human” on many thinking tasks. It won’t be perfect, but it could already beat humans in some areas and be weaker in others.
  • ASI: AI that, overall, performs better than large, well-coordinated teams of top human experts on almost all tasks. It’s not magic—it still faces physical and practical limits—but it’s broadly superhuman.

Advantages of digital intelligence

Digital AIs have some built-in strengths that grow as computers get faster:

  • They can read and write information extremely quickly (for example, digesting multiple books in seconds).
  • They can “think” faster by using more compute or running many processes in parallel.
  • They can have huge working memory compared to humans.
  • They can be copied perfectly, paused, backed up, and restarted (including their learned experience).
  • They can share knowledge at very high speed across many copies.

These advantages mean AI collectives could learn and coordinate much faster than human groups.

Four pathways from AGI to ASI

The authors outline four non-exclusive routes (these could happen simultaneously):

  • Scaling AGI: Keep adding compute, data, and smarter training, and use techniques like “test-time scaling” (letting models think longer or search more before answering) to boost abilities.
  • Paradigm shifts: Invent new kinds of AI beyond today’s dominant approaches (for example, a new architecture or learning method that leaps ahead).
  • Recursive improvement: Use AI to automate parts of AI research and engineering, speeding up the creation of better systems in a self-reinforcing loop.
  • Multi-agent collectives: Organize large numbers of AI agents that coordinate, specialize, and share knowledge to behave like a “super-agent.”

Possible frictions and bottlenecks

Before growth can get too fast, reality pushes back. The paper discusses obstacles such as:

  • The data wall: Running out of enough high-quality, novel training data.
  • Abstraction barrier: AI trained mostly on human concepts might struggle to invent truly new concepts directly from raw reality.
  • Compute and energy costs: Even with faster chips, scaling up massively requires huge investments and power.
  • Diminishing returns: Each extra unit of compute may help less than the previous one, making progress slower.
  • Real-world physics and time: You can’t beat the speed of light; complex experiments (biology, weather, economies) take real time.
  • Measurement limits: We never have perfect information; observations have noise and finite precision.
  • Practical constraints: Safety, regulation, deployment, and integration with the physical world can slow advances.

How fast could this go?

  • The past decade saw strong growth in hardware improvements (about 1.5× per year), rising investment (about 2.5× per year), and algorithmic efficiency (roughly 3× per year or more in some areas). Multiplying these (1.5 × 2.5 × 3 ≈ 11) gives “effective compute” growing roughly 10× per year.
  • If such growth continues, AI capability could accelerate, possibly in waves rather than one giant “AGI moment.”
  • But growth may slow or follow an S-shaped curve due to bottlenecks. Predicting exact timelines is hard and uncertain.
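
The arithmetic behind the effective-compute estimate can be checked directly. The sketch below multiplies the three annual factors quoted above and derives the implied doubling time. The function names are hypothetical, and the projection assumes the factors stay constant, which the text itself treats as uncertain.

```python
import math


def effective_compute_factor(hardware: float, investment: float, algorithms: float) -> float:
    """Annual effective-compute growth as the product of three annual factors."""
    return hardware * investment * algorithms


def doubling_time_months(annual_factor: float) -> float:
    """Months needed to double, given a constant annual growth factor above 1."""
    if annual_factor <= 1.0:
        raise ValueError("annual_factor must be greater than 1")
    return 12.0 * math.log(2) / math.log(annual_factor)


def cumulative_growth(annual_factor: float, years: float) -> float:
    """Total growth after a number of years at a constant annual factor."""
    return annual_factor ** years


if __name__ == "__main__":
    # Factors as quoted in the summary: hardware 1.5x, investment 2.5x, algorithms 3x.
    factor = effective_compute_factor(1.5, 2.5, 3.0)
    print(f"annual effective-compute factor: {factor:.2f}x")
    print(f"implied doubling time: {doubling_time_months(factor):.1f} months")
    print(f"growth over 3 years if sustained: {cumulative_growth(factor, 3):.0f}x")
```

Small changes in any one factor compound quickly. Lowering the algorithmic factor from 3× to 2×, for example, reduces the annual product from 11.25× to 7.5×.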

Why it matters and what’s next

If AI keeps getting better beyond human level, it could transform many parts of life: science, medicine, education, work, economics, politics, and culture. These changes might arrive as a series of breakthroughs rather than one single event. Because timelines are uncertain, the authors recommend:

  • Measuring and forecasting AI progress carefully (track compute, efficiency, data, and signs of recursive improvement).
  • Studying the bottlenecks and how they affect each pathway to ASI.
  • Preparing across many fields—technical, ethical, economic, and legal—so society can adapt safely and fairly.

In short, the paper doesn’t claim ASI is guaranteed or imminent, but it shows clear routes by which it could happen. It urges serious, interdisciplinary planning now, so we’re ready whether progress speeds up, slows down, or surprises us in ways we don’t yet expect.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a single, concrete list of unresolved gaps that the paper identifies or implies, framed to guide actionable future research.

  • Quantifying the AGI→ASI threshold: No operational, measurable criterion exists to determine when a system “exceeds large, well‑coordinated human‑expert collectives” across domains; needs task suites, aggregation rules, and acceptable error/robustness thresholds.
  • Approximating Legg–Hutter intelligence in practice: Lacks a practical estimator or benchmark framework that complexity-weights heterogeneous, computable tasks to produce a usable “intelligence score” for contemporary models.
  • Benchmark design beyond i.i.d.: The paper recognizes the limits of static, i.i.d. benchmarks but does not offer standardized continual‑learning, non‑stationary RL, and cooperative multi‑agent evaluations aligned with its intelligence conception.
  • Validating benchmark stitching: No protocol is provided to calibrate, validate, and reproduce “benchmark stitching” across labs, models, and domains, including uncertainty quantification and bias correction.
  • Mapping effective compute to capability unlocks: Absent are formal models linking hardware investment, algorithmic efficiency, and inference/test‑time scaling to the probability and timing of new capability emergence.
  • Detecting phase transitions vs diminishing returns: No empirical methodology is offered to identify emergent capability discontinuities (or plateaus) and to distinguish them from smooth scaling effects.
  • Inference scaling vs training scaling trade‑offs: The conditions under which test‑time compute (e.g., extended reasoning/search) substitutes for or complements training compute remain unquantified across tasks and risk profiles.
  • System‑level capability from population size: The paper proposes many‑instance scaling (millions of copies) but lacks models and metrics for how coordination, memory sharing, and orchestration translate into super‑additive collective performance.
  • Measuring “group agency”: No concrete architectures, protocols, or evaluation metrics are specified to test when and how multi‑agent systems exhibit emergent agency exceeding the sum of parts.
  • Recursive improvement dynamics: There is no formal dynamical systems model (with parameters and observables) of AI‑for‑AI R&D loops, tipping points to super‑exponential growth, and saturation/friction regimes.
  • Indicators of AI‑automated research: The paper calls for tracking automation but does not define measurable proxies (e.g., % of pipeline stages automated, algorithmic‑efficiency contribution attribution, cycle‑time compression).
  • Sustaining algorithmic efficiency trends: The reported rates (e.g., ~3–6×/year) are uncertain; no methodology is provided to decompose, forecast, and bound future gains across domains and training regimes.
  • Compute growth sustainability and bounds: No bottom‑up resource/energy/supply‑chain model quantifies feasible effective compute trajectories (power, cooling, land, capital, manufacturing lead times) and their likely inflection points.
  • Data wall resolution pathways: While self‑generated interaction data is suggested, the paper lacks quality metrics, de‑biased curation pipelines, and safeguards against model‑collapse or error‑reinforcement in self‑training loops.
  • Abstraction barrier testing: No empirical benchmarks or interventions are proposed to assess whether models can discover non‑human abstractions from raw data and how to induce such capabilities if absent.
  • Embodiment and I/O bandwidth effects: The “embodiment factor” hypothesis is noted, but no controlled experiments vary input/output bandwidth to causally link communication constraints to abstraction learning and planning quality.
  • Simulation fidelity and real‑world transfer: The report does not quantify how simulation accuracy, coverage, and causal adequacy affect generalization and data efficiency for agents deployed in messy, partially observed real environments.
  • Partial observability and epistemic limits: Lacks concrete methods to quantify and manage ignorance (e.g., active sensing, information‑theoretic budgets, experiment design) in domains bound by real‑time and measurement constraints.
  • Translating physical/complexity limits into engineering bounds: Theoretical limits (e.g., Landauer, speed of light) are not mapped to near‑ and medium‑term ceilings for training, inference, and distributed coordination.
  • Safety/alignment under scaling and recursion: The pathways focus on capability growth but do not specify how alignment assurances scale under test‑time scaling, multi‑agent collectives, or recursively improving AI R&D loops.
  • Cooperative intelligence evaluation: Despite noting that “all computable cooperative tasks” matter, there is no standardized battery for measuring cooperation, norm adherence, or coalition‑formation under incentives and adversarial pressures.
  • Knowledge‑seeking (KS) objectives in practice: No concrete experimental agenda compares KS‑driven agents to reward‑driven ones on scientific discovery, exploration under sparse rewards, or safe information acquisition.
  • Bridging transformers and universal prediction: The paper cites Solomonoff induction/AIXI but offers no constructive approximations that connect modern architectures (e.g., transformers with search) to universal priors in tractable ways.
  • Coordination capacity metrics for human baselines: The benchmark “large, well‑coordinated human‑expert collectives” lacks quantified baselines (team size, time budget, tools) to serve as a reproducible comparison point.
  • Economic feedbacks on R&D and compute: No quantitative macro/micro models connect AI‑driven productivity, capital flows, and labor dynamics to effective compute growth and research throughput.
  • Governance of compute and progress measurement: The call for ongoing measurement lacks proposals for standardized reporting, auditing, and open datasets to track training/inference compute, algorithmic efficiency, and model capabilities.
  • Security and proliferation risks from digital advantages: Substrate independence and lossless replication raise dual‑use concerns not analyzed here (e.g., safeguards against copy exfiltration, misuse, or rapid unauthorized scaling).
  • Interpretability for non‑human abstractions: There is no methodology to detect, interpret, and verify alien representations emerging at scale, nor to ensure controllability when human concepts are not primary.
  • Robustness and reliability under real‑time constraints: The paper omits protocols for long‑horizon, real‑time, safety‑critical evaluation with distribution shifts, delayed feedback, and cascading error modes.
  • Human‑AI institutional co‑evolution: The interaction between regulatory constraints, auditing regimes, and the four pathways (scaling, paradigm shifts, recursion, collectives) is not modeled for pace and direction of progress.
  • Open‑source vs proprietary dynamics: The impact of openness on recursive improvement, data availability, and safety externalities is unaddressed; no empirical strategy to study ecosystem‑level effects.
  • Falsifiable forecasts and thresholds: Many hypotheses (e.g., ~10×/year effective compute growth) are not accompanied by falsifiable timelines, tolerance bands, or trigger conditions for course correction.
  • Cross‑domain generality stress tests: Beyond narrow SOTA benchmarks, the paper does not define evaluations that require transfer across diverse modalities, tasks, and time horizons consistent with its general intelligence framing.
  • Tooling for multi‑agent orchestration: Concrete mechanisms (protocols, APIs, incentive schemes, communication languages) to reliably compose large agent collectives and measure emergent properties are not specified.
  • Ethical/normative allocation during rapid change: There is no framework for distributing benefits/harms, managing displacement, or ensuring global equity if capability growth accelerates in bursts rather than a single step.
  • Catastrophic interaction risks among frictions: The paper lists frictions conceptually (per abstract) but does not model how they interact (e.g., energy limits + data quality + governance delays) to produce regime shifts or chokepoints.
  • Evaluation of societal-scale outcomes: The prospect of “series of transformative changes” lacks operational indicators, early‑warning metrics, and experimental/futures methods to test scenarios and guide policy.

Practical Applications

Overview

Below are practical, real-world applications that flow from the paper’s findings, methods, and conceptual innovations (e.g., effective compute accounting, benchmark stitching, test-time scaling, recursive improvement, group agency, data wall, abstraction barrier, knowledge-seeking objectives, and UAI/AIXI as a north star). They are grouped into Immediate Applications (deployable now or with existing tools) and Long-Term Applications (requiring further research, scaling, or policy development). Each item links to sectors and notes potential tools/workflows and key assumptions or dependencies.

Immediate Applications

The following applications can be piloted or deployed with current techniques and infrastructure, drawing on the paper’s concepts to improve reliability, comparability, and impact.

  • Effective compute accounting and forecasting (industry, finance, policy, academia; software/energy)
    • Use case: Track and forecast “effective compute” by combining hardware improvements, investment growth, and algorithmic efficiency gains to plan training runs, serving capacity, and cost trajectories.
    • Tools/workflows: Internal dashboards ingesting fleet specs, training logs, algorithmic efficiency benchmarks; finance-facing models for capex/opex planning; energy-consumption projections for data centers.
    • Assumptions/dependencies: Access to accurate internal metrics; shared definitions of algorithmic efficiency; stable measurement protocols; cooperation across hardware/software teams.
  • Benchmark stitching for model selection and capability extrapolation (industry, academia; software, healthcare, education)
    • Use case: Make unified, cross-benchmark comparisons to select models for deployment in regulated or safety-critical contexts (e.g., clinical evidence synthesis, tutoring, enterprise copilots).
    • Tools/workflows: Evaluation platforms that stitch heterogeneous benchmarks; uncertainty estimates for out-of-distribution performance; procurement processes that require stitched reports.
    • Assumptions/dependencies: High-quality, up-to-date benchmark sets; governance for test contamination and data leakage; agreement on extrapolation limits.
  • Test-time scaling operations (software, robotics-lite, healthcare, education; industry)
    • Use case: Increase reasoning reliability by allocating additional compute at inference (e.g., chain-of-thought, self-consistency sampling, explicit search) for tasks like coding, legal review, clinical summarization, tutoring, and planning.
    • Tools/workflows: Dynamic per-query “compute budgets”; routing policies that escalate to deeper search only when needed; logging for auditability; cost–latency trade-off managers.
    • Assumptions/dependencies: Serving stacks that support variable compute; well-calibrated triggers for when to scale reasoning; monitoring for latency/cost blow-ups.
  • Multi-instance orchestration for enterprise productivity (industry, finance, software, education; daily life)
    • Use case: Exploit “lossless replication” and parallelism advantages of digital intelligence to run fleets of identical agents for research, summarization, due diligence, or customer support.
    • Tools/workflows: Orchestration layers (e.g., agent schedulers, memory pools, shared knowledge bases); de-duplication and aggregation pipelines; human-in-the-loop review gates.
    • Assumptions/dependencies: Reliable state sharing and knowledge-base hygiene; guardrails for hallucination and privacy; clear cost accounting.
  • AI-for-AI R&D acceleration (industry, academia; software)
    • Use case: Partial automation of model development and evaluation (hyperparameter search, data cleaning/selection, fine-tuning recipes, inference-time strategy selection), aligning with “recursive improvement” in a controlled fashion.
    • Tools/workflows: Experiment planners; automated evaluation and ablation runners; synthetic error-analysis reports; closed-loop A/B testing.
    • Assumptions/dependencies: Robust offline/online metrics; strong experiment tracking; safeguards against “overfitting the metrics” and data contamination.
  • Synthetic data pipelines to mitigate the data wall (industry, academia; software, healthcare, education)
    • Use case: Use high-quality synthetic or interaction data to augment scarce domains (domain-specific QA, structured reasoning, tutoring dialogues) while auditing for quality/novelty.
    • Tools/workflows: Data generators with quality filters; deduplication and “distance-from-training” metrics; adversarial data validation; domain expert review.
    • Assumptions/dependencies: Reliable measures of data utility/diversity; policies to avoid training on private or sensitive data; clear evaluation of transfer gains vs. overfitting risks.
  • Progress tracking and forecasting for policy and governance (policy, academia, industry; energy, finance)
    • Use case: Create public-facing dashboards that track compute, algorithmic efficiency, benchmark saturation, and safety indicators to inform policy, grants, and infrastructure planning.
    • Tools/workflows: Standardized reporting schemas; third-party audits; time-series models for capability growth; ensembles of forecasting methods.
    • Assumptions/dependencies: Reporting compliance by labs; agreed taxonomies of risks and capability classes; funding for independent measurement bodies.
  • Energy and infrastructure planning for AI growth (industry, energy, policy; finance)
    • Use case: Plan for “gigawatt AI” trajectories by linking effective compute forecasts to energy build-outs, grid integration, and heat reuse projects.
    • Tools/workflows: Capacity planning models; power purchase agreement (PPA) templates tailored to AI loads; site selection models considering water/heat constraints; lifecycle carbon accounting.
    • Assumptions/dependencies: Grid capacity and permitting; predictable silicon supply chains; technological roadmaps for accelerator efficiency; regulatory clarity on siting and emissions.
  • Education workflows that leverage AI summarization and “reasoning-on-demand” (education; daily life)
    • Use case: Adopt AI-mediated summaries tailored to learner backgrounds; scaffolded problem-solving with adjustable “test-time scaling” to show steps on demand.
    • Tools/workflows: Instructor dashboards that calibrate reasoning depth; provenance tracking of content; assessment tools that measure human understanding (not just answers).
    • Assumptions/dependencies: Guardrails against plagiarism; equity of access; teacher training; valid assessments for learning gains.
  • Risk and friction registries for AI programs (policy, industry; cross-sector)
    • Use case: Maintain live registries of potential “frictions” (e.g., data wall, abstraction barrier, energy/compute constraints, safety hazards) mapped to programs and use cases.
    • Tools/workflows: Risk taxonomies; scenario planning; red-teaming playbooks; escalation policies; periodic review with updated evidence.
    • Assumptions/dependencies: Sponsorship from leadership; cross-functional participation; willingness to halt or pivot when friction signals rise.
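
The test-time scaling item above mentions self-consistency sampling and routing policies that "escalate to deeper search only when needed". A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the paper's method: the function name `self_consistency`, its parameters, and the doubling escalation rule are hypothetical, and `sample_answer` stands in for whatever model call a real serving stack would make.

```python
from collections import Counter
from typing import Callable, Tuple


def self_consistency(
    sample_answer: Callable[[], str],
    initial_samples: int = 3,
    max_samples: int = 15,
    agreement_threshold: float = 0.8,
) -> Tuple[str, int, float]:
    """Majority vote over sampled answers with a per-query compute budget.

    Starts with a small batch of samples and draws more only while the
    leading answer's share of votes is below `agreement_threshold`.
    Returns (answer, samples_used, agreement).
    All names and defaults here are hypothetical, chosen for illustration.
    """
    if not 1 <= initial_samples <= max_samples:
        raise ValueError("need 1 <= initial_samples <= max_samples")

    votes: Counter = Counter()
    used = 0
    batch = initial_samples
    while True:
        for _ in range(batch):
            votes[sample_answer()] += 1
        used += batch

        answer, count = votes.most_common(1)[0]
        agreement = count / used
        # Stop when samples agree well enough or the budget is exhausted.
        if agreement >= agreement_threshold or used >= max_samples:
            return answer, used, agreement

        # Escalate: double the samples drawn so far, capped by the budget.
        batch = min(used, max_samples - used)


# A stub "model" that always gives the same answer stops after the first batch.
print(self_consistency(lambda: "42"))  # ('42', 3, 1.0)
```

Returning the number of samples used alongside the answer makes the cost of each query visible, which is the kind of record the logging and cost-latency tooling in that item would need.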

Long-Term Applications

These applications rely on continued R&D, scaling, or new governance mechanisms. They reflect the paper’s pathways (scaling, paradigm shifts, recursive improvement, multi-agent collectives) and anticipate the impact of key bottlenecks and frictions.

  • Controlled recursive self-improvement platforms (industry, academia, policy; software, healthcare, materials)
    • Use case: Orchestrate AI systems that iteratively improve AI (models, data, evaluators) under safety constraints, targeting science acceleration (e.g., drug discovery, materials design).
    • Tools/products/workflows: “Autonomous lab” stacks that integrate hypothesis generation, simulation, planning, and robotic execution; self-auditing loops with immutable logs; capability throttles.
    • Assumptions/dependencies: Reliable scientific simulators/benchmarks; robust containment and rollback; verifiable safety criteria; liability frameworks.
  • Multi-agent “group agency” operating systems (industry, policy; software, robotics, logistics)
    • Use case: Build platforms for large-scale collectives of agents that share state and coordinate at high bandwidth to tackle complex operations (e.g., global supply chains, disaster response).
    • Tools/products/workflows: Collective memory fabrics, coordination protocols, credit assignment mechanisms, incentive-aligned marketplaces; real-time oversight and kill-switches.
    • Assumptions/dependencies: Verified communication and control channels; resilience to emergent misbehavior; formal methods for collective safety; compute/network capacity.
  • Knowledge-seeking (KS) exploration engines for open-ended discovery (academia, industry; healthcare, energy, climate science)
    • Use case: Deploy agents optimizing for information gain to explore scientific spaces (e.g., new therapeutics, catalysts, climate interventions) with integrated uncertainty estimates.
    • Tools/products/workflows: KS-driven experiment planners; active-learning loops with lab automation; novelty/utility scoring; Bayesian evidence management.
    • Assumptions/dependencies: High-fidelity simulators and/or wet-lab integration; validated priors; safe exploration constraints; reproducibility standards.
  • Approximations to universal AI for general-purpose problem solving (academia, industry; cross-sector)
    • Use case: Develop algorithms inspired by UAI/AIXI that improve data efficiency and generality in non-stationary environments (e.g., operations research, strategy games, adaptive control).
    • Tools/products/workflows: Meta-learners, program synthesis, mixture-of-experts+search hybrids; continual-learning benchmarks reflecting “lifetime” performance.
    • Assumptions/dependencies: Theoretical advances to make approximations tractable; compute to support search/meta-optimization; strong evaluation protocols.
  • Overcoming the abstraction barrier (academia, industry; robotics, software, science)
    • Use case: Train systems that can discover novel abstractions from raw data and interactions, enabling breakthroughs beyond human conceptual scaffolds (e.g., new physical laws, design principles).
    • Tools/products/workflows: Multi-modal pretraining on rich interaction data; causal representation learning; embodiment in simulators/robots; curriculum learning for abstraction emergence.
    • Assumptions/dependencies: Scalable interactive data collection; robust causal inference; interpretable representation diagnostics.
  • Robotics swarms and high-bandwidth experience sharing (industry; logistics, agriculture, manufacturing, healthcare)
    • Use case: Leverage “lossless replication” and shared learning to deploy swarms that rapidly propagate improvements and adapt in the field.
    • Tools/products/workflows: Federated learning with causal safeguards; shared replay buffers; collective planning; sim-to-real pipelines and safety cages.
    • Assumptions/dependencies: Reliable perception/action under real-world noise; safety-certified control; network reliability; legal frameworks for autonomy.
  • Virtual societies and policy testbeds (policy, academia; economics, public health, urban planning)
    • Use case: Use agent-based simulations populated by advanced AI agents to stress-test policies (e.g., economic interventions, pandemic responses) before real-world deployment.
    • Tools/products/workflows: High-fidelity simulators; behaviorally validated agents; experiment management; causal analysis and counterfactual reporting.
    • Assumptions/dependencies: Validated micro-to-macro dynamics; guardrails against inappropriate extrapolation; transparency for public trust.
  • Global compute and energy governance frameworks (policy, finance, energy; cross-sector)
    • Use case: Establish standards for compute reporting, caps, audits, and energy coupling (e.g., clean-energy requirements, heat reuse) aligned with effective compute growth.
    • Tools/products/workflows: Registries for compute stock and training runs; standardized safety pre-deployment reports (including benchmark stitching and red-teaming); market mechanisms for clean power procurement.
    • Assumptions/dependencies: International coordination; auditability of claimed metrics; enforcement mechanisms; incentives for compliance.
  • Markets and standards for high-quality data creation/curation (industry, academia, policy; software, healthcare, education)
    • Use case: Create verified marketplaces for data that explicitly measure and reward utility, novelty, and safety (privacy, IP), reducing reliance on web scrapes and lowering “data wall” risks.
    • Tools/products/workflows: Data valuation metrics; licenses and provenance tracking; “distance from training” validators; challenge sets for robustness and fairness.
    • Assumptions/dependencies: Agreement on valuation and quality metrics; legal clarity (IP, privacy); robust watermarking and provenance tech.
  • ASI verification, governance, and capability control (policy, industry, academia; cross-sector)
    • Use case: Build standards and toolchains for capability evaluations, interpretability, and control (rate limiters, compute governors, sandboxing) to manage frontier systems and collectives.
    • Tools/products/workflows: Third-party eval suites tied to staged deployment gates; formal verification of subsystems; continuous monitoring with audit trails and off-switches.
    • Assumptions/dependencies: Reliable metrics that predict dangerous behavior; access for auditors; incident reporting norms; societal consensus on thresholds.
  • Personal agent ecosystems with suspend/replicate/resume primitives (daily life, industry; software, education)
    • Use case: Leverage substrate independence and replication to provide persistent, privacy-preserving personal agents that parallelize life admin, learning, and creativity.
    • Tools/products/workflows: Secure personal memory vaults; parallel task runners; context-switching policies; user-governed compute budgets and transparency.
    • Assumptions/dependencies: Strong privacy and identity management; usable transparency/consent interfaces; cost-effective serving; clear liability boundaries.
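
The knowledge-seeking item above describes agents "optimizing for information gain". One standard way to make that concrete, offered here as a sketch rather than as the paper's formulation, is to score each candidate experiment by its expected reduction in entropy over a discrete set of hypotheses. The function names, the two-hypothesis example, and the experiment labels below are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def expected_information_gain(
    prior: Sequence[float], likelihood: Sequence[Sequence[float]]
) -> float:
    """Expected entropy reduction over hypotheses from one experiment.

    prior[h] is P(hypothesis h); likelihood[h][o] is P(outcome o | hypothesis h).
    """
    n_hyp = len(prior)
    n_out = len(likelihood[0])
    gain = entropy(prior)
    for o in range(n_out):
        # Marginal probability of observing outcome o.
        p_o = sum(prior[h] * likelihood[h][o] for h in range(n_hyp))
        if p_o == 0:
            continue
        # Bayes update, then subtract the weighted posterior entropy.
        posterior = [prior[h] * likelihood[h][o] / p_o for h in range(n_hyp)]
        gain -= p_o * entropy(posterior)
    return gain


def best_experiment(
    prior: Sequence[float], experiments: Dict[str, List[List[float]]]
) -> str:
    """Pick the experiment with the largest expected information gain."""
    return max(
        experiments,
        key=lambda name: expected_information_gain(prior, experiments[name]),
    )


# Hypothetical example: two equally likely hypotheses, two candidate experiments.
prior = [0.5, 0.5]
experiments = {
    "sharp": [[1.0, 0.0], [0.0, 1.0]],  # outcome fully identifies the hypothesis
    "blunt": [[0.5, 0.5], [0.5, 0.5]],  # outcome is independent of the hypothesis
}
print(best_experiment(prior, experiments))  # sharp
```

A real planner would also need the safe-exploration constraints and validated priors listed in that item's dependencies; this sketch covers only the scoring step.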

Notes on Feasibility, Assumptions, and Dependencies

  • Compute and energy: Many long-term applications depend on sustained growth in effective compute and energy build-outs, as well as continued algorithmic efficiency gains.
  • Data availability and quality: Synthetic/interactive data pipelines, validated marketplaces, and robust evaluation are needed to mitigate the data wall and avoid contamination.
  • Measurement and evaluation: Benchmark stitching and progress tracking must be trustworthy; otherwise capability extrapolations can mislead product and policy decisions.
  • Safety and governance: Recursive improvement and large collectives require formal safety cases, auditability, and enforcement mechanisms to manage systemic risks.
  • Scientific and engineering advances: Approximations to universal AI, abstraction discovery, and high-fidelity simulators are research-intensive and contingent on theoretical breakthroughs.
  • Legal and social license: Education, healthcare, and robotics deployments depend on regulatory approval, public trust, and demonstrable benefit–risk profiles.
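
The compute-and-energy note above leans on growth in effective compute, which the paper describes as the product of three growth factors: better hardware, larger hardware investments, and more efficient algorithms. The sketch below shows that arithmetic. Only the hardware factor (about 1.5× per year) comes from the text quoted in the glossary; the investment and algorithmic factors are hypothetical placeholders, and the function names are illustrative.

```python
import math
from typing import Iterable


def effective_growth(factors: Iterable[float]) -> float:
    """Combine independent annual growth factors by multiplication."""
    return math.prod(factors)


def doubling_time_years(annual_factor: float) -> float:
    """Years needed to double at a constant annual growth factor > 1."""
    if annual_factor <= 1.0:
        raise ValueError("annual_factor must exceed 1 for growth")
    return math.log(2.0) / math.log(annual_factor)


HARDWARE = 1.5     # compute per dollar, the rate quoted in the glossary
INVESTMENT = 2.0   # hypothetical placeholder, not a figure from the paper
ALGORITHMIC = 2.0  # hypothetical placeholder, not a figure from the paper

combined = effective_growth([HARDWARE, INVESTMENT, ALGORITHMIC])
print(f"combined annual factor: {combined:.2f}x")
print(f"doubling time: {doubling_time_years(combined) * 12:.1f} months")
```

Because the factors multiply, modest changes in any one of them shift the combined doubling time noticeably, which is why the note treats compute, energy, and algorithmic efficiency together.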

These applications translate the paper’s core insights—scaling dynamics, evaluation rigor, recursive improvement, multi-agent coordination, and governance-aware planning—into concrete tools and workflows for today and for longer-term development.

Glossary

  • Abstraction Barrier: Hypothesis that AI trained on human abstractions may struggle to discover novel concepts purely from raw data. "see our related discussion of the Abstraction Barrier in \Cref{sec:pathways-and-bottlenecks}"
  • AGI: Artificial General Intelligence; a system with roughly median human-level intelligence on cognitive tasks. "AI progress beyond human-level AGI"
  • AIXI: A mathematical formalism of a universal, incomputable optimal agent that sets an upper bound on machine intelligence. "formally characterized via the AIXI framework \citep{Hutter:24uaibook2}"
  • Algorithmic Efficiency: The amount of compute needed to reach a performance threshold; improves when the same performance is achieved with less compute. "Perhaps more surprisingly, the third factor, algorithmic efficiency, has also steadily improved (exponentially) over the last decade."
  • ASI: Artificial Superintelligence; systems that exceed the capabilities of large, well-coordinated human-expert collectives across most domains. "Are we seeing the onset of the rise of artificial superintelligence (ASI) that exceeds what human collectives are capable of across a very broad spectrum of tasks?"
  • Bekenstein bound: A physical limit on the maximum information storable in a finite region with finite energy. "Bekenstein bound for maximum information that can be contained in a finite space with finite energy."
  • Benchmark Stitching: A method to compare and extrapolate capabilities across heterogeneous models and benchmarks in a unified way. "More recently benchmark stitching \citep{ho2025rosettastoneaibenchmarks} offers a sound framework for capability extrapolations based on heterogeneous models and benchmarks."
  • Bremermann's limit: A theoretical upper bound on the rate of computation for a physical system. "Bremermann's limit for the maximum speed of computation"
  • Chain-of-thought: An inference-time method where models generate intermediate reasoning steps to improve answers. "e.g., ``chain-of-thought reasoning'', ``thinking'', etc."
  • Effective compute: The aggregated growth in usable compute combining hardware progress, investment scale-up, and algorithmic efficiency. "All three growth factors (better hardware, larger hardware investments, more efficient algorithms) can thus be multiplied into a single growth rate of effective compute \citep{aschenbrenner2024situational}"
  • Embodiment factor: The ratio of an agent’s internal processing capacity to its input/output bandwidth. "Lawrence defines a so called ``embodiment factor'' as the ratio of internal processing capacity over input/output rate."
  • Hyperbolic growth: Super-exponential growth where the relative growth rate itself increases with the quantity, potentially leading to a finite-time singularity. "such as hyperbolic growth, where growth rates are not constant (as they are in exponential growth) but increase as a function of the quantity that grows."
  • Intelligence explosion: A rapid, positive feedback loop in which smarter systems accelerate the creation of even smarter systems. "and is the basis for many scenarios of fast AI take off or intelligence explosions"
  • Kolmogorov complexity: The length of the shortest program that generates a given string; a measure of simplicity. "simpler ones (lower Kolmogorov complexity) are given more weight"
  • Landauer principle: The minimum thermodynamic energy required to erase one bit of information. "Landauer principle for energy required for computation (erasure of information)"
  • Legg-Hutter score: A formal measure of intelligence defined as average performance over all (simplicity-weighted) computable tasks. "we take inspiration from the Legg-Hutter score as a universal measure of intelligence."
  • Moore's law: The historical trend of hardware improvements increasing compute per dollar by a roughly constant factor each year. "(``Moore's law'' and related improvements \citep{owid-moores-law}) have increased compute per dollar for six decades at a rate of about $\boldsymbol{1.5\times}$ per year."
  • Recursive improvement: A feedback process where AI systems help improve subsequent AI, accelerating progress. "we discuss four potential pathways from AGI to ASI: scaling AGI, AI paradigm shifts, recursive improvement, and ASI emerging from large-scale multi-agent collectives."
  • Scaling laws: Empirical relations predicting model performance as a function of data, model size, and compute. "At least for a limited extrapolation range, scaling laws \citep{Kaplan2020ScalingLaws} have been highly predictive of how capabilities improve with more compute"
  • Singularity: A hypothesized point where growth becomes infinite in finite time due to accelerating feedback. "Is the Singularity near?"
  • Solipsistic superintelligence: A hypothetical agent that optimizes internal predictive accuracy without adequate grounding in external reality or cooperation. "does not lead to ``solipsistic superintelligence'', a concept discussed in \citet{trivedi2026solipsistic}"
  • Test-time scaling: Using extra inference-time compute (e.g., search, sampling, chain-of-thought) to boost capability beyond training-limited performance. "higher effective compute budgets for test-time scaling (``chain-of-thought reasoning'', ``thinking'', etc.)"
  • Universal AI (UAI): The theoretical endpoint of machine intelligence (e.g., AIXI), maximizing the Legg-Hutter intelligence measure. "The endpoint of this continuum, Universal AI, is theoretically well understood"
  • Universal Constructor: A theoretical device capable of constructing any physically allowed object or system. "(c.f. Universal Constructor \citep{von1966theory, janzing2010there, deutsch2013constructor})."
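
The glossary entries for hyperbolic growth and the Singularity can be made concrete with a short worked comparison. The quadratic case below is an illustrative choice, not a model taken from the paper; the symbols $a$, $x_0$, and $t^{*}$ are introduced here for the example.

```latex
% Exponential growth: the relative growth rate (1/x) dx/dt = a is constant.
\[
  \frac{dx}{dt} = a\,x
  \quad\Longrightarrow\quad
  x(t) = x_0\, e^{a t}
\]

% Hyperbolic growth (illustrative quadratic case): the relative growth
% rate (1/x) dx/dt = a x increases with the quantity x itself.
\[
  \frac{dx}{dt} = a\,x^{2}
  \quad\Longrightarrow\quad
  x(t) = \frac{x_0}{1 - a\,x_0\,t},
  \qquad
  t^{*} = \frac{1}{a\,x_0}
\]
```

With $a > 0$ and $x_0 > 0$, the exponential solution stays finite for every finite $t$, while the hyperbolic solution diverges as $t \to t^{*}$. That divergence is the finite-time singularity the glossary refers to.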
