
The AI Agent Index (2502.01635v1)

Published 3 Feb 2025 in cs.SE and cs.AI

Abstract: Leading AI developers and startups are increasingly deploying agentic AI systems that can plan and execute complex tasks with limited human involvement. However, there is currently no structured framework for documenting the technical components, intended uses, and safety features of agentic systems. To fill this gap, we introduce the AI Agent Index, the first public database to document information about currently deployed agentic AI systems. For each system that meets the criteria for inclusion in the index, we document the system's components (e.g., base model, reasoning implementation, tool use), application domains (e.g., computer use, software engineering), and risk management practices (e.g., evaluation results, guardrails), based on publicly available information and correspondence with developers. We find that while developers generally provide ample information regarding the capabilities and applications of agentic systems, they currently provide limited information regarding safety and risk management practices. The AI Agent Index is available online at https://aiagentindex.mit.edu/

Summary

  • The paper introduces a detailed taxonomy of 33 structured fields to document technical, operational, and safety aspects of agentic AI systems.
  • The study employs a mixed-methods approach, combining web searches, literature reviews, and direct developer surveys to index 67 systems with a 36% response rate.
  • The paper highlights a significant gap between documentation of system capabilities and formal safety disclosures, urging improvements in risk management practices.

The paper presents a comprehensive framework and empirical study for cataloging and analyzing deployed agentic AI systems. It develops the "AI Agent Index," which systematically documents the technical components, intended applications, and risk management features of systems exhibiting agentic characteristics: underspecification, directness of impact, goal-directedness, and long-term planning.

The paper is structured around several key contributions:

  • Structured Documentation Framework:

The work introduces a detailed taxonomy for recording 33 distinct fields of information spanning multiple dimensions, including basic system information; developer details; system components such as backend models, reasoning, planning, and tool use; guardrails and oversight mechanisms; evaluation outcomes; and ecosystem-related data. This granularity is intended to support stakeholders ranging from users and developers to auditors and policymakers.
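To make the taxonomy concrete, a machine-readable index entry might look like the sketch below. The field names here are illustrative assumptions grouped by the dimensions the paper names (basic information, developer details, components, guardrails, evaluation); the actual index defines 33 distinct fields.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AgentIndexEntry:
    """Illustrative record for one indexed agentic system.

    Field names are hypothetical; the real index documents 33 fields
    across these dimensions.
    """
    # Basic system and developer information
    name: str
    developer: str
    release_date: Optional[str] = None
    # System components (backend model, reasoning, planning, tool use)
    backend_model: Optional[str] = None
    reasoning_implementation: Optional[str] = None
    tool_use: list[str] = field(default_factory=list)
    # Application domains (e.g., computer use, software engineering)
    domains: list[str] = field(default_factory=list)
    # Guardrails, oversight, and evaluation outcomes
    safety_policy_url: Optional[str] = None
    external_safety_evaluation: bool = False
    code_released: bool = False

# Hypothetical entry showing how a record would be populated
entry = AgentIndexEntry(
    name="ExampleAgent",
    developer="Example Lab",
    backend_model="example-llm-v1",
    domains=["software engineering"],
)
```

Structuring entries this way is what lets the index support the different stakeholders the paper mentions: auditors can query the guardrail fields, while users can filter on domains.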

  • Empirical Survey Methodology:

The authors describe a mixed-methods approach where agentic systems are identified via web searches, literature reviews, and benchmark leaderboards. They then combine publicly available data with direct correspondence with developers, achieving a 36% response rate. The inclusion criteria are carefully defined through a decision tree that filters out models and frameworks that do not exhibit a sufficiently high degree of agency. In total, 67 distinct systems are indexed, each representing a unique manifestation of agentic design.
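The inclusion decision tree can be sketched as a simple predicate. The checks below are assumptions modeled on the criteria described above (deployed system rather than bare model or framework, plus the four agentic characteristics); the paper's actual tree may order or weight these differently.

```python
def meets_inclusion_criteria(system: dict) -> bool:
    """Sketch of a decision-tree filter for indexing agentic systems.

    The predicate keys are hypothetical; the paper's decision tree
    screens out models and frameworks without a sufficiently high
    degree of agency.
    """
    # Must be a deployed system, not a bare model or agent framework
    if not system.get("is_deployed_system", False):
        return False
    if system.get("is_bare_model_or_framework", False):
        return False
    # Must exhibit the agentic characteristics the paper names
    agentic_traits = (
        "underspecification",
        "directness_of_impact",
        "goal_directedness",
        "long_term_planning",
    )
    return all(system.get(trait, False) for trait in agentic_traits)

# Example: a deployed coding agent exhibiting all four traits
candidate = {
    "is_deployed_system": True,
    "is_bare_model_or_framework": False,
    "underspecification": True,
    "directness_of_impact": True,
    "goal_directedness": True,
    "long_term_planning": True,
}
print(meets_inclusion_criteria(candidate))  # True
```

Encoding the criteria as an explicit predicate also makes the filtering step replicable, which matters for the longitudinal monitoring the authors envision.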

  • Key Quantitative Findings:
    • About 70.1% of the indexed systems offer some form of public documentation, and nearly 49.3% release their underlying code.
    • In stark contrast, only 19.4% disclose a formal safety policy, with fewer than 10% offering evidence of external safety evaluations.
    • Domain-wise, approximately 74.6% of the systems are specialized in software engineering or computer-use tasks, indicating a strong concentration in these areas.
    • The geographic distribution is heavily skewed toward the USA, with 45 out of 67 systems being developed by US-based teams; moreover, the majority are industry-led, although a significant 26.9% originate from academic laboratories.
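The headline percentages above can be reproduced from counts over the 67 indexed systems. Of the counts below, only 45 of 67 US-based systems is stated directly; the others are back-calculated assumptions consistent with the reported percentages.

```python
# Sanity check: recover the paper's reported percentages from implied
# counts over the 67 indexed systems. Only 45/67 (US-based) is stated
# directly in the summary; the other counts are back-calculated.
TOTAL = 67
counts = {
    "public documentation": 47,          # ~70.1%
    "code released": 33,                 # ~49.3%
    "formal safety policy": 13,          # ~19.4%
    "software eng. / computer use": 50,  # ~74.6%
    "US-based developers": 45,           # stated: 45 of 67
    "academic laboratories": 18,         # ~26.9%
}
for label, n in counts.items():
    print(f"{label}: {n}/{TOTAL} = {100 * n / TOTAL:.1f}%")
```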
  • Insights on Agent Design and Governance:

The paper underscores the heterogeneity and complexity inherent in the agentic AI ecosystem. It highlights the divergent levels of openness between industry and academia, noting that academic projects frequently exhibit a higher degree of code transparency. Additionally, the work points out that while system capabilities and intended uses are well-documented, there is a substantial lack of detailed safety and risk management disclosures. The authors discuss how this imbalance may impede accurate risk assessment and effective policy development. They offer recommendations for structured bug bounties, systematic testing, centralized oversight frameworks, and integration with broader model registries to address these gaps.

  • Limitations and Future Directions:

The authors acknowledge several limitations. The definition of what constitutes an “agent” is intentionally broad and has not been strictly formalized, leaving room for subjective interpretation. The index represents a temporal snapshot (as of December 31, 2024) and may omit internal systems or non-English language projects. There is also an inherent risk that developers may engage in selective disclosure, thereby skewing the publicly available information and the resulting assessments. Looking forward, the paper advocates for future research to refine the selection criteria and documentation breadth, with the dual goals of enhancing the technical understanding of agentic systems and informing more robust AI governance practices.

Overall, the paper contributes a pragmatic yet detailed methodology for longitudinally monitoring agentic AI systems. By providing a replicable framework and concrete observational data, it lays the groundwork for subsequent technical evaluations and policy-oriented studies. This work is particularly useful for researchers and policymakers interested in understanding how design practices, openness, and safety assessments intersect in advanced AI systems with real-world impact.
