
Fully Autonomous AI Agents Should Not be Developed (2502.02649v2)

Published 4 Feb 2025 in cs.AI

Abstract: This paper argues that fully autonomous AI agents should not be developed. In support of this position, we build from prior scientific literature and current product marketing to delineate different AI agent levels and detail the ethical values at play in each, documenting trade-offs in potential benefits and risks. Our analysis reveals that risks to people increase with the autonomy of a system: The more control a user cedes to an AI agent, the more risks to people arise. Particularly concerning are safety risks, which affect human life and impact further values.

Summary

  • The paper argues that risks associated with AI agents, particularly safety, privacy, and trust, are significantly amplified as agent autonomy increases.
  • Fully autonomous agents capable of writing and executing their own code pose unacceptable risks compared to semi-autonomous systems with human oversight.
  • Analyzing autonomy levels based on decreasing human input, the paper advocates for human control mechanisms and safety verification to mitigate escalating risks.

This paper posits that fully autonomous Artificial Intelligence (AI) agents should not be developed, because the risks to people grow as autonomy increases. Building from prior literature and current product marketing, the authors delineate different AI agent levels and detail the ethical values involved at each level.

The authors identify risks that affect human life and impinge on further values, and they note that these risks arise from the same benefits that motivate AI agent development, such as removing the need for developers to design every system action in advance. They argue that fully autonomous AI agents, which can write and execute their own code beyond predefined constraints, should not be developed. Semi-autonomous systems that retain some level of human control, they suggest, offer a more favorable risk-benefit profile, depending on the degree of autonomy, the complexity of the tasks assigned, and the nature of human involvement in decision-making.

The paper reviews the history of artificial agents, referencing Cadmus from ancient mythology and Aristotle's speculation that automata could make human slavery unnecessary. It mentions Ktesibios of Alexandria's water clock as an early precursor to artificial agents. The paper then discusses Asimov's Three Laws of Robotics, highlighting the challenges of translating such concepts into computer software. It notes the excitement surrounding AI agents in the 1990s and their use in reinforcement learning, where independent actors within the same action space were given separate goals and objective functions.

The authors observe that in the 2020s, AI agents broadened the range of functionality that computer systems could provide while requiring less input from users. They point to the integration of AI models, such as state-of-the-art LLMs, into robotic systems. The emergence of autonomous weapons systems is noted as a controversial area of development, raising ethical questions about accountability, moral responsibility, and safety considerations.

The authors define an AI agent as a computer software system capable of creating context-specific plans in non-deterministic environments, while conceding that there is no full consensus on what an AI agent is. They note that a commonality across recently introduced AI agents is that they act with some level of autonomy, decomposing goals into subtasks and executing them without direct human intervention. They also discuss the concept of "agency" and its philosophical implications for AI systems, analyzing agency through the lens of intentional action, where actions are explained in terms of an agent's mental states and its capacity to act for reasons, while noting that artificially intelligent agents, as historically discussed, lack such mental states.
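As a rough illustration of this decompose-and-execute loop (a sketch, not code from the paper), an agent of this kind might look like the following; the `llm` and `tools` objects and all of their method names are hypothetical stand-ins.

```python
# Minimal sketch of the goal-decomposition loop described above.
# Illustrative only: the `llm` and `tools` interfaces and their method
# names are hypothetical, not taken from the paper.

def run_agent(goal: str, llm, tools: dict, max_steps: int = 10) -> list:
    """Plan subtasks for a goal and execute them without direct human input."""
    results = []
    subtasks = llm.decompose(goal)                # the agent plans its own subtasks
    for step, task in enumerate(subtasks):
        if step >= max_steps:                     # crude guard against runaway execution
            break
        action = llm.choose_action(task, tools)   # the agent selects a tool and arguments
        results.append(tools[action.name](**action.args))
    return results
```

The point of the sketch is that, past the initial goal, every decision in the loop is made by the model rather than by a human or by developer-written control flow.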

The paper introduces a scale of AI agent levels, corresponding to decreasing input from a user and decreasing code written by the agent developers. The authors propose that the more autonomous the system, the more human control is ceded.
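The scale itself is not reproduced in this summary; the sketch below is an approximate paraphrase of the progression the paper describes, with level names and descriptions that may not match the paper's table verbatim.

```python
# Approximate paraphrase of the autonomy scale discussed above (illustrative;
# wording may not match the paper's table exactly). Each step hands more
# control of program flow from developer-written code to the model.

AGENT_LEVELS = [
    (0, "simple processor", "model output is used, but it does not affect program flow"),
    (1, "router", "model decides which predefined branch of the program runs"),
    (2, "tool call", "model decides which developer-provided function runs, and with what arguments"),
    (3, "multi-step agent", "model controls iteration: which steps run, in what order, and when to stop"),
    (4, "fully autonomous agent", "model writes and executes new code beyond predefined constraints"),
]

for level, name, model_controls in AGENT_LEVELS:
    print(f"Level {level} ({name}): {model_controls}")
```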

To examine the relationship between AI agent autonomy level and ethical implications, the authors conducted a systematic analysis of how agents are conceptualized and deployed across different contexts, focusing on how varying degrees of agent autonomy interact with value propositions in research and commercial implementations. The methodology included collecting and categorizing statements about agents' capabilities, benefits, and harms; identifying recurring AI agent value propositions; converging on a value taxonomy; and analyzing the role of values with increased autonomy.

The authors distinguish three main patterns in how agentic levels impact value preservation: inherent risks (⊙), present at all autonomy levels due to limitations in an AI agent's base model(s); countervailing relationships (⇅), where increasing autonomy creates both risks and opportunities with respect to an ethical value; and amplified risks (↑), where increasing autonomy magnifies existing vulnerabilities.

The value taxonomy includes:

  • Accuracy: The accuracy of an AI agent is modulated by the accuracy of the models it's based on, and commonly used LLMs are known to produce incorrect information that appears plausible.
  • Assistiveness: Assistive agents may augment capabilities; poorly designed assistiveness could lead to harms from over-reliance or inappropriate trust.
  • Consistency: LLMs are known to be highly inconsistent. Measuring AI agent consistency will require the development of new evaluation protocols, especially in sensitive domains, and potentially new ways to deal with model confabulations.
  • Efficiency: Identifying and fixing errors that agents introduce can be time-consuming, difficult, and stressful. Improved efficiency also brings rebound effects, notably in the space, time, and behaviors that new technologies reshape.
  • Equity: Inequitable outcomes may emerge due to sample bias in data collection and job loss from agents replacing human workers.
  • Flexibility: The more an agent can affect and be affected by systems outside of its more limited testing environment, the greater the risk.
  • Humanlikeness: Humanlikeness can lead users to anthropomorphize the system, which may have negative psychological effects such as overreliance and addiction.
  • Privacy: For agents to work according to user expectations, the user may need to provide personal information. In the event of a privacy breach, the agent's interconnection of content across different systems can compound the harm.
  • Relevance: Personalization can amplify existing biases and create new ones.
  • Safety: The unpredictable nature of agent actions means that seemingly safe individual operations could combine in harmful ways, creating new risks that are difficult to prevent.
  • Security: AI agents present serious security challenges because they handle often-sensitive data while also carrying safety risks, such as their ability to interact with multiple systems and their by-design lack of detailed human oversight.
  • Sustainability: The models that current agents are based on bring negative environmental impacts, such as carbon emissions and usage of potable water.
  • Trust: As the agentic level increases, human trust in the agent can heighten risks stemming from its greater flexibility and from issues in its accuracy, consistency, privacy, safety, security, and truthfulness.
  • Truthfulness: The deep learning technology modern AI agents are based on is well-known to be a source of false information, which can take shape in forms such as deepfakes or misinformation.

The authors also mention the potential for increased benefit, particularly with respect to assistiveness, efficiency, equity, relevance of outcomes, and, some argue, sustainability.

The paper concludes by referencing the history of nuclear close calls as a lesson about the risks of ceding human control to autonomous systems. It recommends adopting clearly defined agent levels, developing human control mechanisms, and building safety verification methods.
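As one concrete way to picture the human control mechanisms the authors call for, the sketch below gates any action an agent proposes behind explicit human approval whenever the action is not reversible. It is a minimal illustration under assumed interfaces, not a design from the paper; the callables passed in are hypothetical.

```python
# Minimal sketch of a human-in-the-loop control mechanism of the kind the
# paper recommends. Illustrative only; the callables passed in are hypothetical.

def execute_with_oversight(proposed_action: str, run_action, is_reversible) -> str:
    """Require explicit human approval before any irreversible action runs."""
    if not is_reversible(proposed_action):
        answer = input(f"Agent proposes: {proposed_action!r}. Approve? [y/N] ")
        if answer.strip().lower() != "y":
            return "action rejected by human overseer"
    return run_action(proposed_action)
```

A gate like this keeps the human in the decision loop exactly where the paper argues control matters most: before consequences that cannot be undone.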
