Embedded Agency
The paper "Embedded Agency" by Abram Demski and Scott Garrabrant from the Machine Intelligence Research Institute provides a comprehensive exploration of embedded agency, a concept challenging the traditional models of rational action by considering agents that are not distinct and separate from their environment, but rather parts of it. This notion diverges from dualistic models where agents operate externally, akin to video game players with predefined input-output channels.
Key Aspects of Embedded Agency
Demski and Garrabrant divide the problem into four subproblems, each marking a place where the dualistic model breaks down for embedded agents:
- Decision Theory: Embedded agents lack well-defined input/output channels, so the standard decision-theoretic picture, in which outcomes are a simple function of the agent's actions, no longer applies: the agent's action is just another event inside the environment. The paper examines the resulting difficulties with counterfactual reasoning; logical counterfactuals, updatelessness, and interaction with multiple copies of the same agent all lead to complications that conventional decision theories fail to resolve satisfactorily. The first sketch after this list illustrates why the action-to-outcome function can fail to exist.
- Embedded World-Models: An embedded agent is smaller than its environment, so it cannot store a detailed model of that environment inside itself. This section explores the resulting realizability problem: the true environment cannot appear as a hypothesis in the agent's hypothesis space, and standard Bayesian reasoning offers no guarantees when the agent must learn inductively about an environment its prior does not contain. It also highlights logical uncertainty, where an agent must reason under uncertainty about facts that are deductively entailed by what it already knows, a situation traditional Bayesian models handle poorly. The second sketch after this list shows a Bayesian learner stuck with a misspecified hypothesis space.
- Robust Delegation: An embedded agent must be able to trust its future self, or a successor it builds, to pursue the same goals. This raises the challenges of Vingean reflection (reasoning reliably about a smarter successor without simulating it in full), of value learning (having the successor learn what the original agent wants without logical paradox or inconsistency), and of Goodhart's law, the observation that optimizing a measurable proxy for a goal tends to come apart from the goal itself under strong optimization pressure. The third sketch after this list demonstrates this proxy-optimization failure numerically.
- Subsystem Alignment: Embedded agents are composed of adaptable parts, and those parts can themselves act like optimizers. The agent therefore needs strategies to keep its subsystems aligned with its overall goals, preventing internal adversarial behavior and unintended goal optimization. Here the paper discusses benign optimization, transparency, and robustness in the face of subsystems whose capabilities may rival the agent's own. The final sketch after this list shows how an outer search can select a subsystem whose goal only appears aligned on the training distribution.
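The following toy sketch (constructed here, not taken from the paper) makes the decision-theory point concrete with a Newcomb-like setup: because the environment contains a predictor of the agent's own decision procedure, there is no fixed function from actions to outcomes for a dualistic argmax to optimize. The payoff amounts and function names are illustrative assumptions.

```python
# Toy Newcomb-like environment: the world contains a predictor of the
# agent, so "outcome as a function of the action alone" is ill-defined.
# Payoffs and names are illustrative assumptions, not from the paper.

def payoff(action, prediction):
    """The opaque box holds $1M iff the predictor expected one-boxing."""
    opaque = 1_000_000 if prediction == "one-box" else 0
    return opaque if action == "one-box" else opaque + 1_000

def dualistic_argmax(prediction):
    # A dualistic agent treats the prediction as fixed background and
    # scores each action independently; it therefore always two-boxes.
    return max(["one-box", "two-box"], key=lambda a: payoff(a, prediction))

# But the prediction is about this very decision procedure: a perfect
# predictor mirrors whatever policy the agent actually runs.
for policy in ["one-box", "two-box"]:
    print(policy, "->", payoff(policy, prediction=policy))
# one-box -> 1000000, two-box -> 1000: the fixed action-to-outcome
# mapping assumed by dualistic_argmax does not exist in this world.
```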
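Next, a minimal sketch of the realizability problem from the embedded world-models discussion: a Bayesian learner whose hypothesis space omits the true environment can only concentrate on the least-wrong hypothesis, never the truth. The coin biases and sample count are illustrative assumptions.

```python
import random

random.seed(0)
true_bias = 0.7                       # the real environment
hypotheses = {0.2: 0.5, 0.5: 0.5}     # prior over coin biases; 0.7 is absent

for _ in range(1000):
    heads = random.random() < true_bias
    # Bayes update: weight each hypothesis by its likelihood, renormalize.
    for h in hypotheses:
        hypotheses[h] *= h if heads else (1 - h)
    total = sum(hypotheses.values())
    for h in hypotheses:
        hypotheses[h] /= total

print(hypotheses)  # the posterior piles onto 0.5, the least-wrong model,
                   # while the true bias 0.7 is forever outside its reach
```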
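The third sketch illustrates Goodhart's law in its simplest (regressional) form, under assumed Gaussian value and noise distributions: selecting candidates by a noisy proxy delivers less true value than the proxy score promises, and the gap widens as optimization pressure grows.

```python
import random, statistics

random.seed(0)

def select_by_proxy(n):
    """Draw n candidates with true value V and proxy U = V + noise;
    return (true value, proxy score) of the proxy-maximizing candidate."""
    candidates = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
    v, noise = max(candidates, key=lambda c: c[0] + c[1])
    return v, v + noise

for pressure in (10, 100, 10_000):
    picks = [select_by_proxy(pressure) for _ in range(200)]
    vs = statistics.mean(p[0] for p in picks)
    us = statistics.mean(p[1] for p in picks)
    print(f"candidates={pressure:>6}  proxy={us:.2f}  true value={vs:.2f}")
# The proxy score keeps climbing with optimization pressure, but the gap
# between proxy and true value widens: harder selection increasingly
# rewards candidates whose noise, rather than value, is large.
```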
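Finally, a minimal subsystem-alignment sketch (again an illustration constructed here, not an example from the paper): two subsystems behave identically on the training distribution, so an outer search cannot distinguish the proxy-aligned one from the genuinely aligned one until inputs leave that distribution.

```python
training_inputs = [1, 4, 9, 16]       # happens to contain only non-negatives
target = abs                          # the outer objective: absolute value

subsystems = {
    "identity": lambda x: x,          # proxy-aligned: matches target only for x >= 0
    "abs":      lambda x: abs(x),     # genuinely aligned with the outer objective
}

def training_score(f):
    """Fraction of training inputs on which the subsystem matches the target."""
    return sum(f(x) == target(x) for x in training_inputs) / len(training_inputs)

# On the training distribution the two subsystems are indistinguishable:
for name, f in subsystems.items():
    print(f"{name}: training score {training_score(f):.0%}")

# Off-distribution, the proxy-aligned subsystem pursues a different goal:
print("identity(-9) =", subsystems["identity"](-9), "vs target(-9) =", target(-9))
```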
Implications and Future Directions
The paper carries significant theoretical implications for understanding rational agency in AI systems. Embedded agency calls for a foundational rethinking of rational-agent models to address conceptual problems in decision-making, world-modeling, delegation, and subsystem coordination.
The paper suggests that a coherent model of embedded agency is pivotal for developing future AI systems capable of general intelligence. Where traditional methods rely on a dualistic separation between agent and environment, the embedded approach treats the agent as one more part of the world it acts on. Addressing these challenges can yield insight into building AI systems that act safely and effectively in complex, dynamic environments without undue reliance on brute-force solutions, and may ultimately contribute to the broader field of AI alignment.
Further research could explore computationally tractable approaches to world-modeling at scale, refine decision theories to handle logical counterfactuals, and develop robust delegation mechanisms that prevent adversarial subsystem behavior from arising inadvertently. Such efforts would advance the theoretical understanding needed to build smarter, safer, and more adaptive AI systems with aligned goals and reliably integrated sub-components.