Generative Agents: Interactive Simulacra of Human Behavior (2304.03442v2)

Published 7 Apr 2023 in cs.HC, cs.AI, and cs.LG

Abstract: Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a LLM to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. By fusing LLMs with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.

The paper introduces Generative Agents: Interactive Simulacra of Human Behavior, computational software agents designed to simulate believable human behavior in interactive environments. The core idea is to extend LLMs with an architecture that allows agents to maintain long-term consistency, remember and synthesize past experiences, and dynamically plan their actions and reactions based on their environment and history. The authors demonstrate these agents in a sandbox environment reminiscent of The Sims.

The generative agent architecture consists of four main components:

  1. Memory Stream: This is the agent's long-term memory, storing a comprehensive record of its experiences in natural language. The basic unit is an "observation," representing events perceived by the agent. Each memory object includes a natural language description, creation timestamp, and last access timestamp.
  2. Retrieval: To manage the large memory stream and fit within the limited context window of an LLM, a retrieval function selects the most relevant memories for the agent's current situation. This function scores memories based on a weighted combination of:
    • Recency: More recently accessed memories score higher (exponential decay).
    • Importance: The LLM is prompted to rate each memory on a 1-10 scale from mundane to poignant; more poignant memories score higher.
    • Relevance: Memories semantically related to the current situation (the query) score higher, calculated using cosine similarity of embedding vectors.
  The top-scoring memories that fit within the context window are passed to the LLM.
  3. Reflection: This process synthesizes observations and existing reflections into higher-level, more abstract thoughts. Reflections are generated periodically when the cumulative importance of recent events exceeds a threshold. The process involves:
    • Identifying salient questions from recent memories using the LLM.
    • Retrieving relevant memories (including other reflections) based on these questions.
    • Prompting the LLM to generate insights based on the retrieved memories, citing the evidence.
  These generated reflections are added to the memory stream, forming a hierarchical structure or "reflection tree" in which insights build on lower-level memories. Reflection enables agents to generalize and make inferences that guide their behavior.
  4. Planning: To ensure long-term coherence, agents create hierarchical plans. These plans are stored in the memory stream and influence future actions.
    • Initial high-level daily plans (5-8 chunks) are generated using the LLM based on the agent's summary description and previous day's activities.
    • These high-level plans are recursively decomposed into finer-grained actions (e.g., hour-long chunks, then 5-15 minute chunks).
    • Plans include a location, start time, and duration.
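The retrieval scoring in component 2 can be sketched as follows. The `Memory` dataclass, the equal weighting of the three terms, and the 0.99 decay factor are illustrative assumptions, not the paper's exact implementation (the paper additionally min-max normalizes scores and tunes the weights).

```python
import math
from dataclasses import dataclass

@dataclass
class Memory:
    description: str
    importance: int         # 1-10, as rated by the LLM prompt
    last_access: float      # hours since simulation start
    embedding: list[float]  # embedding of the description

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def retrieval_score(m, query_emb, now, decay=0.99):
    recency = decay ** (now - m.last_access)  # exponential decay since last access
    importance = m.importance / 10.0          # normalize the 1-10 rating
    relevance = cosine(m.embedding, query_emb)
    return recency + importance + relevance   # equal weights for simplicity

def retrieve(memories, query_emb, now, k=3):
    ranked = sorted(memories, key=lambda m: retrieval_score(m, query_emb, now),
                    reverse=True)
    return ranked[:k]
```

An old but important and relevant memory can thus outrank a fresh but mundane one, which is the point of combining the three signals.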
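The reflection loop in component 3 might look like the sketch below; `llm`, `retrieve_fn`, and `make_memory` are placeholder callables, and the threshold value and prompt wording are illustrative rather than quoted from the paper.

```python
def reflect(memory_stream, llm, retrieve_fn, make_memory, threshold=150, window=100):
    """Synthesize recent memories into higher-level insights (reflections)."""
    recent = memory_stream[-window:]
    # Trigger only when the summed importance of recent events crosses a threshold.
    if sum(m.importance for m in recent) < threshold:
        return []
    statements = "\n".join(m.description for m in recent)
    # Step 1: ask the LLM for salient high-level questions about recent events.
    questions = llm("Given only the statements below, what are the most salient "
                    "high-level questions we can answer about the subjects?\n" + statements)
    insights = []
    for q in questions:
        # Step 2: retrieve supporting memories (which may include prior reflections).
        evidence = retrieve_fn(memory_stream, q)
        facts = "\n".join(m.description for m in evidence)
        # Step 3: generate insights grounded in the evidence.
        for text in llm("What high-level insights can you infer from:\n" + facts):
            insights.append(make_memory(text))
    memory_stream.extend(insights)  # reflections join the stream as new memories
    return insights
```

Because reflections re-enter the memory stream, later reflections can retrieve and build on earlier ones, producing the "reflection tree" described above.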
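The recursive plan decomposition in component 4 can be sketched like this; the `PlanStep` fields mirror the location/start/duration attributes mentioned above, while the class itself, the prompt, and the 15-minute floor are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class PlanStep:
    action: str
    location: str
    start: int      # minutes since midnight
    duration: int   # minutes

def decompose(step, llm, finest=15):
    """Recursively break a plan step into finer-grained sub-steps."""
    if step.duration <= finest:
        return [step]
    # Ask the LLM to split the step into sub-actions with durations (minutes).
    subs = llm(f"Decompose '{step.action}' ({step.duration} min) into shorter steps")
    result, t = [], step.start
    for action, minutes in subs:
        result += decompose(PlanStep(action, step.location, t, minutes), llm, finest)
        t += minutes
    return result
```

A day-level chunk is thus refined into hour-long steps and then into 5-15 minute actions, each of which keeps a concrete location and time window.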

Reacting and Updating Plans: Generative agents operate in a continuous action loop. At each time step, they perceive their environment (observations are added to memory). The architecture prompts the LLM with recent observations and relevant context retrieved from memory to determine if the agent should react. If a reaction is triggered, the agent's current plan is regenerated starting from that point.
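The perceive-retrieve-react loop can be sketched as follows; the attribute names, prompt wording, and `llm` placeholder are illustrative, not the paper's code.

```python
def step(agent, world, llm, retrieve_fn):
    # 1. Perceive: new observations enter the memory stream.
    observations = world.perceive(agent)
    agent.memory.extend(observations)
    # 2. Retrieve context relevant to the freshest observation.
    context = retrieve_fn(agent.memory, observations[-1])
    # 3. Ask the LLM whether the agent should react to what it perceives.
    decision = llm(f"{agent.summary}\nObservation: {observations[-1]}\n"
                   f"Context: {context}\nShould {agent.name} react, and how?")
    if decision.startswith("Yes"):
        # 4. A reaction discards and regenerates the rest of the current plan.
        agent.plan = agent.plan[:agent.plan_index] + llm(
            f"Regenerate {agent.name}'s plan from now, given: {decision}")
    return agent.plan
```

The key design choice is step 4: rather than patching a single action, the remainder of the plan is regenerated so downstream steps stay consistent with the reaction.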

Dialogue: Conversations between agents are generated by conditioning the LLM on the agents' memories about each other and the dialogue history. This allows agents to converse believably based on their relationship and shared experiences.
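Conditioning dialogue on memory can be sketched as a prompt-construction step; the function, attribute names, and query string are illustrative assumptions.

```python
def utterance(speaker, listener, history, llm, retrieve_fn):
    """Generate the speaker's next line, conditioned on memory and history."""
    # Memories about the other agent ground the line in their relationship.
    about = retrieve_fn(speaker.memory, f"relationship with {listener.name}")
    prompt = (f"{speaker.summary}\n"
              f"Memories about {listener.name}:\n" + "\n".join(about) + "\n"
              f"Conversation so far:\n" + "\n".join(history) + "\n"
              f"What would {speaker.name} say next?")
    return llm(prompt)
```

Each turn re-runs retrieval, so a new remark by the other agent can surface different shared memories mid-conversation.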

Sandbox Environment Implementation:

The paper demonstrates generative agents in a 2D sandbox world called Smallville, built using the Phaser framework.

  • The environment (areas, objects) is represented as a tree structure, translated to natural language for the agents (e.g., "there is a stove in the kitchen").
  • Agents maintain their own subgraph of the environment tree based on what they have perceived.
  • A server mediates interaction between the agents and the game engine, updating agent locations and object states based on agent actions.
  • Agent actions are generated by prompting the LLM and recursively traversing the agent's environment tree to determine specific locations (e.g., identifying the correct room and object for an action).
  • Initial agent identities are provided as a paragraph description, split into initial seed memories.
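Translating the environment tree into natural language, as in the "stove in the kitchen" example, amounts to a recursive walk; the nested-dict representation here is an illustrative stand-in for the paper's tree structure.

```python
def describe(tree):
    """Render an environment tree as natural-language statements for the LLM."""
    sentences = []
    for area, contents in tree.items():
        if isinstance(contents, dict):   # sub-areas: recurse into them
            sentences.extend(describe(contents))
        else:                            # leaf objects located in this area
            sentences.extend(f"there is a {obj} in the {area}" for obj in contents)
    return sentences

# Hypothetical fragment of an agent's perceived subgraph of Smallville.
smallville_fragment = {"Lin family's house": {"kitchen": ["stove", "fridge"],
                                              "bedroom": ["bed", "desk"]}}
```

Since each agent only holds the subgraph it has perceived, the same walk over different agents' trees yields different descriptions of the same world.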

Evaluation:

The authors conducted two evaluations:

  1. Controlled Evaluation: Agents were "interviewed" with natural language questions across five categories: self-knowledge, memory, planning, reactions, and reflections. Human evaluators ranked the believability of responses generated by the full architecture, three ablated versions (removing access to memory, reflection, and/or planning), and a human crowdworker baseline. The full architecture was rated significantly more believable than all ablations and the crowdworker baseline, demonstrating that each architectural component matters. Qualitative analysis revealed that agents could retrieve memories but sometimes embellished them, and that reflection was crucial for synthesizing experiences into deeper insights.
  2. End-to-End Evaluation: A simulation with 25 agents ran for two game days to observe emergent social behaviors. Measurements tracked:
    • Information Diffusion: The spread of knowledge about Sam's mayoral candidacy and Isabella's party was measured via interviews. Information spread successfully among agents.
    • Relationship Formation: Network density of mutual knowledge increased significantly, indicating agents formed new relationships.
    • Coordination: Agents successfully coordinated to attend Isabella's Valentine's Day party after being invited. The simulation also revealed boundary conditions and errors, such as agents choosing less typical locations with larger memory, misinterpreting physical norms of the environment, and exhibiting overly polite or cooperative behavior potentially due to LLM instruction tuning.
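The relationship-formation metric can be computed as the fraction of agent pairs with mutual knowledge of each other; this sketch assumes a dict mapping each agent to the set of agents it can describe, which is an illustrative encoding of the interview results.

```python
from itertools import combinations

def network_density(knows):
    """Fraction of agent pairs that mutually know each other."""
    agents = list(knows)
    pairs = list(combinations(agents, 2))  # all unordered agent pairs
    mutual = sum(1 for a, b in pairs if b in knows[a] and a in knows[b])
    return mutual / len(pairs)
```

Comparing this value at the start and end of the two-day simulation gives the increase in network density reported above.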

Applications and Future Work:

The paper suggests generative agents could be applied in social prototyping (populating online forums or virtual worlds), training simulations, human-centered design proxies (modeling user behavior for system design), and creating more dynamic game NPCs. Limitations and future work include improving the memory retrieval function, optimizing computational performance and cost, conducting longer-term and more rigorous evaluations, testing robustness to prompt and memory hacking, and addressing the biases and limitations inherited from the underlying LLMs.

Ethics and Societal Impact:

The authors discuss important ethical risks:

  • Parasocial relationships: Users may anthropomorphize agents, leading to over-reliance or emotional attachment. Mitigations proposed are explicit disclosure of computational nature and value-alignment to prevent inappropriate reciprocity.
  • Impact of errors: Errors in agent inference could cause harm in applications like ubiquitous computing. Focusing on lower-stakes environments like games and following human-AI design best practices are suggested.
  • Exacerbating Generative AI risks: Agents could contribute to deepfakes, misinformation, or tailored persuasion. Maintaining audit logs is proposed as a deterrent and detection mechanism.
  • Over-reliance: Developers might use agents to displace human input in design processes. The authors stress agents should complement, not replace, real human stakeholders.
Authors (6)
  1. Joon Sung Park
  2. Joseph C. O'Brien
  3. Carrie J. Cai
  4. Meredith Ringel Morris
  5. Percy Liang
  6. Michael S. Bernstein
Citations (1,280)