A-MEM: Agentic Memory for LLM Agents (2502.12110v9)

Published 17 Feb 2025 in cs.CL and cs.HC

Abstract: While LLM agents can effectively use external tools for complex real-world tasks, they require memory systems to leverage historical experiences. Current memory systems enable basic storage and retrieval but lack sophisticated memory organization, despite recent attempts to incorporate graph databases. Moreover, these systems' fixed operations and structures limit their adaptability across diverse tasks. To address this limitation, this paper proposes a novel agentic memory system for LLM agents that can dynamically organize memories in an agentic way. Following the basic principles of the Zettelkasten method, we designed our memory system to create interconnected knowledge networks through dynamic indexing and linking. When a new memory is added, we generate a comprehensive note containing multiple structured attributes, including contextual descriptions, keywords, and tags. The system then analyzes historical memories to identify relevant connections, establishing links where meaningful similarities exist. Additionally, this process enables memory evolution - as new memories are integrated, they can trigger updates to the contextual representations and attributes of existing historical memories, allowing the memory network to continuously refine its understanding. Our approach combines the structured organization principles of Zettelkasten with the flexibility of agent-driven decision making, allowing for more adaptive and context-aware memory management. Empirical experiments on six foundation models show superior improvement against existing SOTA baselines. The source code for evaluating performance is available at https://github.com/WujiangXu/AgenticMemory, while the source code of agentic memory system is available at https://github.com/agiresearch/A-mem.

This paper introduces A-Mem, an agentic memory system designed to enhance the long-term interaction capabilities of LLM agents. It addresses the limitations of existing memory systems, which often rely on fixed structures and predefined operations, hindering their adaptability across diverse tasks and long-term scenarios. A-Mem draws inspiration from the Zettelkasten knowledge management method to create a dynamic, self-organizing memory network.

Core Concepts and Implementation:

A-Mem operates through three main stages for memory storage: Note Construction, Link Generation, and Memory Evolution.

  1. Note Construction:
    • When an LLM agent encounters new information (e.g., from interaction history), A-Mem creates a structured memory note.
    • Each note $m_i$ contains multiple attributes:
      • $c_i$: Original interaction content.
      • $t_i$: Timestamp.
      • $K_i$: Keywords generated by an LLM to capture key concepts.
      • $G_i$: Tags generated by an LLM for categorization.
      • $X_i$: Contextual description generated by an LLM for semantic understanding.
      • $L_i$: A set of links to related memory notes.
    • An LLM is prompted (using template $P_{s1}$) with the content and timestamp to generate the keywords, tags, and contextual description:

      $$K_i, G_i, X_i \leftarrow \text{LLM}(c_i \; \Vert \; t_i \; \Vert \; P_{s1})$$

    • A dense vector embedding $e_i$ is computed for each note by encoding the concatenation of its textual components ($c_i, K_i, G_i, X_i$) with a sentence-transformer model (such as all-MiniLM-L6-v2, used in the experiments):

      $$e_i = f_{\text{enc}}[\text{concat}(c_i, K_i, G_i, X_i)]$$

    • This multi-faceted structure provides both human-readable context and machine-processable embeddings for efficient similarity search and linking. A minimal construction sketch follows this list.
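To make the note-construction step concrete, here is a minimal Python sketch. It is not the authors' implementation: `llm_generate_attributes` and `prompt_s1` are hypothetical stand-ins for the LLM call with template $P_{s1}$, while the encoder is the all-MiniLM-L6-v2 sentence transformer mentioned above.

```python
# Hedged sketch of note construction; llm_generate_attributes and prompt_s1
# are hypothetical stand-ins for the paper's LLM call with template P_s1.
from dataclasses import dataclass, field
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # f_enc

@dataclass
class MemoryNote:
    content: str                      # c_i: original interaction content
    timestamp: str                    # t_i
    keywords: list[str]               # K_i
    tags: list[str]                   # G_i
    context: str                      # X_i
    links: set[int] = field(default_factory=set)           # L_i: ids of linked notes
    embedding: list[float] = field(default_factory=list)   # e_i

def note_text(note: MemoryNote) -> str:
    # concat(c_i, K_i, G_i, X_i), the text that gets embedded
    return " ".join([note.content, " ".join(note.keywords),
                     " ".join(note.tags), note.context])

def build_note(content: str, timestamp: str) -> MemoryNote:
    # K_i, G_i, X_i <- LLM(c_i || t_i || P_s1)
    attrs = llm_generate_attributes(prompt_s1(content, timestamp))  # hypothetical
    note = MemoryNote(content, timestamp,
                      attrs["keywords"], attrs["tags"], attrs["context"])
    # e_i = f_enc(concat(c_i, K_i, G_i, X_i))
    note.embedding = encoder.encode(note_text(note)).tolist()
    return note
```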

  2. Link Generation:
    • When a new note $m_n$ is added, A-Mem identifies potentially related existing notes.
    • It first retrieves the top-$k$ nearest neighbors $\mathcal{M}_{\text{near}}^n$ from the existing memory set $\mathcal{M}$ based on cosine similarity between their embeddings ($e_n$ and $e_j$):

      $$s_{n,j} = \frac{e_n \cdot e_j}{|e_n| |e_j|}$$

      $$\mathcal{M}_{\text{near}}^n = \{m_j \mid \text{rank}(s_{n,j}) \leq k, \; m_j \in \mathcal{M}\}$$

    • An LLM is then prompted (using template $P_{s2}$) with the new note $m_n$ and its nearest neighbors $\mathcal{M}_{\text{near}}^n$ to determine whether meaningful connections exist and to update the link set $L_n$; the LLM analyzes shared attributes and semantic similarities to decide which links to establish.
    • This creates an emergent network where related memories are interconnected, similar to Zettelkasten's concept of linked notes forming conceptual 'boxes'. A sketch of this step follows the list.
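One possible shape for the link-generation step, continuing the sketch above; `llm_select_links` is a hypothetical helper that applies prompt template $P_{s2}$ to the candidates and returns the ids of neighbors the LLM judges to be meaningfully related.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # s_{n,j}: cosine similarity between two embeddings
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_neighbors(new_note, memory, k: int = 10) -> list[int]:
    # rank existing notes by s_{n,j} and keep the top-k (M_near^n)
    e_n = np.asarray(new_note.embedding)
    scores = [(j, cosine(e_n, np.asarray(m.embedding))) for j, m in enumerate(memory)]
    scores.sort(key=lambda t: t[1], reverse=True)
    return [j for j, _ in scores[:k]]

def generate_links(new_note, memory, k: int = 10) -> list[int]:
    near_ids = nearest_neighbors(new_note, memory, k)
    # LLM with template P_s2 decides which candidates become links L_n
    chosen = llm_select_links(new_note, [memory[j] for j in near_ids])  # hypothetical
    new_note.links = set(chosen)
    return near_ids
```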

  3. Memory Evolution:
    • After establishing links for the new note $m_n$, A-Mem can update the attributes of the neighboring notes $\mathcal{M}_{\text{near}}^n$ it connected with.
    • For each neighbor $m_j \in \mathcal{M}_{\text{near}}^n$, an LLM is prompted (using template $P_{s3}$) with $m_n$, $m_j$, and the other neighbors ($\mathcal{M}_{\text{near}}^n \setminus m_j$) to decide whether to evolve $m_j$.
    • Evolution involves updating the contextual description $X_j$, keywords $K_j$, and tags $G_j$ of the existing note $m_j$ based on the new context provided by $m_n$. The updated note $m_j^*$ replaces $m_j$:

      $$m_j^* \leftarrow \text{LLM}(m_n \; \Vert \; \mathcal{M}_{\text{near}}^n \setminus m_j \; \Vert \; m_j \; \Vert \; P_{s3})$$

    • This allows the memory network to refine its understanding and organization over time as new, related information is integrated; a brief sketch follows the list.
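One way the evolution step could look, continuing the sketch; `llm_evolve` is a hypothetical helper for prompt template $P_{s3}$ that returns updated attributes or `None`, and re-embedding the evolved note is an assumption of this sketch rather than something the summary specifies.

```python
def evolve_neighbors(new_note, memory, near_ids: list[int]) -> None:
    for j in near_ids:
        others = [memory[i] for i in near_ids if i != j]
        # m_j* <- LLM(m_n || M_near^n \ m_j || m_j || P_s3); None means keep m_j unchanged
        updated = llm_evolve(new_note, memory[j], others)  # hypothetical
        if updated is None:
            continue
        memory[j].context = updated["context"]      # X_j
        memory[j].keywords = updated["keywords"]    # K_j
        memory[j].tags = updated["tags"]            # G_j
        # assumption: refresh e_j so later similarity search reflects the evolved attributes
        memory[j].embedding = encoder.encode(note_text(memory[j])).tolist()
```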

Memory Retrieval:

  • When the agent needs to access memory for a query $q$, the query is embedded using the same text encoder:

    $$e_q = f_{\text{enc}}(q)$$

  • Cosine similarity is computed between the query embedding $e_q$ and all memory note embeddings $e_i$.
  • The top-$k$ most similar memory notes $\mathcal{M}_{\text{retrieved}}$ are retrieved and provided as context to the LLM agent for processing the query:

    $$s_{q,i} = \frac{e_q \cdot e_i}{|e_q| |e_i|}$$

    $$\mathcal{M}_{\text{retrieved}} = \{m_i \mid \text{rank}(s_{q,i}) \leq k, \; m_i \in \mathcal{M}\}$$
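Retrieval thus reduces to embedding the query and taking the top-$k$ notes by cosine similarity. A minimal sketch, reusing the `encoder` and `cosine` helpers from the earlier sketches:

```python
def retrieve(query: str, memory, k: int = 10) -> list:
    # e_q = f_enc(q)
    e_q = np.asarray(encoder.encode(query))
    # s_{q,i} for every stored note, then keep the k best (M_retrieved)
    scored = [(cosine(e_q, np.asarray(m.embedding)), m) for m in memory]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [m for _, m in scored[:k]]

# The retrieved notes are then serialized into the agent's prompt as context.
```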

Practical Implementation & Considerations:

  • LLM Choice: The quality of the generated keywords, tags, and context, and the effectiveness of linking and evolution, depend on the chosen LLM. The paper experimented with GPT-4o/4o-mini, Qwen-1.5B/3B, and Llama 3.2 1B/3B. Smaller, local models (like Qwen or Llama) can be run with tools like Ollama, potentially reducing cost and latency at the risk of lower-quality outputs than larger models like GPT-4o.
  • Structured Output: Using libraries like LiteLLM or specific API features (like GPT's structured output) is crucial for reliably parsing the JSON outputs from the LLM during note construction and evolution.
  • Embedding Model: The choice of sentence transformer affects retrieval quality. all-MiniLM-L6-v2 is a common choice that balances performance and computational cost.
  • Vector Database: Although not explicitly mentioned for storage, efficient nearest-neighbor search for Link Generation and Retrieval typically calls for a vector index or database (e.g., FAISS, ChromaDB, Pinecone); see the sketch after this list.
  • Hyperparameter $k$: The number of neighbors considered for linking/evolution and the number of memories retrieved (both denoted $k$) affect performance and computational cost. The paper found that optimal values varied by task category but generally used $k=10$ as a base, tuning up to $k=50$ for some GPT models/tasks. Higher $k$ provides more context but can introduce noise and increase processing load.
  • Computational Cost: A-Mem involves multiple LLM calls (for note construction, link generation, and evolution) plus embedding computations. The frequency of these operations and the size of the memory store drive overall latency and cost. Compared to methods that load the entire interaction history into context (such as the LoCoMo baseline), A-Mem's selective retrieval significantly reduces the token count during inference (e.g., ~1.2k-2.5k vs. ~17k tokens).
  • Scalability: The efficiency of the nearest neighbor search (embedding similarity) is key for scalability as the memory store grows.
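The summary does not name a particular vector store. As one illustrative option (an assumption of this sketch, not the paper's stated design), a flat inner-product FAISS index over L2-normalized embeddings gives exact cosine-similarity search and can later be swapped for an approximate index as the memory store grows.

```python
import faiss
import numpy as np

dim = 384  # embedding size of all-MiniLM-L6-v2
index = faiss.IndexFlatIP(dim)  # inner product == cosine on normalized vectors

def index_notes(embeddings: np.ndarray) -> None:
    # add note embeddings (one row per note) to the index
    vecs = np.ascontiguousarray(embeddings, dtype="float32")
    faiss.normalize_L2(vecs)
    index.add(vecs)

def knn(query_embedding: np.ndarray, k: int = 10):
    # return the positions and similarities of the k nearest notes
    q = np.ascontiguousarray(query_embedding, dtype="float32").reshape(1, -1)
    faiss.normalize_L2(q)
    scores, ids = index.search(q, k)
    return ids[0], scores[0]
```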

Evaluation:

  • Experiments on the LoCoMo dataset (long-term conversational QA) showed A-Mem significantly outperformed baselines like MemoryBank, ReadAgent, and MemGPT, especially on multi-hop reasoning questions requiring synthesis across different memories.
  • It achieved state-of-the-art results across various metrics (F1, BLEU-1, ROUGE, METEOR, SBERT Similarity) for non-GPT models and competitive results for GPT models, particularly excelling where complex reasoning over linked memories was needed.
  • Ablation studies confirmed that both Link Generation and Memory Evolution modules contribute significantly to performance.
  • t-SNE visualizations showed A-Mem creates more structured and clustered memory embedding spaces compared to a baseline without linking and evolution.

In summary, A-Mem offers a practical approach to building more adaptive and effective memory systems for LLM agents by using LLMs agentically to structure, link, and evolve memory notes, inspired by Zettelkasten principles. Its dynamic nature allows for better handling of complex, long-term tasks compared to static memory systems.

Authors (6)
  1. Wujiang Xu
  2. Zujie Liang
  3. Kai Mei
  4. Hang Gao
  5. Juntao Tan
  6. Yongfeng Zhang