Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory (2311.08719v1)

Published 15 Nov 2023 in cs.CL

Abstract: Memory-augmented LLMs have demonstrated remarkable performance in long-term human-machine interactions, which basically relies on iterative recalling and reasoning over history to generate high-quality responses. However, such repeated recall-reason steps easily produce biased thoughts, i.e., inconsistent reasoning results when recalling the same history for different questions. On the contrary, humans can keep thoughts in memory and recall them without repeated reasoning. Motivated by this human capability, we propose a novel memory mechanism called TiM (Think-in-Memory) that enables LLMs to maintain an evolved memory for storing historical thoughts along the conversation stream. The TiM framework consists of two crucial stages: (1) before generating a response, an LLM agent recalls relevant thoughts from memory, and (2) after generating a response, the LLM agent post-thinks and incorporates both historical and new thoughts to update the memory. Thus, TiM can eliminate the issue of repeated reasoning by saving the post-thinking thoughts as the history. Besides, we formulate the basic principles to organize the thoughts in memory based on well-established operations (i.e., insert, forget, and merge), allowing for dynamic updates and evolution of the thoughts. Furthermore, we introduce Locality-Sensitive Hashing into TiM to achieve efficient retrieval for long-term conversations. We conduct qualitative and quantitative experiments on real-world and simulated dialogues covering a wide range of topics, demonstrating that equipping existing LLMs with TiM significantly enhances their performance in generating responses for long-term interactions.

The paper introduces Think-in-Memory (TiM), a novel memory mechanism designed to augment LLMs with long-term memory capabilities by enabling them to remember and selectively recall historical thoughts in long-term interaction scenarios. The motivation stems from the limitations of existing memory-augmented LLMs, which rely on iterative recalling and repeated reasoning over the history in an external memory cache, leading to inconsistent reasoning paths and high retrieval costs. TiM addresses these issues by saving thoughts as memories, similar to the metacognition process in humans, rather than saving the details of original events.

The TiM framework consists of two stages (a minimal workflow sketch follows the list):

  • In the recalling stage, LLMs generate responses to new queries by recalling relevant thoughts from memory.
  • In the post-thinking stage, the LLM engages in reasoning and thinking about the response and saves new thoughts into an external memory.
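
A minimal sketch of this recall-respond-post-think loop, assuming a hypothetical `llm` callable and a simple list-backed memory (the paper's actual store is the LSH-indexed hash table described below):

```python
def tim_turn(llm, memory, query, top_k=3):
    """One conversational turn under the TiM scheme (illustrative only)."""
    # Recalling stage: fetch stored thoughts relevant to the query instead of
    # re-reasoning over the raw dialogue history. A naive keyword match stands
    # in for the paper's LSH + similarity retrieval.
    words = query.lower().split()
    recalled = [t for t in memory if any(w in t.lower() for w in words)][:top_k]

    prompt = "Relevant thoughts:\n" + "\n".join(recalled) + f"\n\nUser: {query}\nAssistant:"
    response = llm(prompt)

    # Post-thinking stage: distill the new exchange into a thought and store it,
    # so it never has to be re-derived from raw history for later questions.
    new_thought = llm("Summarize the key fact of this exchange in one sentence:\n"
                      f"Q: {query}\nA: {response}")
    memory.append(new_thought)
    return response
```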

To mirror the cognitive process of humans, the paper formulates basic principles to organize thoughts in memory based on well-established operations, such as insert, forget, and merge, allowing for dynamic updates and evolution of the thoughts. TiM utilizes Locality-Sensitive Hashing (LSH) to facilitate efficient hand-in (insert thoughts) and hand-out (recall thoughts) operations. TiM is designed to be LLM-agnostic, enabling its integration with both closed-source LLMs like ChatGPT and open-source LLMs like ChatGLM.

The key components of TiM are:

  • Agent $\mathcal{A}$: A pre-trained LLM to facilitate dynamic conversations.
  • Memory Cache $\mathcal{M}$: A continually growing hash table of key-value pairs, where the key is a hash index and the value is a single thought.
  • Hash-based Mapping $\mathbf{F}(\cdot)$: LSH is introduced to quickly save and find relevant thoughts in $\mathcal{M}$.

The paper defines an "inductive thought" as text containing the relation between two entities, satisfying a relation triple $(E_h, r_i, E_t)$, where $E_h$ is the head entity connected with the tail entity $E_t$ via the relation $r_i$.
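
As a concrete illustration, a stored thought can be represented as such a triple together with its natural-language text; the field names below are illustrative rather than taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class Thought:
    head: str      # E_h: head entity
    relation: str  # r_i: relation linking the entities
    tail: str      # E_t: tail entity
    text: str      # the natural-language inductive thought itself

# Example thought distilled from a dialogue turn.
t = Thought(head="Alice", relation="favorite_food", tail="sushi",
            text="Alice's favorite food is sushi.")
```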

The paper utilizes a hash table as the architecture of TiM’s storage system, where similar thoughts are assigned the same hash index. The LSH method assigns each $d$-dimensional embedding vector $x \in \mathbf{R}^d$ a hash index $\mathbf{F}(x)$, where nearby vectors receive the same hash index with higher probability. The hash function is defined as:

$$\mathbf{F}(x) = \arg\max\left(\left[xR; -xR\right]\right)$$

where:

  • $x$ is the $d$-dimensional embedding vector
  • $R$ is a random matrix of size $(d, b/2)$
  • $b$ is the number of groups in the memory
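
A minimal NumPy sketch of this hash function; the dimension $d$, group count $b$, and random seed are illustrative choices:

```python
import numpy as np

d, b = 768, 16                        # embedding dimension, number of memory groups
rng = np.random.default_rng(0)
R = rng.standard_normal((d, b // 2))  # random projection matrix of size (d, b/2)

def lsh_index(x: np.ndarray) -> int:
    """F(x) = argmax([xR; -xR]): index of the largest of the b projections."""
    proj = x @ R                                          # shape (b/2,)
    return int(np.argmax(np.concatenate([proj, -proj])))  # value in [0, b)

# Nearby vectors tend to land in the same group.
x = rng.standard_normal(d)
print(lsh_index(x), lsh_index(x + 0.01 * rng.standard_normal(d)))
```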

Memory retrieval then operates as a two-stage task for finding the most relevant thoughts: LSH-based retrieval to locate the matching hash group, followed by similarity-based retrieval within that group.
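
A sketch of this two-stage lookup over a bucketed memory. The hash is the same as above; `insert_thought`, `recall_thoughts`, and the raw embedding inputs are illustrative (a real system would embed thought text with a sentence encoder):

```python
import numpy as np
from collections import defaultdict

d, b = 768, 16
rng = np.random.default_rng(0)
R = rng.standard_normal((d, b // 2))

def lsh_index(x):
    # Same hash as above: argmax of [xR; -xR].
    proj = x @ R
    return int(np.argmax(np.concatenate([proj, -proj])))

buckets = defaultdict(list)  # hash index -> list of (embedding, thought text)

def insert_thought(embedding, text):
    """Hand-in: file a thought under its LSH group."""
    buckets[lsh_index(embedding)].append((embedding, text))

def recall_thoughts(query_emb, top_k=3):
    """Hand-out: stage 1 picks the query's LSH group, stage 2 ranks inside it."""
    candidates = buckets[lsh_index(query_emb)]
    def cos(a, c):
        return float(a @ c / (np.linalg.norm(a) * np.linalg.norm(c) + 1e-9))
    ranked = sorted(candidates, key=lambda ec: cos(query_emb, ec[0]), reverse=True)
    return [text for _, text in ranked[:top_k]]
```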

To support dynamic updates and evolution of thoughts, TiM organizes the memory around three operations (a minimal sketch follows the list):

  • Insert: storing new thoughts into the memory.
  • Forget: removing unnecessary thoughts from the memory, such as contradictory thoughts.
  • Merge: merging similar thoughts in the memory, such as thoughts with the same head entity.
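
An illustrative sketch of the three operations over triple-style thoughts; the contradiction test and the merge rule below are simplified stand-ins for the paper's judgments:

```python
memory = []  # each thought: {"head": ..., "relation": ..., "tail": ...}

def insert(thought):
    """Insert: store a new thought in the memory."""
    memory.append(thought)

def forget(new_thought):
    """Forget: drop stored thoughts that contradict the new one
    (same head entity and relation but a different tail entity)."""
    memory[:] = [t for t in memory
                 if not (t["head"] == new_thought["head"]
                         and t["relation"] == new_thought["relation"]
                         and t["tail"] != new_thought["tail"])]

def merge(head):
    """Merge: combine all thoughts sharing the same head entity into one entry."""
    same = [t for t in memory if t["head"] == head]
    if len(same) > 1:
        merged = {"head": head, "relation": "profile",
                  "tail": "; ".join(f"{t['relation']}: {t['tail']}" for t in same)}
        memory[:] = [t for t in memory if t["head"] != head] + [merged]

insert({"head": "Alice", "relation": "favorite_food", "tail": "ramen"})
forget({"head": "Alice", "relation": "favorite_food", "tail": "sushi"})  # removes the stale thought
insert({"head": "Alice", "relation": "favorite_food", "tail": "sushi"})
insert({"head": "Alice", "relation": "lives_in", "tail": "Tokyo"})
merge("Alice")
```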

The paper adopts Low-Rank Adaptation (LoRA) for computation-efficient fine-tuning. LoRA fine-tunes according to $y = Wx + BAx$, where $W \in \mathbf{R}^{d\times k}$, $B \in \mathbf{R}^{d\times r}$, $A \in \mathbf{R}^{r\times k}$, and $r \ll \min(d, k)$.
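
A minimal NumPy illustration of this update rule (the dimensions are placeholders): the pre-trained weight $W$ stays frozen while only the low-rank factors $B$ and $A$ are trained.

```python
import numpy as np

d, k, r = 512, 512, 8                    # r << min(d, k)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, k))          # frozen pre-trained weight
B = np.zeros((d, r))                     # trainable, zero-initialized so BA starts at 0
A = rng.standard_normal((r, k)) * 0.01   # trainable

def lora_forward(x):
    # y = Wx + BAx: frozen path plus low-rank adapter path.
    return W @ x + B @ (A @ x)

y = lora_forward(rng.standard_normal(k))  # shape (d,)
```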

The paper evaluates TiM on three datasets: KdConv, a Generated Virtual Dataset (GVD), and a Real-world Medical Dataset (RMD). The LLMs used are ChatGLM and Baichuan2. The baselines include answering questions without any memory mechanism, as well as SiliconFriend.

The evaluation metrics include:

  • Retrieval Accuracy
  • Response Correctness
  • Contextual Coherence

On the GVD dataset, TiM exhibited superior performance across all metrics compared to SiliconFriend, especially for contextual coherence. On the KdConv dataset, TiM obtained the best results across all topics (film, music, and travel). On the RMD dataset, TiM improved the overall response performance for real-world medical conversations, with significant improvements in response correctness and contextual coherence. Retrieval time was also reduced using TiM compared to calculating pairwise similarity between the question and the whole memory.

A medical agent, TiM-LLM, was developed based on ChatGLM and TiM in the context of patient-doctor conversations. TiM-LLM serves as an auxiliary tool for clinical doctors to provide treatment options and medical suggestions for patients' needs.

Authors (7)
  1. Lei Liu (332 papers)
  2. Xiaoyan Yang (50 papers)
  3. Yue Shen (243 papers)
  4. Binbin Hu (42 papers)
  5. Zhiqiang Zhang (129 papers)
  6. Jinjie Gu (50 papers)
  7. Guannan Zhang (85 papers)
Citations (14)