Emergent Mind

Think Before You Act: Decision Transformers with Internal Working Memory

Published May 24, 2023 in cs.LG, cs.AI, and cs.CL


Large language model (LLM)-based decision-making agents have shown the ability to generalize across multiple tasks. However, their performance relies on massive data and compute. We argue that this inefficiency stems from the forgetting phenomenon, in which a model memorizes its behaviors in parameters throughout training. As a result, training on a new task may deteriorate the model's performance on previous tasks. In contrast to LLMs' implicit memory mechanism, the human brain utilizes distributed memory storage, which helps manage and organize multiple skills efficiently, mitigating the forgetting phenomenon. Thus inspired, we propose an internal working memory module to store, blend, and retrieve information for different downstream tasks. Evaluation results show that the proposed method improves training efficiency and generalization in both Atari games and meta-world object manipulation tasks. Moreover, we demonstrate that memory fine-tuning further enhances the adaptability of the proposed architecture.


  • The paper discusses how reinforcement learning has benefited from LLMs with implicit memory, but highlights issues with scalability due to data and computational demands.

  • It introduces Decision Transformers with Memory (DT-Mem), which use working memory to store and reuse pertinent past experiences more efficiently, reducing confusion in decision-making.

  • DT-Mem's architecture features an internal memory matrix and a Low-Rank Adaptation (LoRA) layer for task-specific knowledge fine-tuning without the need for full-model updates.

  • The model's performance has proven superior to larger models in training efficiency and generalization when tested on Atari games and Meta-World environments.

  • The paper identifies two main contributions of DT-Mem: a novel Transformer architecture with integrated working memory and a LoRA-based fine-tuning method for better task adaptation.


Advances in reinforcement learning (RL) have made LLM-based models successful decision-making agents. Their generalization hinges on one key mechanism, implicit memory: the network parameters themselves memorize behaviors from vast datasets. This approach scales poorly, however, because it relies heavily on data volume and computational resources.

Working Memory in Decision Making

To mitigate these inefficiencies, the paper borrows the concept of "working memory" from cognitive psychology. Decision Transformers with Memory (DT-Mem) embody this approach: they actively store and process relevant past experiences. By separating skill-specific knowledge into an explicit memory structure, DT-Mem aims to use memory more efficiently and to avoid the confusion that implicit memory can cause when tasks are similar yet distinct.

Model Architecture and Method

DT-Mem introduces an internal memory matrix that supports two main operations: updating the memory with new information and retrieving from it for decision-making. To locate the memory slots to update or read, DT-Mem uses content-based addressing, a mechanism borrowed from prior neural network research.
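The update and retrieval operations on a memory matrix with content-based addressing can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the summary does not give the exact equations, so the scaled dot-product similarity, the softmax weighting, the `rate` write gate, and all function names are assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def address(memory, queries):
    """Content-based addressing (assumed form): weight each memory slot
    by the scaled dot-product similarity between slot and query."""
    d = memory.shape[1]
    scores = queries @ memory.T / np.sqrt(d)   # (n_queries, n_slots)
    return softmax(scores, axis=-1)            # each row sums to 1

def retrieve(memory, queries):
    """Read: a similarity-weighted blend of slot contents per query."""
    return address(memory, queries) @ memory   # (n_queries, d)

def update(memory, inputs, rate=0.5):
    """Write: move each slot toward the inputs that address it.
    `rate` is a hypothetical write strength in [0, 1]."""
    w = address(memory, inputs)                # (n_inputs, n_slots)
    slot_mass = w.sum(axis=0)                  # total attention per slot
    written = (w.T @ inputs) / (slot_mass[:, None] + 1e-8)
    gate = rate * np.minimum(slot_mass, 1.0)[:, None]
    return (1.0 - gate) * memory + gate * written

# Usage: 4 memory slots of width 8, 3 incoming embeddings.
rng = np.random.default_rng(0)
memory = rng.normal(size=(4, 8))
embeddings = rng.normal(size=(3, 8))
memory = update(memory, embeddings)
readout = retrieve(memory, embeddings)
print(readout.shape)
```

Slots that receive little attention from the new inputs are left almost unchanged, which is the property that lets an explicit memory keep knowledge for different tasks apart.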

Furthermore, the system's architecture includes a Low-Rank Adaptation (LoRA) layer to fine-tune the memory when confronted with new tasks. Unlike full-model fine-tuning, which can be computationally taxing, this focused approach sharpens task-specific knowledge while leveraging a pre-trained model's broad understanding obtained from large datasets.
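As a rough illustration of why low-rank adaptation is cheaper than full fine-tuning, the sketch below wraps a frozen weight matrix with a trainable low-rank correction, following the standard LoRA formulation (output = xWᵀ + (alpha / rank) · xAᵀBᵀ, with B initialized to zero). The class name `LoRALinear`, the chosen sizes, and the idea of applying it to a single projection are assumptions for the example; the summary does not say which memory parameters DT-Mem adapts.

```python
import numpy as np

class LoRALinear:
    """A frozen linear map plus a trainable low-rank update (sketch)."""

    def __init__(self, weight, rank=2, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                            # frozen, pre-trained
        self.A = rng.normal(0.0, 0.01, (rank, d_in))    # trainable
        self.B = np.zeros((d_out, rank))                # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        base = x @ self.weight.T
        delta = (x @ self.A.T) @ self.B.T               # rank-limited correction
        return base + self.scale * delta

    def trainable_params(self):
        return self.A.size + self.B.size

# Usage: a 16x16 projection adapted with rank 2.
W = np.random.default_rng(1).normal(size=(16, 16))
layer = LoRALinear(W, rank=2)
print(layer.trainable_params(), "trainable vs", W.size, "frozen")
```

Because B starts at zero, the adapted layer initially reproduces the pre-trained layer exactly, and only the small A and B matrices need gradients during fine-tuning.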

Evaluation and Contributions

On Atari games and Meta-World environments, DT-Mem showed better training efficiency and generalization than models with substantially more parameters. Its main strength is adaptability: fine-tuning only the working memory module on limited data was enough to improve adaptation to new tasks.

In summary, DT-Mem makes two main contributions:

  1. It proposes a Transformer architecture that integrates a working memory module for improved generalization and computational efficiency.
  2. It introduces a LoRA-based fine-tuning method that improves adaptation to unseen tasks while requiring less data.

Conclusion and Outlook

The findings present DT-Mem as a model that adapts quickly to new tasks by fine-tuning its working memory, improving both model efficiency and training efficiency. Although DT-Mem already generalizes and adapts well, there is room for further optimization. Future work could explore ways to improve sample efficiency further and to give a theoretical grounding for the advantages of supplementing foundation models with memory components.
