Think Before You Act: Decision Transformers with Internal Working Memory

(2305.16338)
Published May 24, 2023 in cs.LG, cs.AI, and cs.CL

Abstract

Large language model (LLM)-based decision-making agents have shown the ability to generalize across multiple tasks. However, their performance relies on massive data and compute. We argue that this inefficiency stems from the forgetting phenomenon, in which a model memorizes its behaviors in parameters throughout training. As a result, training on a new task may deteriorate the model's performance on previous tasks. In contrast to LLMs' implicit memory mechanism, the human brain utilizes distributed memory storage, which helps manage and organize multiple skills efficiently, mitigating the forgetting phenomenon. Thus inspired, we propose an internal working memory module to store, blend, and retrieve information for different downstream tasks. Evaluation results show that the proposed method improves training efficiency and generalization in both Atari games and Meta-World object manipulation tasks. Moreover, we demonstrate that memory fine-tuning further enhances the adaptability of the proposed architecture.

Overview

  • The paper discusses how reinforcement learning has benefited from LLMs with implicit memory, but highlights scalability problems stemming from their data and compute demands.

  • It introduces Decision Transformers with Memory (DT-Mem), which use working memory to more efficiently store and use pertinent past experiences, reducing confusion in decision-making.

  • DT-Mem's architecture features an internal memory matrix and a Low-Rank Adaptation (LoRA) layer for fine-tuning task-specific knowledge without full-model updates.

  • On Atari games and Meta-World environments, the model outperforms substantially larger models in both training efficiency and generalization.

  • The paper identifies two main contributions of DT-Mem: a novel Transformer architecture with integrated working memory and a LoRA-based fine-tuning method for better task adaptation.

Introduction

The continuous evolution of reinforcement learning (RL) has led to the success of LLMs as powerful decision-making agents. Their exceptional generalization capacity hinges on a key mechanism: implicit memory, the neural network parameters that memorize vast datasets during training. Yet this approach scales poorly, owing to its heavy reliance on data volume and computational resources.

Working Memory in Decision Making

Amidst efforts to mitigate these inefficiencies, the concept of "working memory" has been adopted from cognitive psychology. Decision Transformers with Memory (DT-Mem) embody this approach, enabling the agent to actively store and process relevant past experiences. By separating skill-specific knowledge into an explicit memory structure, DT-Mem uses memory more efficiently, avoiding the confusion that can arise in implicit memory when similar yet distinct tasks are learned together.

Model Architecture and Method

DT-Mem introduces an explicit internal memory matrix together with two main processes that operate on it: updating the memory with new information and retrieving stored information for decision-making. Using content-based addressing, a technique borrowed from earlier memory-augmented neural network research, DT-Mem locates the memory slots to update or read.
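To make the read/write mechanism concrete, below is a minimal sketch of content-based addressing over a slot matrix, in the spirit of Neural Turing Machine-style memory. The scaled dot-product similarity, the erase-then-add write rule, and names such as `content_address` and `erase` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def content_address(memory: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
    """Soft content-based addressing: score the query against every slot,
    then normalize into attention weights. memory: (num_slots, dim), query: (dim,)."""
    scores = memory @ query / memory.shape[-1] ** 0.5  # scaled dot-product similarity
    return F.softmax(scores, dim=-1)                   # (num_slots,)

def memory_read(memory: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
    """Retrieval: return a weighted blend of the addressed slots, shape (dim,)."""
    weights = content_address(memory, query)
    return weights @ memory

def memory_write(memory: torch.Tensor, query: torch.Tensor,
                 value: torch.Tensor, erase: float = 0.5) -> torch.Tensor:
    """Update: partially erase the addressed slots, then add the new content."""
    weights = content_address(memory, query).unsqueeze(-1)  # (num_slots, 1)
    return memory * (1.0 - erase * weights) + weights * value
```

In DT-Mem, the queries and values would come from the Transformer's hidden states, so the same addressing step decides both which slots a new experience blends into and which slots inform the next action.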

Furthermore, the system's architecture includes a Low-Rank Adaptation (LoRA) layer to fine-tune the memory when confronted with new tasks. Unlike full-model fine-tuning, which can be computationally taxing, this focused approach sharpens task-specific knowledge while leveraging a pre-trained model's broad understanding obtained from large datasets.
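The core LoRA idea is to freeze the pre-trained weights W and learn only a low-rank update ΔW = BA per task, so adaptation touches a small fraction of the parameters. Below is a minimal sketch of such a layer, assuming PyTorch; the class name and the rank/scaling hyperparameters are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B (A x)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pre-trained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T
```

Because B is zero-initialized, the adapted model starts out identical to the pre-trained one; fine-tuning on a new task then trains only the small A and B matrices rather than the full model.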

Evaluation and Contributions

When applied to Atari games and Meta-World environments, DT-Mem displayed promising training efficiency and generalization, outperforming models with substantially more parameters. The strength of DT-Mem lies in its adaptability; fine-tuning the working memory module with limited data still unlocked superior task adaptation.

In summary, DT-Mem makes two main contributions:

  1. It pioneers a novel Transformer architecture that integrates a memory module for improved generalization and computational efficiency.
  2. It introduces a LoRA-based fine-tuning method that bolsters adaptation to unseen tasks with less data reliance.

Conclusion and Outlook

The findings spotlight DT-Mem as a potent model that fine-tunes its working memory to adapt swiftly to varying tasks, enhancing both model and training efficiency. While DT-Mem already stands out for its generalization and adaptability, room for improvement remains. Future work could explore methods to further enhance sample efficiency and theoretically ground the advantages of supplementing foundation models with memory components.
