Introduction
Advances in reinforcement learning (RL) have contributed to the success of large language models (LLMs) as decision-making agents. Their strong generalization depends on a key mechanism, implicit memory: knowledge stored in the network's parameters after training on vast datasets. This approach scales poorly, however, because it relies heavily on data volume and computational resources.
Working Memory in Decision Making
To mitigate these inefficiencies, Decision Transformers with Memory (DT-Mem) adopts the concept of "working memory" from cognitive psychology, giving the agent a module that actively stores and processes relevant past experiences. By separating skill-specific knowledge into an explicit memory structure, DT-Mem is designed to use memory more efficiently and to avoid the confusion that implicit memory can cause when dealing with similar yet distinct tasks.
Model Architecture and Method
DT-Mem introduces a dedicated internal memory matrix that supports two main operations: updating the memory with new information and retrieving from it for decision-making. For both operations, DT-Mem locates the relevant memory slots through content-based addressing, a mechanism borrowed from prior neural network research.
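The two operations described above can be illustrated with a minimal sketch. It assumes cosine-similarity scoring, a softmax over slots, and a convex-blend write; these choices, along with the function names and the `sharpness` parameter, are illustrative assumptions and not DT-Mem's actual implementation.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)  # epsilon guards against zero vectors

def address(memory, query, sharpness=5.0):
    # Content-based addressing: one weight per slot, weights sum to 1.
    # `sharpness` (hypothetical) controls how peaked the weights are.
    return softmax([sharpness * cosine(slot, query) for slot in memory])

def retrieve(memory, query):
    # Read: weighted sum of all slots, dominated by the best matches.
    weights = address(memory, query)
    return [sum(w * slot[i] for w, slot in zip(weights, memory))
            for i in range(len(query))]

def update(memory, new_item):
    # Write: move each slot toward new_item in proportion to its weight.
    # Returns a new memory; the input is left unchanged.
    weights = address(memory, new_item)
    return [[(1 - w) * s + w * x for s, x in zip(slot, new_item)]
            for w, slot in zip(weights, memory)]

# Usage: a two-slot memory with two-dimensional slots.
memory = [[1.0, 0.0], [0.0, 1.0]]
print(address(memory, [0.9, 0.1]))   # the first slot receives most of the weight
print(retrieve(memory, [0.9, 0.1]))
print(update(memory, [0.5, 0.5]))
```

Because addressing depends only on slot content, the same weighting function serves both the read and the write path.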
The architecture also includes a Low-Rank Adaptation (LoRA) layer for fine-tuning the memory on new tasks. Unlike full-model fine-tuning, which can be computationally expensive, this targeted approach refines task-specific knowledge while retaining the broad knowledge that the pre-trained model acquired from large datasets.
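To make the low-rank idea concrete, here is a minimal sketch of a LoRA-style linear layer in plain Python. The frozen weight W is augmented with a trainable product B·A scaled by alpha / rank, which is the standard LoRA formulation. The class name `LoRALinear`, its methods, and the layer sizes are hypothetical, and where exactly DT-Mem attaches the adapter inside its memory module is not specified here, so treat this as an illustration of the technique only.

```python
import random

def matmul(a, b):
    # Plain nested-list matrix product: (n x k) times (k x m) -> (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight          # pre-trained weight, kept frozen
        self.scale = alpha / rank
        # A gets small random values and B starts at zero, so the adapted
        # layer initially behaves exactly like the pre-trained one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)]
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def effective_weight(self):
        delta = matmul(self.B, self.A)  # out_dim x in_dim, rank-limited
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x):
        return [sum(w * xi for w, xi in zip(row, x))
                for row in self.effective_weight()]

    def num_trainable(self):
        # Only A and B would be updated during fine-tuning.
        return sum(len(r) for r in self.A) + sum(len(r) for r in self.B)

# Usage: a 4x4 identity layer adapted with rank 1.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
layer = LoRALinear(identity, rank=1)
print(layer.forward([1.0, 2.0, 3.0, 4.0]))  # matches the frozen layer while B is zero
print(layer.num_trainable(), "trainable values vs", 4 * 4, "in the full weight")
```

The saving grows with layer size: for a d×d weight, the adapter trains 2·d·rank values instead of d².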
Evaluation and Contributions
On Atari games and Meta-World environments, DT-Mem showed promising training efficiency and generalization, outperforming models with substantially more parameters. Its main strength is adaptability: fine-tuning only the working memory module on limited data still yielded superior task adaptation.
In summary, DT-Mem makes two main contributions:
- It proposes a Transformer architecture that integrates a memory module for improved generalization and computational efficiency.
- It introduces a LoRA-based fine-tuning method that improves adaptation to unseen tasks with less reliance on data.
Conclusion and Outlook
The findings show that DT-Mem, by fine-tuning its working memory, adapts quickly to varying tasks, improving both model and training efficiency. Although DT-Mem already generalizes and adapts well, there is room for further optimization. Future work could explore ways to further improve sample efficiency and to ground theoretically the advantages of supplementing foundation models with memory components.