- The paper introduces a novel RL agent that integrates short-term, episodic, and semantic memory systems using knowledge graphs.
- It employs a deep Q-learning framework and LSTM networks to decide whether observations are stored or discarded, enhancing learning efficiency.
- Experimental results demonstrate that prefilled semantic memory significantly improves performance in the simulated 'Room' environment.
A Machine with Short-Term, Episodic, and Semantic Memory Systems
Introduction
The paper "A Machine with Short-Term, Episodic, and Semantic Memory Systems" (2212.02098) explores the integration of cognitive science insights into the construction of artificial reinforcement learning (RL) agents. By emulating human-like memory systems—specifically, short-term, episodic, and semantic memories—the researchers aim to improve question-answering capabilities of machines in dynamic environments. The proposed model encapsulates these memory systems within knowledge graphs to facilitate efficient storage and retrieval processes.
Methodology
The architecture of the proposed agent comprises three distinct memory systems, each represented as a knowledge graph (Figure 1):
Figure 1: The memory systems of the agent. The long-term (explicit) memory systems consist of episodic and semantic memory systems.
- Short-Term Memory: Holds recent observations and is bounded in capacity. For each observation, the agent learns whether to move it into long-term memory (episodic or semantic) or to discard it.
- Episodic Memory: Stores time-bound, individual-specific events that allow the agent to mimic human personal experiences. This memory is crucial for reconstructing specific event sequences.
- Semantic Memory: Houses general world knowledge and commonly accepted facts. This system abstracts information into more generalized, entity-based knowledge, lacking temporal specifics.
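The three choices available for a short-term observation (store as an episode, store as a general fact, or forget) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Agent` class, the action names, the per-store `capacity`, and both eviction rules (drop the oldest episode, drop the weakest fact) are assumptions introduced here.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the three choices described in the list above.
ACTIONS = ("episodic", "semantic", "forget")


@dataclass
class Agent:
    capacity: int  # assumed capacity of each long-term store
    episodic: list = field(default_factory=list)  # (head, relation, tail, timestamp)
    semantic: dict = field(default_factory=dict)  # (head, relation, tail) -> strength

    def manage(self, observation, action):
        """Apply one of the three actions to a short-term observation."""
        head, relation, tail, timestamp = observation
        if action == "episodic":
            self.episodic.append(observation)
            if len(self.episodic) > self.capacity:
                # Assumed eviction rule: drop the oldest episode.
                oldest = min(self.episodic, key=lambda m: m[3])
                self.episodic.remove(oldest)
        elif action == "semantic":
            key = (head, relation, tail)  # the timestamp is dropped
            self.semantic[key] = self.semantic.get(key, 0) + 1
            if len(self.semantic) > self.capacity:
                # Assumed eviction rule: drop the weakest fact.
                weakest = min(self.semantic, key=self.semantic.get)
                del self.semantic[weakest]
        elif action != "forget":
            raise ValueError(f"unknown action: {action}")


agent = Agent(capacity=2)
agent.manage(("Alice's laptop", "AtLocation", "desk", 1), "episodic")
agent.manage(("laptop", "AtLocation", "desk", 2), "semantic")
agent.manage(("laptop", "AtLocation", "desk", 3), "semantic")
agent.manage(("Bob's phone", "AtLocation", "sofa", 4), "forget")
```

In this sketch, storing the same fact twice raises its strength rather than duplicating it, which is one simple way to give semantic memories a notion of strength.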
The memory retrieval process employs decision rules that prioritize the recency of episodes and the strength of semantic memories, facilitating efficient retrieval for answering environment-related questions (Algorithm 1).
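One possible reading of this retrieval rule is sketched below; it is not a transcription of Algorithm 1. The function names, the tuple layouts, the choice to consult episodic memory before semantic memory, and the `generalize` helper (which strips an owner such as "Alice's" from an entity name) are all assumptions made for illustration.

```python
def generalize(entity):
    """Hypothetical helper: "Alice's laptop" -> "laptop"."""
    return entity.split("'s ", 1)[-1]


def answer(question, episodic, semantic):
    """Return the tail that answers a (head, relation) question, or None.

    episodic: list of (head, relation, tail, timestamp)
    semantic: list of (head, relation, tail, strength)
    """
    head, relation = question

    # Prefer the most recent matching episode.
    episodes = [m for m in episodic if m[0] == head and m[1] == relation]
    if episodes:
        return max(episodes, key=lambda m: m[3])[2]

    # Otherwise fall back to the strongest matching general fact.
    general = generalize(head)
    facts = [m for m in semantic if m[0] == general and m[1] == relation]
    if facts:
        return max(facts, key=lambda m: m[3])[2]

    return None


episodic = [
    ("Alice's laptop", "AtLocation", "desk", 1),
    ("Alice's laptop", "AtLocation", "sofa", 5),
]
semantic = [
    ("laptop", "AtLocation", "desk", 3),
    ("laptop", "AtLocation", "bag", 1),
    ("phone", "AtLocation", "pocket", 2),
]

recent = answer(("Alice's laptop", "AtLocation"), episodic, semantic)
fallback = answer(("Bob's laptop", "AtLocation"), episodic, semantic)
```

Here `recent` comes from the latest episode about Alice's laptop, while `fallback` has no matching episode and is answered from the strongest general fact about laptops.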

Figure 2: An episodic memory and a semantic memory represented as a knowledge graph.
Reinforcement Learning Model
The agent's decision-making is powered by a deep Q-learning framework. Through interactions with a simulation environment called "the Room," the agent learns whether each observation should be stored or forgotten. Each observation is represented as a quadruple (head, relation, tail, timestamp). The contents of the memory systems are converted into sequences of embeddings, which LSTM networks process to inform the agent's decisions (Figure 3).
Figure 3: The Q-network diagram, where the short-term $M_o$, episodic $M_e$, and semantic $M_s$ memory systems are given as the initial input.
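The two generic ingredients of deep Q-learning that sit around such a Q-network, epsilon-greedy action selection and the one-step temporal-difference target, can be sketched in plain Python. This is the textbook formulation, not code from the paper: the action labels, the function names, and the example values of `epsilon` and `gamma` are assumptions, and the Q-values here are hand-written lists standing in for the network's output.

```python
import random

# Hypothetical labels: store as episodic, store as semantic, or forget.
ACTIONS = ("episodic", "semantic", "forget")


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action index with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def td_target(reward, next_q_values, gamma, done):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward  # no bootstrapping past the end of an episode
    return reward + gamma * max(next_q_values)


# Stand-in Q-values for one state and its successor.
q_now = [0.1, 0.7, 0.2]
q_next = [0.5, 2.0, 1.0]

greedy_action = ACTIONS[epsilon_greedy(q_now, epsilon=0.0)]
target = td_target(reward=1.0, next_q_values=q_next, gamma=0.5, done=False)
```

During training, the network's prediction for the chosen action would be regressed toward `target`; the details of the replay buffer and optimizer are left out of this sketch.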
Experimental Setup and Results
The simulation environment, "the Room," is a discrete-event simulation with multiple humans and objects, providing a varied and challenging setting for testing the agent's memory capabilities. The experiments demonstrate that the RL agent with human-like memory systems outperforms agents that lack such structured memory, achieving high accuracy in answering both individual-specific and generalized questions.
Numerical results (Figure 4) indicate that an RL agent initialized with a prefilled semantic memory, which resembles prior world knowledge, learns more effectively and performs better in test environments than an agent that starts with an empty semantic memory.
Figure 4: Training, validation, and test results of the agents with the memory capacity of 32.
Discussion
Combining episodic and semantic long-term memory gives the agent a retrieval mechanism that parallels human cognitive processes. Episodic memories support context-specific recall, while semantic memories supply generalized knowledge that carries over across varied scenarios. Because the agent can draw on stored memories rather than only its current observation, this design helps address the partial observability inherent in many RL problems.
Conclusion
By integrating cognitive theories into algorithmic frameworks, the paper introduces an innovative RL architecture with distinct memory systems, leading to enhanced learning and problem-solving capabilities. Future research directions include extending the complexity of the environment, incorporating multimodal inputs, and employing other human-like memory representations to further emulate human cognitive functions. Continued exploration in this field may yield more robust and adaptable AI systems capable of handling a wider array of real-world applications.