SRMT: Shared Memory for Multi-agent Lifelong Pathfinding (2501.13200v1)

Published 22 Jan 2025 in cs.LG, cs.AI, and cs.MA

Abstract: Multi-agent reinforcement learning (MARL) demonstrates significant progress in solving cooperative and competitive multi-agent problems in various environments. One of the principal challenges in MARL is the need for explicit prediction of the agents' behavior to achieve cooperation. To resolve this issue, we propose the Shared Recurrent Memory Transformer (SRMT) which extends memory transformers to multi-agent settings by pooling and globally broadcasting individual working memories, enabling agents to exchange information implicitly and coordinate their actions. We evaluate SRMT on the Partially Observable Multi-Agent Pathfinding problem in a toy Bottleneck navigation task that requires agents to pass through a narrow corridor and on a POGEMA benchmark set of tasks. In the Bottleneck task, SRMT consistently outperforms a variety of reinforcement learning baselines, especially under sparse rewards, and generalizes effectively to longer corridors than those seen during training. On POGEMA maps, including Mazes, Random, and MovingAI, SRMT is competitive with recent MARL, hybrid, and planning-based algorithms. These results suggest that incorporating shared recurrent memory into the transformer-based architectures can enhance coordination in decentralized multi-agent systems. The source code for training and evaluation is available on GitHub: https://github.com/Aloriosa/srmt.

Summary

  • The paper presents SRMT, a novel architecture that uses shared recurrent memory to achieve efficient coordination among agents in partially observable environments.
  • It employs a decentralized, cross-attention mechanism that aggregates individual agent memories into a global workspace for improved decision-making.
  • Experimental results demonstrate that SRMT matches or outperforms state-of-the-art MARL methods, scaling to environments larger than those seen in training and remaining effective under sparse rewards.

Analysis of SRMT: Shared Memory for Multi-agent Lifelong Pathfinding

The paper "SRMT: Shared Memory for Multi-agent Lifelong Pathfinding" introduces a novel method for enhancing coordination among agents in multi-agent reinforcement learning (MARL) environments. The paper addresses one of the significant challenges in MARL: achieving efficient coordination and information exchange among agents without requiring explicit communication.

The authors propose the Shared Recurrent Memory Transformer (SRMT), an architecture that extends memory transformers to multi-agent settings by implementing a shared memory system. This mechanism allows individual agents' memories to be pooled and globally broadcast, facilitating implicit information sharing and coordination. SRMT is applied to Partially Observable Multi-Agent Pathfinding (PO-MAPF) problems, in which agents must reach their goals while observing obstacles and other agents only within a limited local field of view.

Core Methodology and Results

The SRMT utilizes a decentralized approach to updating and sharing memory among agents, inspired by the global workspace theory. The architecture heavily relies on attention mechanisms, particularly leveraging a recurrent memory transformer with cross-attention layers. This configuration allows each agent to maintain a personal memory updated at each step, while simultaneously interacting with a shared global memory representing the combined state of all agents.
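To make this mechanism concrete, the following is a minimal sketch of a single shared-memory update step. It is written in PyTorch with assumed module names, tensor shapes, and hyperparameters; it is not the authors' released code. Each agent carries one personal memory token, all memories are pooled into a shared memory that is broadcast to every agent, and each agent's transformer block cross-attends to that shared memory while updating its own memory and observation representation.

```python
import torch
import torch.nn as nn

class SharedMemoryBlock(nn.Module):
    """One step of personal-memory update with cross-attention to a pooled, shared memory.
    A simplified illustration of the shared-memory idea, not the exact SRMT architecture."""
    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, memory: torch.Tensor, obs_tokens: torch.Tensor):
        # memory:     (n_agents, 1, d_model) -- one personal memory token per agent
        # obs_tokens: (n_agents, T, d_model) -- encoded local observation per agent
        n_agents, _, d_model = memory.shape
        # Pool all personal memories into a global shared memory and broadcast it to every agent.
        shared = memory.reshape(1, n_agents, d_model).repeat(n_agents, 1, 1)
        # Each agent processes its own [memory; observation] sequence with self-attention...
        seq = torch.cat([memory, obs_tokens], dim=1)
        h, _ = self.self_attn(seq, seq, seq)
        # ...and cross-attends to the shared memory, gaining implicit access to all agents' states.
        h, _ = self.cross_attn(h, shared, shared)
        h = h + self.ff(h)
        # The first token becomes the agent's updated personal memory for the next step.
        new_memory, hidden = h[:, :1, :], h[:, 1:, :]
        return new_memory, hidden

# Usage sketch: 4 agents, 9 observation tokens each (e.g., a flattened 3x3 local view).
mem = torch.zeros(4, 1, 128)
obs = torch.randn(4, 9, 128)
block = SharedMemoryBlock()
mem, hidden = block(mem, obs)
```

In the full architecture such a block would sit inside a recurrent actor-critic policy, with the updated memory carried across environment steps; the sketch only illustrates the pooling-and-broadcast cross-attention at a single step.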

In empirical evaluations, SRMT was rigorously tested against established MARL algorithms and recent approaches such as QPLEX, MAMBA, and ATM. The evaluation combines a toy Bottleneck navigation task with an extended set of POGEMA benchmark maps. The results illustrate several compelling findings:

  • Performance Under Sparse Rewards: SRMT demonstrates superior performance in environments where only sparse reward signals are available. The shared memory capability provides agents with implicit global state information, which is crucial for decision-making in the absence of immediate rewards.
  • Scalability and Generalization: The proposed model scales and generalizes well across corridor lengths in the Bottleneck scenario, remaining effective on corridors longer than those seen during training.
  • Comparative Analysis: Against contemporary models, SRMT consistently matches or outperforms its counterparts on key metrics such as Cooperative Success Rate (CSR), Individual Success Rate (ISR), and Sum-of-Costs (SoC), especially in sparse or negative reward settings (a sketch of how these metrics are commonly computed follows this list).
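For concreteness, here is a minimal sketch of how these per-episode metrics are commonly computed in MAPF-style evaluations. The helper and its inputs are assumptions for illustration, not the paper's evaluation code: `reach_times[i]` is the step at which agent i first reached its goal (or None if it never did), and `episode_len` is the episode horizon.

```python
def episode_metrics(reach_times, episode_len):
    """Per-episode MAPF metrics: ISR, CSR, and Sum-of-Costs (illustrative definitions)."""
    n = len(reach_times)
    reached = [t is not None for t in reach_times]
    isr = sum(reached) / n                      # Individual Success Rate: fraction of agents at goal
    csr = 1.0 if all(reached) else 0.0          # Cooperative Success Rate: all agents reached goals
    # Sum-of-Costs: agents that never reach the goal are charged the full episode horizon.
    soc = sum(t if t is not None else episode_len for t in reach_times)
    return {"ISR": isr, "CSR": csr, "SoC": soc}

# Example: 3 agents, two reach their goals at steps 12 and 20, one fails in a 64-step episode.
print(episode_metrics([12, 20, None], 64))  # {'ISR': 0.666..., 'CSR': 0.0, 'SoC': 96}
```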

Implications and Future Directions

SRMT's decentralized approach has significant implications for MARL applications, particularly in dynamic and fragmented environments where a centralized controller is infeasible. The model's reliance on shared memory allows for a flexible exchange of information, potentially opening new pathways for multi-agent systems in which agents adapt to unforeseen scenarios by leveraging collective knowledge.

Future research could explore enhancements to memory initialization, as comparisons indicated that the initial memory state can significantly impact performance. Furthermore, integrating planning methods with SRMT, as evidenced in the Warehouse tests, may offer further improvements in environments characterized by high congestion and coordination complexity.

In summary, the Shared Recurrent Memory Transformer establishes a robust framework within MARL, addressing challenges of coordination and efficient information sharing. The implications of SRMT extend beyond traditional pathfinding, offering a scalable, decentralized solution applicable to a variety of problem domains where explicit communication may be constrained. The paper lays a promising foundation for subsequent advancements in memory-augmented agent-based systems, particularly enhancing their adaptability and cooperative capabilities in complex tasks.
