- The paper presents a decentralized MARL framework that drastically improves sample efficiency over single-agent approaches.
- It employs a correlation-based clustering algorithm to group interrelated parameters, reducing computational costs while optimizing mapping decisions.
- Empirical results on CNNs like MobileNet-v2 and VGG16 demonstrate significant improvements in latency, energy-delay product, and area utilization.
Multi-Agent Reinforcement Learning for Sample-Efficient Deep Neural Network Mapping
This paper introduces a decentralized multi-agent reinforcement learning (MARL) framework tailored to the DNN-to-hardware mapping problem, emphasizing sample efficiency and scalability to high-dimensional combinatorial search spaces. The goal is to optimize DNN layer mappings for hardware accelerators so as to minimize latency, energy-delay product (EDP), and area utilization. By reinterpreting mapping decisions as an interconnected multi-agent control problem, the framework demonstrates significant advantages over single-agent RL and other heuristic and ML approaches.
Mapping DNNs onto accelerators involves exploring configurations for tensor tiling, loop ordering, and parallelization, yielding search spaces ranging from $10^4$ to $10^{39}$ per layer. Previous strategies (random/grid search, Bayesian optimization, genetic algorithms, single-agent RL) face pronounced trade-offs:
- Brute-force/randomized methods: Prohibitively sample-inefficient for large-scale mappings.
- Evolutionary algorithms and Bayesian optimization: More sample-efficient than brute-force, but less capable than RL in high-dimensional contexts.
- Single-agent RL: Expressive in principle, but a single agent must search the full joint parameter space, so sample efficiency degrades sharply as dimensionality grows.
This establishes the necessity for a method that combines both scalable search capability and high sample efficiency.
MARL Framework and Parameter Clustering
The framework models each mapping parameter as being controlled by a distinct RL agent. Agents receive a shared global reward but act independently, allowing for structured and parallel exploration. However, full decentralization inflates computational resource requirements linearly with parameter count. To mitigate this, the authors introduce a correlation-based agent clustering algorithm. This procedure:
- Collects a dataset of ⟨mapping parameters, reward⟩ pairs via initial exploration (using policies such as random search, BO, GA, single-agent RL).
- Computes pairwise correlations among parameters conditional on the target objective.
- Performs agglomerative clustering on the correlation matrix, assigning highly correlated parameters to the same agent and independent parameters to distinct agents, optimizing the assignment for a fixed agent budget B.
- Deploys each agent (or cluster) to optimize its assigned group of parameters within the MARL setup.
This factorization supports hardware and application-specific partial decentralization, balancing coordination burden and sample efficiency.
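The sketch below illustrates this decentralized control loop in miniature. It assumes a hypothetical environment exposing an `evaluate(actions)` call that scores one complete mapping and returns a single scalar reward shared by all agents; the softmax, REINFORCE-style agent is purely illustrative and not the authors' exact policy architecture.

```python
import numpy as np

class IndependentAgent:
    """One agent per mapping parameter (or parameter cluster).

    Keeps a softmax preference over its own discrete choices and updates
    from the shared global reward only -- it never observes the other
    agents' actions.
    """
    def __init__(self, num_choices, lr=0.1):
        self.prefs = np.zeros(num_choices)
        self.lr = lr

    def act(self, rng):
        probs = np.exp(self.prefs - self.prefs.max())
        probs /= probs.sum()
        return rng.choice(len(self.prefs), p=probs)

    def update(self, action, reward, baseline):
        # REINFORCE-style update against a running baseline of the shared reward.
        self.prefs[action] += self.lr * (reward - baseline)

def train(env, agents, steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    baseline = 0.0
    for _ in range(steps):
        actions = [a.act(rng) for a in agents]   # independent, parallel decisions
        reward = env.evaluate(actions)           # one shared scalar (e.g. 1/latency)
        for agent, action in zip(agents, actions):
            agent.update(action, reward, baseline)
        baseline += 0.05 * (reward - baseline)   # slow-moving reward baseline
    return agents
```

The structural point is that each agent updates only its own preferences while observing nothing but the shared reward, which is what allows exploration of the different mapping dimensions to proceed in parallel.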
Algorithmic Workflow (in summary)
```python
L = [r, s, k, c, h, w]                                # DNN layer parameters
D = collect_data(L, policy=pi, num_samples=20000)     # ⟨mapping parameters, reward⟩ pairs from an exploration policy
M = compute_correlation_matrix(D)                     # pairwise parameter correlations w.r.t. the reward
clusters = agglomerative_clustering(M, n_clusters=B)  # group correlated parameters under the agent budget B
agent_assignments = assign_parameters_to_agents(L, clusters)
```
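As one concrete (hypothetical) way to realize `compute_correlation_matrix` and `agglomerative_clustering`, the sketch below conditions the correlation estimate on high-reward samples via a simple top-quantile filter (an assumption on our part, not necessarily the paper's exact conditioning) and clusters with SciPy's hierarchical-clustering utilities:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_parameters(samples, rewards, n_clusters):
    """Group mapping parameters whose values co-vary with respect to the reward.

    samples : (N, P) array of sampled mapping-parameter vectors
    rewards : (N,) array of the corresponding rewards
    Returns {cluster_id: [parameter indices]} with at most n_clusters entries.
    """
    # Restrict to high-reward samples so the correlation matrix reflects
    # dependence "conditional on the target objective".
    keep = rewards >= np.quantile(rewards, 0.75)
    corr = np.corrcoef(samples[keep].T)          # (P, P) pairwise correlations
    corr = np.nan_to_num(corr, nan=0.0)          # zero-variance parameters -> uncorrelated

    # Turn |correlation| into a distance and run agglomerative clustering.
    dist = 1.0 - np.abs(corr)
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")

    clusters = {}
    for param_idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(param_idx)
    return clusters
```

Highly correlated parameters end up in the same cluster and are then handed to a single agent, while near-independent parameters are split across agents, matching the assignment rule described above.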
Experimental Setup and Metrics
The experimental evaluation targets CNNs—specifically MobileNet-v2 and VGG16—due to their complex mapping spaces. The mapping environment is built atop Maestro, extended using a Gym-compatible interface. The reward is the inverse of the optimization target (latency, EDP, area). All baseline algorithms are run under an equal sample budget for direct comparison.
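The Maestro-backed environment itself is not reproduced here, but a minimal Gym-style wrapper illustrates the interface and the inverse-cost reward. The `cost_model` callable stands in for the Maestro query, and the single-step, stateless formulation is an assumption made for this sketch.

```python
import numpy as np
import gym
from gym import spaces

class MappingEnv(gym.Env):
    """Single-step mapping environment: an action is a complete mapping choice,
    and the reward is the inverse of the chosen cost metric (latency, EDP, or area).

    `cost_model` is any callable (action, metric) -> cost; in the paper this
    role is played by the Maestro cost model.
    """
    def __init__(self, param_cardinalities, cost_model, metric="latency"):
        super().__init__()
        self.action_space = spaces.MultiDiscrete(param_cardinalities)
        self.observation_space = spaces.Discrete(1)   # stateless, bandit-style
        self.cost_model = cost_model
        self.metric = metric

    def reset(self):
        return 0

    def step(self, action):
        cost = self.cost_model(np.asarray(action), self.metric)
        reward = 1.0 / max(cost, 1e-9)                # inverse-cost reward
        return 0, reward, True, {"cost": cost}
```

Because every search strategy interacts with the same environment through this interface, running all baselines under an equal sample budget amounts to capping the number of `step` calls.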
Empirical Results: Sample Efficiency and Quality
The MARL approach achieves dominant results in both sample efficiency and final solution quality across latency, area, and EDP metrics:
- Sample efficiency: MARL converges 30–300× faster than single-agent RL and is consistently faster than the GA, BO, and random-search baselines.
- Latency reduction: Achieves up to 32.61× lower latency on VGG16 layers relative to single-agent RL; outperforms all others under equal sample budgets.
- EDP: Observed up to 16.45× improvement for VGG16 EDP, again with strong gains across other architectures.
- Ablation studies: Varying agent budgets from single-agent, clustered-agent (e.g., 2, 6, 9 agents), to fully decentralized (10 agents) demonstrate that even partial decentralization yields significant gains, with diminishing returns as decentralization increases.
- Temporal parameter tracking: MARL achieves convergence to near-optimal configurations orders of magnitude faster than single-agent RL, and typically matches or exceeds its final performance.
Theoretical and Practical Implications
Parameter independence and clustering: Empirical correlation analyses reveal that many mapping parameters are nearly independent, supporting the rationale for parallel, agent-wise optimization. Where strong correlations exist, the clustering algorithm adapts agent assignment accordingly. This adaptive decentralization both reduces redundant exploration and guides hardware-aware agent mapping.
Resource and deployment considerations:
- The up-front cost for cluster analysis and data curation (~20,000 samples) is minor relative to the sample savings achieved by MARL, even more so when amortized over multiple mapping scenarios or iterative hardware design cycles.
- In practice, the framework aligns with design-time optimization, enabling rapid mapping of new DNN models with limited resources.
Scalability and transfer: The clustering-based factorization generalizes across DNN architectures (MLPs, CNNs, Transformers) and is robust to scaling in both network size and hardware configuration. The approach supports plug-and-play integration with Gym-based RL toolchains and simulation environments.
Broader Impact and Future Directions
The paper establishes a concrete methodological advance in the use of multi-agent reinforcement learning for extremely high-dimensional, combinatorial mapping problems. By augmenting MARL with agent clustering informed by empirical parameter correlations, the approach substantially enhances the tractability of real-world DNN accelerator mapping.
Open directions include:
- Generalizing correlation clustering to higher-order parameter interactions and multi-objective reward structures.
- Integrating the MARL+clustering approach with differentiable accelerator simulators and co-design flows.
- Exploiting the insights from inter-parameter dependencies for automatic hardware architecture augmentation.
- Applying MARL-based mapping policies for runtime or continual adaptation in reconfigurable accelerator systems.
The demonstrated improvements in sample efficiency directly translate to reduced computation and simulation cost, and by extension, accelerated research and development in DNN accelerator design and deployment. This work provides both a practical toolkit and a theoretical motivation for reinforcement-learned co-design at the intersection of AI and systems.