
Multi-Agent Reinforcement Learning for Sample-Efficient Deep Neural Network Mapping (2507.16249v1)

Published 22 Jul 2025 in cs.LG and cs.MA

Abstract: Mapping deep neural networks (DNNs) to hardware is critical for optimizing latency, energy consumption, and resource utilization, making it a cornerstone of high-performance accelerator design. Due to the vast and complex mapping space, reinforcement learning (RL) has emerged as a promising approach-but its effectiveness is often limited by sample inefficiency. We present a decentralized multi-agent reinforcement learning (MARL) framework designed to overcome this challenge. By distributing the search across multiple agents, our framework accelerates exploration. To avoid inefficiencies from training multiple agents in parallel, we introduce an agent clustering algorithm that assigns similar mapping parameters to the same agents based on correlation analysis. This enables a decentralized, parallelized learning process that significantly improves sample efficiency. Experimental results show our MARL approach improves sample efficiency by 30-300x over standard single-agent RL, achieving up to 32.61x latency reduction and 16.45x energy-delay product (EDP) reduction under iso-sample conditions.


Summary

  • The paper presents a decentralized MARL framework that drastically improves sample efficiency over single-agent approaches.
  • It employs a correlation-based clustering algorithm to group interrelated parameters, reducing computational costs while optimizing mapping decisions.
  • Empirical results on CNNs like MobileNet-v2 and VGG16 demonstrate significant improvements in latency, energy-delay product, and area utilization.

Multi-Agent Reinforcement Learning for Sample-Efficient Deep Neural Network Mapping

This paper introduces a decentralized multi-agent reinforcement learning (MARL) framework tailored to the DNN-to-hardware mapping problem, emphasizing sample efficiency and scalability to high-dimensional combinatorial search spaces. The paper targets the challenge of optimizing DNN layer mappings for hardware accelerators to minimize latency, energy-delay product (EDP), and area utilization. It reinterprets mapping decisions as an interconnected multi-agent control problem, demonstrating significant advantages over single-agent RL as well as heuristic and other ML approaches.

Problem Formulation and Limits of Prior Methods

Mapping DNNs onto accelerators involves exploring configurations for tensor tiling, loop ordering, and parallelization, yielding search spaces ranging from 10^4 to 10^39 configurations per layer. Previous strategies (random/grid search, Bayesian optimization, genetic algorithms, single-agent RL) face pronounced trade-offs:

  • Brute-force/randomized methods: Prohibitively sample-inefficient for large-scale mappings.
  • Evolutionary algorithms and Bayesian optimization: More sample-efficient than brute-force, but less capable than RL in high-dimensional contexts.
  • Single-agent RL: Theoretically powerful but suffers from severe sample inefficiency as dimensionality rises due to combinatorial explosion in parameter space.

This establishes the necessity for a method that combines both scalable search capability and high sample efficiency.

MARL Framework and Parameter Clustering

The framework models each mapping parameter as being controlled by a distinct RL agent. Agents receive a shared global reward but act independently, allowing for structured and parallel exploration. However, full decentralization inflates computational resource requirements linearly with parameter count. To mitigate this, the authors introduce a correlation-based agent clustering algorithm. This procedure:

  • Collects a dataset of ⟨mapping parameters, reward⟩ pairs via initial exploration (using policies such as random search, BO, GA, or single-agent RL).
  • Computes pairwise correlations among parameters conditional on the target objective.
  • Performs agglomerative clustering on the correlation matrix, assigning highly correlated parameters to the same agent and independent parameters to distinct agents, optimizing the assignment for a fixed agent budget B.
  • Deploys each agent (or cluster) to optimize its assigned group of parameters within the MARL setup.

This factorization supports hardware and application-specific partial decentralization, balancing coordination burden and sample efficiency.

Algorithmic Workflow (in summary)

L = [r, s, k, c, h, w]                                # DNN layer parameters
D = collect_data(L, policy=pi, num_samples=20000)     # exploration dataset
M = compute_correlation_matrix(D)                     # pairwise correlations
clusters = agglomerative_clustering(M, n_clusters=B)  # B = agent budget
agent_assignments = assign_parameters_to_agents(L, clusters)
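The workflow above can be fleshed out as a self-contained sketch. All function bodies here are illustrative stand-ins, not the paper's implementation: the reward is a synthetic toy cost that deliberately couples (r, s) and (h, w), and the paper's correlation-conditional-on-objective analysis is approximated by correlating parameters over the best-rewarded samples.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def collect_data(num_samples=5000, seed=0):
    """Random-search exploration over a toy 6-parameter mapping space.
    The reward is a synthetic stand-in for a Maestro-style cost model;
    it deliberately couples (r, s) and (h, w) so there is structure to find."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(1, 64, size=(num_samples, 6))
    r, s, h, w = X[:, 0], X[:, 1], X[:, 4], X[:, 5]
    reward = -np.abs(r + s - 65) - np.abs(h + w - 65)
    return X, reward

def correlation_matrix(X, reward, elite_frac=0.1):
    """Parameter correlations conditioned on the objective: keep only
    the best-rewarded samples, then correlate parameters pairwise."""
    k = max(2, int(len(reward) * elite_frac))
    elite = X[np.argsort(reward)[-k:]]
    return np.corrcoef(elite, rowvar=False)

def assign_parameters_to_agents(names, M, n_agents):
    """Agglomerative clustering with 1 - |correlation| as the distance,
    so strongly (anti)correlated parameters share an agent."""
    dist = 1.0 - np.abs(M)
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=n_agents, criterion="maxclust")
    groups = {}
    for name, lab in zip(names, labels):
        groups.setdefault(lab, []).append(name)
    return sorted(groups.values())

L = ["r", "s", "k", "c", "h", "w"]  # DNN layer parameters
X, reward = collect_data()
M = correlation_matrix(X, reward)
agents = assign_parameters_to_agents(L, M, n_agents=4)
print(agents)
```

Under this toy model, the clustering recovers the built-in structure: r is grouped with s and h with w, while k and c (which do not affect the reward) stay in separate clusters.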

Experimental Setup and Metrics

The experimental evaluation targets CNNs—specifically MobileNet-v2 and VGG16—due to their complex mapping spaces. The mapping environment is built atop Maestro, extended using a Gym-compatible interface. The reward is the inverse of the optimization target (latency, EDP, area). All baseline algorithms are run under an equal sample budget for direct comparison.
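A minimal sketch of such a Gym-compatible environment, assuming a stubbed-in cost model in place of Maestro (the class, method, and cost-model names are hypothetical, not the paper's actual interface):

```python
import numpy as np

class MappingEnv:
    """Minimal Gym-style mapping environment (hypothetical interface;
    the paper wraps the Maestro cost model, stubbed out here).
    The reward is the inverse of the optimization target, e.g. latency."""

    def __init__(self, num_params=6, num_choices=8, seed=0):
        self.num_params = num_params
        self.num_choices = num_choices  # discrete options per parameter
        self._rng = np.random.default_rng(seed)
        self.state = None

    def _simulate_latency(self, mapping):
        # Stand-in for a Maestro call: any positive cost works for the sketch.
        return 1.0 + float(np.sum((mapping - self.num_choices // 2) ** 2))

    def reset(self):
        self.state = self._rng.integers(0, self.num_choices, self.num_params)
        return self.state.copy()

    def step(self, action):
        # Each agent proposes values for its parameter cluster; for
        # simplicity, `action` here is the full joint mapping.
        self.state = np.asarray(action)
        latency = self._simulate_latency(self.state)
        reward = 1.0 / latency  # inverse of the target metric
        done = True             # one-shot mapping episodes
        return self.state.copy(), reward, done, {"latency": latency}

env = MappingEnv()
obs = env.reset()
_, r, done, info = env.step(np.zeros(6, dtype=int))
```

The same inverse-reward shape applies when the target is EDP or area: only the quantity returned by the cost model changes.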

Empirical Results: Sample Efficiency and Quality

The MARL approach achieves dominant results in both sample efficiency and final solution quality across latency, area, and EDP metrics:

  • Sample efficiency: MARL exhibits 30–300× faster convergence than single-agent RL, and is consistently faster than GA, BO, and random-search baselines.
  • Latency reduction: Achieves up to 32.61× lower latency on VGG16 layers relative to single-agent RL; outperforms all others under equal sample budgets.
  • EDP: Observed up to 16.45× improvement for VGG16 EDP, again with strong gains across other architectures.
  • Ablation studies: Varying agent budgets from single-agent, clustered-agent (e.g., 2, 6, 9 agents), to fully decentralized (10 agents) demonstrate that even partial decentralization yields significant gains, with diminishing returns as decentralization increases.
  • Temporal parameter tracking: MARL achieves convergence to near-optimal configurations orders of magnitude faster than single-agent RL, and typically matches or exceeds its final performance.

Theoretical and Practical Implications

Parameter independence and clustering: Empirical correlation analyses reveal that many mapping parameters are nearly independent, supporting the rationale for parallel, agent-wise optimization. Where strong correlations exist, the clustering algorithm adapts agent assignment accordingly. This adaptive decentralization both reduces redundant exploration and guides hardware-aware agent mapping.

Resource and deployment considerations:

  • The up-front cost for cluster analysis and data curation (~20,000 samples) is minor relative to the sample savings achieved by MARL, even more so when amortized over multiple mapping scenarios or iterative hardware design cycles.
  • In practice, the framework aligns with design-time optimization, enabling rapid mapping of new DNN models with limited resources.
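A back-of-envelope illustration of that amortization, assuming a hypothetical 1,000,000-sample single-agent budget; only the 20,000-sample curation cost and the 30× lower-bound efficiency gain come from the results above:

```python
single_agent_samples = 1_000_000           # assumed baseline budget (hypothetical)
marl_samples = single_agent_samples // 30  # 30x gain (paper's lower bound)
clustering_overhead = 20_000               # one-time dataset curation cost
total_marl = marl_samples + clustering_overhead
savings = single_agent_samples - total_marl
print(total_marl, savings)  # the overhead is dwarfed by the net savings
```

The overhead shrinks further when the clustered agent assignment is reused across layers or across iterations of a hardware design cycle.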

Scalability and transfer: The clustering-based factorization generalizes across DNN architectures (MLPs, CNNs, Transformers) and is robust to scaling in both network size and hardware configuration. The approach supports plug-and-play integration with Gym-based RL toolchains and simulation environments.

Broader Impact and Future Directions

The paper establishes a concrete methodological advance in the use of multi-agent reinforcement learning for extremely high-dimensional, combinatorial mapping problems. By augmenting MARL with agent clustering informed by empirical parameter correlations, the approach substantially enhances the tractability of real-world DNN accelerator mapping.

Open directions include:

  • Generalizing correlation clustering to higher-order parameter interactions and multi-objective reward structures.
  • Integrating the MARL+clustering approach with differentiable accelerator simulators and co-design flows.
  • Exploiting the insights from inter-parameter dependencies for automatic hardware architecture augmentation.
  • Applying MARL-based mapping policies for runtime or continual adaptation in reconfigurable accelerator systems.

The demonstrated improvements in sample efficiency directly translate to reduced computation and simulation cost, and by extension, accelerated research and development in DNN accelerator design and deployment. This work provides both a practical toolkit and a theoretical motivation for reinforcement-learned co-design at the intersection of AI and systems.
