- The paper introduces ARCO, a dynamic co-optimization compiler using multi-agent reinforcement learning to optimize both software and hardware configurations for DNNs.
- It employs a centralized-training, decentralized-execution approach and a confidence sampling method, improving throughput by up to 37.95% and reducing optimization time by up to 42.2%.
- Experimental evaluations on models like ResNet-18 and VGG demonstrate ARCO’s superior performance compared to traditional frameworks such as AutoTVM and CHAMELEON.
Introduction
The "Dynamic Co-Optimization Compiler" paper introduces ARCO, an innovative framework based on Multi-Agent Reinforcement Learning (MARL) for optimizing the deployment of Deep Neural Networks (DNNs) on various hardware platforms. ARCO aims to enhance throughput and reduce optimization time by integrating multi-agent systems that focus on concurrent optimization of hardware and software configurations. It represents a significant step in the evolution of automated compilation frameworks.
Framework Overview
ARCO distinguishes itself by employing three MARL agents that follow an actor-critic model, with roles split between software and hardware optimization. This setup allows fine-tuned adjustments across the different aspects of DNN deployment, translating directly to improved efficiency and reduced computational overhead. The framework exploits high-confidence configurations to guide its exploration of the configuration space (Figure 1).
Figure 1: Overall search flow of ARCO.
The agents in ARCO operate under a Centralized Training with Decentralized Execution (CTDE) paradigm. This paradigm ensures that, during training, all agents have access to a global state, while decisions during execution rely on local observations and policies refined through shared learning experiences.
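The CTDE idea can be made concrete with a small sketch. The snippet below is illustrative only: the agent roles, the knob names (`hw`, `sw_loop`, `sw_mem`), the toy reward function, and the simple preference-table policies are assumptions for demonstration, not ARCO's actual implementation. Three agents each control one knob of a joint configuration; during training they all share a centralized value baseline (a stand-in for the critic's global view), while action selection uses only each agent's own policy.

```python
import math
import random

random.seed(0)

# Hypothetical per-agent action spaces (knob names are illustrative).
AGENT_CHOICES = {
    "hw": [1, 2, 4],         # e.g. a hardware tiling factor
    "sw_loop": [8, 16, 32],  # e.g. a loop-unroll factor
    "sw_mem": [0, 1],        # e.g. caching off/on
}

# Decentralized policies: each agent keeps its own preference table.
prefs = {a: [0.0] * len(c) for a, c in AGENT_CHOICES.items()}

def sample(agent):
    """Decentralized execution: sample from this agent's softmax policy."""
    p = prefs[agent]
    m = max(p)  # subtract max for numerical stability
    w = [math.exp(x - m) for x in p]
    r, acc = random.random() * sum(w), 0.0
    for i, wi in enumerate(w):
        acc += wi
        if r <= acc:
            return i
    return len(w) - 1

def throughput(cfg):
    """Toy stand-in for a hardware measurement; the real signal would
    come from compiling and profiling the candidate configuration."""
    return cfg["hw"] * cfg["sw_loop"] * (1.5 if cfg["sw_mem"] else 1.0)

baseline = 0.0  # centralized critic: a shared value estimate
LR, BETA = 0.05, 0.1

for step in range(500):
    actions = {a: sample(a) for a in AGENT_CHOICES}
    cfg = {a: AGENT_CHOICES[a][i] for a, i in actions.items()}
    reward = throughput(cfg)
    advantage = reward - baseline           # centralized training signal
    baseline += BETA * (reward - baseline)  # update the shared critic
    for a, i in actions.items():            # each agent updates locally
        prefs[a][i] += LR * advantage

# Greedy configuration after training: each agent's highest-preference action.
best = {a: AGENT_CHOICES[a][max(range(len(p)), key=p.__getitem__)]
        for a, p in prefs.items()}
print(best)
```

Note that each agent's update uses only its own action and the shared advantage signal, so at deployment time no agent needs to observe the others' choices.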
Architecture and Implementation
Key components of ARCO include the MARL Exploration Module, which proposes candidate hardware and software configurations, and the Confidence Sampling (CS) method, which concentrates costly measurements on high-confidence configurations.
Figure 3: Configurations over time for ResNet-18 model before the CS method.
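The intuition behind confidence-based sampling can be sketched as follows. This is an assumption-laden illustration (the candidate names, scores, and the fixed probability threshold are hypothetical, and ARCO's actual selection rule may differ): candidates are ranked by the policy's probability, and only those above a confidence threshold are forwarded to the expensive hardware-measurement step.

```python
import math

def softmax(scores):
    """Convert raw policy scores (logits) to probabilities."""
    m = max(scores)  # subtract max for numerical stability
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [x / z for x in w]

def confidence_sample(candidates, scores, threshold=0.1):
    """Keep only candidates whose policy probability exceeds `threshold`,
    pruning low-confidence configurations before costly measurement."""
    probs = softmax(scores)
    return [c for c, p in zip(candidates, probs) if p >= threshold]

# Hypothetical candidate configurations with policy scores.
candidates = ["cfg_a", "cfg_b", "cfg_c", "cfg_d"]
scores = [2.0, 1.9, -1.0, -2.0]

kept = confidence_sample(candidates, scores)
print(kept)  # → ['cfg_a', 'cfg_b']
```

By measuring only the surviving candidates, the tuner spends its hardware-profiling budget where the policy is most certain, which is one way such a method can shorten overall optimization time.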
Experimental Evaluation
ARCO's effectiveness was validated through an extensive series of experiments against existing frameworks such as AutoTVM and CHAMELEON. The results demonstrated a marked enhancement in throughput and a reduction in optimization time, of up to 37.95% and 42.2%, respectively. These improvements underscore the efficacy of the MARL and CS approach in real-world deployments, achieving superior throughput across models such as ResNet-18 and the VGG series.
Figure 4: Comparing the achieved throughput of different frameworks over AutoTVM on VTA++.
Conclusion
The paper showcases ARCO as a significant advance in DNN compiler frameworks, addressing key challenges in hardware-software co-optimization. By leveraging MARL, ARCO not only optimizes the deployment of DNNs across different hardware platforms but also sets a benchmark for future developments in adaptive, intelligent mapping systems. Further research could enhance these techniques, possibly integrating hybrid models or exploring more complex feedback mechanisms to continue refining optimization strategies.