- The paper introduces a reinforcement learning-based method to automatically design compression policies that outperform hand-crafted techniques.
- The method achieves 2.7% higher accuracy for VGG-16 under a 4x FLOPs reduction and up to a 1.81x mobile inference speedup.
- The study proposes two tailored compression protocols, enabling optimal trade-offs between computational resources and accuracy, with generalization to object detection tasks.
AMC: AutoML for Model Compression and Acceleration on Mobile Devices
The paper "AMC: AutoML for Model Compression and Acceleration on Mobile Devices" addresses the critical problem of deploying neural network models on mobile devices, which are constrained by computation resources and power budgets. Conventional model compression techniques rely heavily on human-crafted heuristics and domain expertise to explore the design space, trading off model size, speed, and accuracy. These hand-tuned approaches are time-consuming and typically yield suboptimal results.
Overview of AMC
In this paper, the authors introduce AutoML for Model Compression (AMC), leveraging reinforcement learning to automate the design of model compression policies. AMC removes the need for manual intervention by employing a reinforcement learning agent to discover optimal compression strategies. This learning-based approach significantly outperforms the conventional rule-based methods in terms of compression ratio and accuracy preservation, all while reducing human labor.
The authors demonstrate the effectiveness of AMC by achieving a 2.7% higher accuracy for VGG-16 on ImageNet with a 4x FLOPs reduction compared to hand-crafted policies. On MobileNet, AMC achieves a 1.81x speedup in measured inference latency on an Android phone and a 1.43x speedup on the Titan XP GPU, with only a 0.1% loss in ImageNet Top-1 accuracy.
Methodology
Reinforcement Learning for Compression Policies
The core of AMC is to formulate model compression as a reinforcement learning problem. The agent processes a pre-trained neural network layer by layer and determines the sparsity ratio (compression level) for each layer. It uses an actor-critic method, specifically Deep Deterministic Policy Gradient (DDPG), to learn from trial and error; the continuous action space lets the agent control the compression ratio precisely and optimize for both model size and accuracy.
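The per-layer decision loop can be sketched as follows. This is a minimal illustration, not the paper's implementation: `layer_state`, `actor`, and `rollout` are hypothetical helpers, the state vector is a simplified version of the layer embedding described in the paper (layer index, channel dimensions, FLOPs accounting, previous action), and the actor is a stand-in linear policy rather than a trained DDPG network.

```python
import numpy as np

def layer_state(t, layer, flops_done, flops_rest, prev_action):
    # Simplified layer embedding: index, dimensions, this layer's FLOPs,
    # FLOPs already reduced, FLOPs remaining, and the previous action.
    return np.array([t, layer["c_in"], layer["c_out"], layer["kernel"],
                     layer["flops"], flops_done, flops_rest, prev_action],
                    dtype=np.float32)

def actor(state, weights):
    # Deterministic policy: a tiny linear map squashed to (0, 1) by a
    # sigmoid, standing in for the DDPG actor network.
    return float(1.0 / (1.0 + np.exp(-state @ weights)))

def rollout(layers, weights):
    """One episode: visit layers in order, emit a sparsity ratio for each."""
    actions, prev_a, flops_done = [], 0.0, 0.0
    total = sum(l["flops"] for l in layers)
    for t, layer in enumerate(layers):
        flops_rest = sum(l["flops"] for l in layers[t:])
        s = layer_state(t, layer, flops_done / total, flops_rest / total, prev_a)
        a = actor(s, weights)            # continuous sparsity ratio in (0, 1)
        actions.append(a)
        flops_done += layer["flops"] * a # FLOPs removed so far
        prev_a = a
    return actions
```

In the actual method, the episode's reward (evaluated after compressing and validating the whole network) is used by DDPG to update the actor and critic; only the action-selection loop is shown here.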
Compression Protocols
The authors propose two compression protocols to cater to different application requirements:
- Resource-Constrained Compression: This protocol aims to achieve the best accuracy given a maximum amount of hardware resources (e.g., FLOPs, latency, model size). The action space (pruning ratio) is constrained to ensure the compressed model stays within resource limits.
- Accuracy-Guaranteed Compression: This protocol focuses on achieving the smallest model size without losing accuracy. The reward function balances both accuracy and hardware resource reduction, allowing the agent to explore the limits of compression.
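The two protocols can be contrasted with a sketch of their reward structures. This is a hedged reading of the paper's formulation, with hypothetical function names: in the resource-constrained setting the budget is enforced by clipping the action space, so the reward tracks accuracy alone, while the accuracy-guaranteed setting folds a log-scaled resource term into the reward.

```python
import math

def reward_resource_constrained(error):
    # Resource-constrained protocol: the pruning-ratio action space is
    # clipped so the FLOPs/latency budget is met by construction;
    # the reward therefore only needs to track validation error.
    return -error

def reward_accuracy_guaranteed(error, flops):
    # Accuracy-guaranteed protocol: couple error with log(FLOPs) so the
    # agent is rewarded for shrinking the model as well as staying
    # accurate (a reading of the paper's R = -Error * log(FLOPs)).
    return -error * math.log(flops)
```

Note that with the log-scaled form, halving FLOPs at equal error strictly increases the reward, which is what lets the agent explore the limits of compression rather than stopping at the first feasible policy.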
Experimental Results
The paper showcases extensive experiments across various neural networks, including VGG, ResNet, and MobileNet, on both CIFAR-10 and ImageNet datasets.
CIFAR-10 Results
On CIFAR-10, AMC significantly outperformed human-crafted policies for both the shallow Plain-20 and the deeper ResNet-56 networks. In a FLOPs-constrained compression scenario, AMC achieved higher validation and test accuracies than uniform, shallow, and deep pruning policies, demonstrating its effectiveness in finding good per-layer compression ratios.
ImageNet Results
On ImageNet, AMC pushed the limit of fine-grained pruning for ResNet-50, resulting in a 5x compression ratio without loss of performance. AMC also outperformed various state-of-the-art channel reduction methods, achieving better accuracy-computation trade-offs for VGG-16, MobileNet, and MobileNet-V2.
Mobile Inference Acceleration
A significant highlight of the paper is the practical impact of AMC on mobile inference. By directly optimizing measured inference latency, AMC nearly doubled the speed of MobileNet on a Google Pixel 1 (1.81x) without significant accuracy loss. This result underscores AMC's potential for efficient neural network deployment on resource-constrained devices.
Generalization to Object Detection
AMC's generalization ability for tasks beyond classification was also evaluated. When applied to Faster R-CNN with a compressed VGG-16 backbone on the PASCAL VOC dataset, AMC demonstrated superior performance, achieving better mean average precision (mAP) compared to hand-crafted pruning methods. This robustness enhances the applicability of AMC across different neural network architectures and tasks, potentially broadening its impact.
Conclusion
AMC presents a significant step towards automating neural network model compression, leveraging reinforcement learning to surpass the limitations of heuristic-based methods. Its ability to provide both resource-constrained and accuracy-guaranteed compression makes it a versatile tool for various applications, particularly in mobile devices where resource constraints are stringent. The experimental results affirm AMC's superior performance and generalizability, heralding a more efficient approach to model compression and deployment in the field of deep learning.
AMC sets the stage for future developments where automated and learning-based methods become the standard in optimizing deep neural networks, pushing the boundaries of what can be achieved in AI with limited resources.