AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates (1907.03141v2)

Published 6 Jul 2019 in cs.LG, cs.AI, cs.CV, cs.NE, and stat.ML

Abstract: Structured weight pruning is a representative model compression technique of DNNs to reduce the storage and computation requirements and accelerate inference. An automatic hyperparameter determination process is necessary due to the large number of flexible hyperparameters. This work proposes AutoCompress, an automatic structured pruning framework with the following key performance improvements: (i) effectively incorporate the combination of structured pruning schemes in the automatic process; (ii) adopt the state-of-art ADMM-based structured weight pruning as the core algorithm, and propose an innovative additional purification step for further weight reduction without accuracy loss; and (iii) develop effective heuristic search method enhanced by experience-based guided search, replacing the prior deep reinforcement learning technique which has underlying incompatibility with the target pruning problem. Extensive experiments on CIFAR-10 and ImageNet datasets demonstrate that AutoCompress is the key to achieve ultra-high pruning rates on the number of weights and FLOPs that cannot be achieved before. As an example, AutoCompress outperforms the prior work on automatic model compression by up to 33x in pruning rate (120x reduction in the actual parameter count) under the same accuracy. Significant inference speedup has been observed from the AutoCompress framework on actual measurements on smartphone. We release all models of this work at anonymous link: http://bit.ly/2VZ63dS.

Authors (6)
  1. Ning Liu (199 papers)
  2. Xiaolong Ma (57 papers)
  3. Zhiyuan Xu (47 papers)
  4. Yanzhi Wang (197 papers)
  5. Jian Tang (327 papers)
  6. Jieping Ye (169 papers)
Citations (167)

Summary

AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates

Deep neural networks (DNNs) have rapidly advanced, pushing forward capabilities in numerous fields. Despite their remarkable success, the extensive computational and storage demands of large-scale DNN models, such as VGG and ResNet, pose challenges for efficient deployment, particularly on resource-constrained devices. The paper presents AutoCompress, a novel framework aimed at addressing these challenges through automatic structured pruning to achieve ultra-high compression rates without sacrificing accuracy.

Structured weight pruning stands out as a method to compress models while preserving the regular weight structures that hardware acceleration depends on. The innovation of AutoCompress lies in automating the hyperparameter determination process (how much to prune in each layer, and under which scheme), which previously had to be crafted by hand and typically resulted in sub-optimal configurations.
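
To make the two structured schemes concrete, the following is a minimal PyTorch-style sketch of filter pruning and column pruning applied to a convolutional weight tensor. The tensor sizes and the magnitude-based ranking are illustrative assumptions, not the paper's exact selection criteria:

```python
import torch

# A conv layer's weight tensor: (out_channels, in_channels, kH, kW).
W = torch.randn(64, 32, 3, 3)

# Filter pruning: remove whole output filters. Rank filters by L2
# norm and zero out the weakest ones; the surviving tensor keeps a
# regular dense shape that hardware can exploit.
filter_norms = W.flatten(1).norm(dim=1)              # one norm per filter
keep = filter_norms.argsort(descending=True)[:48]    # keep 48 of 64 filters
filter_mask = torch.zeros(64, dtype=W.dtype)
filter_mask[keep] = 1.0
W_filter_pruned = W * filter_mask.view(-1, 1, 1, 1)

# Column pruning: view the weights as the (out_channels, in*kH*kW)
# GEMM matrix used at inference time and zero out whole columns,
# again preserving a regular dense structure.
W2d = W.flatten(1)                                   # shape (64, 288)
col_norms = W2d.norm(dim=0)
col_keep = col_norms.argsort(descending=True)[:144]  # keep half the columns
col_mask = torch.zeros(W2d.shape[1], dtype=W.dtype)
col_mask[col_keep] = 1.0
W_col_pruned = (W2d * col_mask).view_as(W)
```

Combining the two schemes prunes along both dimensions of the GEMM matrix, which is why the paper reports higher parameter reduction than filter-only methods.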

Key Contributions

The paper outlines several pivotal contributions that distinguish AutoCompress from previous methodologies:

  1. Integration of Structured Pruning Schemes: AutoCompress skillfully combines different structured pruning schemes, notably filter and column pruning, resulting in superior parameter reduction compared to filter-only methods.
  2. ADMM-based Pruning: The framework employs the Alternating Direction Method of Multipliers (ADMM) as its core structured weight pruning algorithm. This state-of-the-art optimization technique, combined with an innovative purification step, significantly reduces model size without diminishing accuracy and surpasses the fixed-regularization methods commonly used in pruning (a minimal sketch of one ADMM iteration follows this list).
  3. Enhanced Heuristic Search: AutoCompress replaces the deep reinforcement learning (DRL) techniques of prior automatic frameworks with an enhanced heuristic search, specifically a simulated annealing strategy bolstered by experience-based guidance (see the annealing loop sketched after this list). This sidesteps the incompatibility of DRL setups with high pruning rates and proves more robust across varied DNN architectures.
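
As a reference for item 2, below is a minimal sketch of one ADMM iteration for the indicator-constrained pruning formulation: the weights W take a gradient step on the task loss plus a quadratic penalty, the auxiliary variable Z is obtained by Euclidean projection onto the structured-sparsity set, and the scaled dual variable U accumulates the remaining gap. The filter-sparsity projection and `loss_grad` are assumed placeholders; in practice the W-update runs many SGD mini-batches per iteration:

```python
import torch

def project_filters(W, num_keep):
    """Euclidean projection onto {tensors with at most num_keep nonzero
    filters}: keep the largest-norm filters and zero out the rest."""
    norms = W.flatten(1).norm(dim=1)
    mask = torch.zeros_like(norms)
    mask[norms.argsort(descending=True)[:num_keep]] = 1.0
    return W * mask.view(-1, 1, 1, 1)

def admm_iteration(W, Z, U, rho, num_keep, loss_grad, lr=1e-3):
    # W-update: gradient step on loss(W) + (rho/2) * ||W - Z + U||^2.
    W = W - lr * (loss_grad(W) + rho * (W - Z + U))
    # Z-update: project W + U onto the structured-sparsity constraint.
    Z = project_filters(W + U, num_keep)
    # Dual update: accumulate the remaining constraint violation.
    U = U + W - Z
    return W, Z, U
```

For item 3, the outer loop can be sketched as a simulated-annealing search over per-layer pruning rates. Here `prune_and_eval` is an assumed interface that runs the ADMM core with a candidate rate assignment and returns validation accuracy; the perturbation size, temperature schedule, and uniform initialization are illustrative choices rather than the paper's tuned settings:

```python
import math
import random

def sa_search(layer_names, target_rate, prune_and_eval,
              t0=1.0, t_min=0.01, cooling=0.9, moves_per_t=10):
    """Simulated-annealing search over per-layer pruning rates.
    layer_names is a list of layer identifiers."""
    # Start from a uniform assignment that meets the overall target.
    rates = {name: target_rate for name in layer_names}
    acc = prune_and_eval(rates)
    best_rates, best_acc = dict(rates), acc

    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            # Perturb: shift pruning budget between two random layers,
            # approximately preserving the overall compression target.
            a, b = random.sample(layer_names, 2)
            cand = dict(rates)
            delta = 0.05 * random.random()
            cand[a] = min(cand[a] + delta, 0.95)
            cand[b] = max(cand[b] - delta, 0.0)

            cand_acc = prune_and_eval(cand)
            # Classic annealing rule: always accept improvements, and
            # accept worse solutions with probability exp(delta/t) so
            # the search can escape local optima at high temperature.
            if cand_acc >= acc or random.random() < math.exp((cand_acc - acc) / t):
                rates, acc = cand, cand_acc
                if acc > best_acc:
                    best_rates, best_acc = dict(rates), acc
        t *= cooling
    return best_rates, best_acc
```

Because each candidate evaluation is expensive, the experience-based guidance described in the paper serves to narrow this search; the sketch above shows only the plain annealing loop.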

Experimental Validation

AutoCompress is rigorously evaluated across CIFAR-10 and ImageNet datasets using models such as VGG-16 and ResNet-18/50. The results demonstrate substantial improvements:

  • Pruning rates are increased up to 33× compared to prior automatic compression approaches, with actual parameter reduction reaching 120× in scenarios demanding the same level of accuracy.
  • The framework produces significant inference speedup on actual hardware, validating its practical applicability for accelerating DNN inference.

The paper also contrasts structured with non-structured pruning: while non-structured pruning can reach higher raw compression ratios, the structured sparsity produced by AutoCompress maps onto regular, dense computations that execute efficiently and consistently on real hardware.

Implications and Future Applications

The AutoCompress framework is a substantial leap forward for model compression, with implications reaching into mobile computing, embedded systems, and distributed network applications. By reducing the computational footprint of models, AutoCompress enables the deployment of complex DNNs on devices that were previously incapable of handling them.

The theoretical implications suggest avenues for further exploration of automated machine-learning frameworks, emphasizing heuristic and optimization techniques beyond reinforcement learning. Approaches that integrate experience-guided search strategies could unlock more adaptable pruning frameworks suited to diverse architectures and application needs.

Looking ahead, AutoCompress constitutes a promising foundation for expanding model compression techniques within automated framework design. Its ability to reconcile ultra-high compression rates with efficient parallel hardware execution points to fertile ground for advances in edge AI applications.

Through this structured pruning framework, researchers and practitioners can pursue more efficient deployments of AI solutions, keeping pace with the expanding capabilities of neural network models while remaining mindful of practical constraints and device capabilities.