AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates
Deep neural networks (DNNs) have rapidly advanced, pushing forward capabilities in numerous fields. Despite their remarkable success, the extensive computational and storage demands of large-scale DNN models, such as VGG and ResNet, pose challenges for efficient deployment, particularly on resource-constrained devices. The paper presents AutoCompress, a novel framework aimed at addressing these challenges through automatic structured pruning to achieve ultra-high compression rates without sacrificing accuracy.
Structured weight pruning stands out as a way to compress models while preserving regular weight structures that map well onto hardware acceleration. The innovation of AutoCompress lies in automating the hyperparameter determination process, above all the per-layer pruning rates, which previously had to be crafted by hand and typically ended up sub-optimal.
Key Contributions
The paper outlines several pivotal contributions that distinguish AutoCompress from previous methodologies:
- Integration of Structured Pruning Schemes: AutoCompress combines complementary structured pruning schemes, notably filter and column pruning, yielding greater parameter reduction than filter-only methods.
- ADMM-based Pruning: The framework employs ADMM (Alternating Direction Method of Multipliers) for structured weight pruning. This state-of-the-art optimization technique, combined with an innovative purification step, significantly reduces model size without diminishing accuracy and surpasses the fixed regularization methods commonly used in pruning (a sketch of the ADMM projection step follows this list).
- Enhanced Heuristic Search: AutoCompress replaces the deep reinforcement learning (DRL) agents used by earlier automatic frameworks with an enhanced heuristic search, specifically simulated annealing guided by experience-based heuristics (also sketched below). This avoids the mismatch between DRL setups and very high pruning rates and proves more robust across varied DNN architectures.
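To make the combination of filter/column pruning with ADMM concrete, below is a minimal PyTorch sketch, not the authors' code. The names `project_structured`, `admm_step`, `keep_filters`, and `keep_cols` are illustrative stand-ins; the per-layer retention targets are exactly the hyperparameters AutoCompress searches for, and the proximal term mentioned in the docstring is added to the training loss between outer iterations.

```python
import torch

def project_structured(weight: torch.Tensor, keep_filters: int, keep_cols: int) -> torch.Tensor:
    """Greedy approximation of the Euclidean projection used in the ADMM Z-update:
    zero out all but the `keep_filters` largest-norm filters (rows of the GEMM view)
    and the `keep_cols` largest-norm columns of a conv weight (out_ch, in_ch, kH, kW)."""
    out_ch = weight.shape[0]
    w2d = weight.reshape(out_ch, -1).clone()        # GEMM view: rows = filters, cols = in_ch * kH * kW

    row_norms = w2d.norm(dim=1)                     # filter (row) pruning
    drop_rows = torch.argsort(row_norms)[: out_ch - keep_filters]
    w2d[drop_rows, :] = 0.0

    col_norms = w2d.norm(dim=0)                     # column pruning on the surviving entries
    drop_cols = torch.argsort(col_norms)[: w2d.shape[1] - keep_cols]
    w2d[:, drop_cols] = 0.0
    return w2d.reshape_as(weight)


def admm_step(weight: torch.Tensor, U: torch.Tensor, keep_filters: int, keep_cols: int):
    """One outer ADMM iteration (scaled form). Between calls, W is updated by SGD on
    loss + (rho/2) * ||W - Z + U||^2, i.e. the usual training loss plus a proximal term."""
    Z = project_structured(weight + U, keep_filters, keep_cols)   # Z-update: project onto the structured-sparse set
    U = U + weight - Z                                            # dual (running residual) update
    return Z, U
```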
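The enhanced heuristic search can likewise be pictured as a simulated-annealing loop over the vector of per-layer pruning rates. The sketch below is a simplified illustration under stated assumptions: `quick_eval` is a hypothetical stand-in for a fast accuracy proxy (for example, magnitude pruning without retraining) that scores a candidate rate vector, and the perturbation shifts pruning budget between two layers so the overall compression target stays roughly fixed.

```python
import math
import random

def simulated_annealing(init_rates, quick_eval, steps=200, t0=1.0, cooling=0.98, step_size=0.05):
    """init_rates: list of per-layer pruning rates in [0, 1).
    quick_eval:   callable mapping a rate vector to a scalar accuracy proxy (higher is better)."""
    rates = list(init_rates)
    score = quick_eval(rates)
    best_rates, best_score = rates, score
    temperature = t0

    for _ in range(steps):
        # Perturb: move pruning budget from one random layer to another so the
        # overall compression target stays roughly constant.
        i, j = random.sample(range(len(rates)), 2)
        cand = list(rates)
        cand[i] = min(0.99, cand[i] + step_size)
        cand[j] = max(0.0, cand[j] - step_size)

        cand_score = quick_eval(cand)
        delta = cand_score - score
        # Accept improvements always; accept degradations with a probability
        # that shrinks as the temperature cools.
        if delta >= 0 or random.random() < math.exp(delta / temperature):
            rates, score = cand, cand_score
            if score > best_score:
                best_rates, best_score = rates, score
        temperature *= cooling

    return best_rates, best_score
```

In the full framework, the configuration accepted by the search would then drive an actual ADMM-based pruning and retraining round; the sketch only captures the search skeleton.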
Experimental Validation
AutoCompress is rigorously evaluated across CIFAR-10 and ImageNet datasets using models such as VGG-16 and ResNet-18/50. The results demonstrate substantial improvements:
- Pruning rates improve by up to 33× over prior automatic compression approaches, with parameter reduction reaching up to 120× at the same level of accuracy.
- The framework produces significant inference speedup on actual hardware, validating its practical applicability for accelerating DNN inference.
The paper also contrasts structured with non-structured pruning: while non-structured pruning can reach impressive compression ratios, structured pruning under AutoCompress keeps the weight matrices regular, which makes execution on parallel hardware both streamlined and consistent (see the sketch below).
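The hardware argument can be made concrete with a short, hypothetical PyTorch sketch: once whole filters are zero, the layer can be rebuilt as a smaller dense convolution, so inference needs no sparse indexing. The function name `shrink_filter_pruned_conv` is illustrative only; a complete implementation would also drop the matching input channels of the following layer and adjust any batch-norm parameters.

```python
import torch
import torch.nn as nn

def shrink_filter_pruned_conv(conv: nn.Conv2d) -> nn.Conv2d:
    """Drop output filters whose weights are entirely zero and return an
    equivalent, smaller *dense* Conv2d (no sparse kernels needed at inference)."""
    with torch.no_grad():
        keep = conv.weight.flatten(1).abs().sum(dim=1) > 0          # mask of nonzero filters
        new_conv = nn.Conv2d(conv.in_channels, int(keep.sum()),
                             conv.kernel_size, conv.stride,
                             conv.padding, bias=conv.bias is not None)
        new_conv.weight.copy_(conv.weight[keep])                     # copy surviving filters
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias[keep])
    return new_conv
```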
Implications and Future Applications
The AutoCompress framework is a substantial leap forward for model compression, with implications reaching into mobile computing, embedded systems, and distributed network applications. By reducing the computational footprint of models, AutoCompress enables the deployment of complex DNNs on devices that were previously incapable of handling them.
The theoretical implications point toward further exploration of automated machine-learning frameworks built on heuristic and optimization techniques beyond reinforcement learning. Approaches that integrate experience-guided search strategies could yield more adaptable pruning frameworks suited to diverse architectures and application needs.
Looking ahead, AutoCompress provides a promising foundation for extending model compression within automated framework design. Its ability to reconcile ultra-high compression rates with efficient parallel hardware execution points to fertile ground for edge AI applications.
Through this structured pruning framework, researchers and practitioners can deploy AI solutions more efficiently, keeping pace with the expanding capabilities of neural network models while remaining mindful of practical constraints and device capabilities.