
Network Pruning via Transformable Architecture Search (1905.09717v5)

Published 23 May 2019 in cs.CV

Abstract: Network pruning reduces the computation costs of an over-parameterized network without performance damage. Prevailing pruning algorithms pre-define the width and depth of the pruned networks, and then transfer parameters from the unpruned network to pruned networks. To break the structure limitation of the pruned networks, we propose to apply neural architecture search to search directly for a network with flexible channel and layer sizes. The number of the channels/layers is learned by minimizing the loss of the pruned networks. The feature map of the pruned network is an aggregation of K feature map fragments (generated by K networks of different sizes), which are sampled based on the probability distribution. The loss can be back-propagated not only to the network weights, but also to the parameterized distribution to explicitly tune the size of the channels/layers. Specifically, we apply channel-wise interpolation to keep the feature map with different channel sizes aligned in the aggregation procedure. The maximum probability for the size in each distribution serves as the width and depth of the pruned network, whose parameters are learned by knowledge transfer, e.g., knowledge distillation, from the original networks. Experiments on CIFAR-10, CIFAR-100 and ImageNet demonstrate the effectiveness of our new perspective of network pruning compared to traditional network pruning algorithms. Various searching and knowledge transfer approaches are conducted to show the effectiveness of the two components. Code is at: https://github.com/D-X-Y/NAS-Projects.

Network Pruning via Transformable Architecture Search: An Expert Overview

The paper "Network Pruning via Transformable Architecture Search" by Xuanyi Dong and Yi Yang introduces a novel approach to network pruning using Neural Architecture Search (NAS), specifically focusing on enhancing both the efficiency and effectiveness of pruning over-parameterized convolutional neural networks (CNNs). The proposed methodology is significant for its potential to reduce computational costs without sacrificing performance, which is crucial for deploying deep learning models on resource-constrained devices.

Methodology and Approach

The central innovation of this work is the application of NAS to optimize the size of networks with variable channel and layer configurations, termed Transformable Architecture Search (TAS). This departs from traditional pruning algorithms, which rely on pre-defined network structures and transfer parameters from larger, unpruned models.

Transformable Architecture Search (TAS)

TAS employs a differentiable optimization strategy to dynamically select the number of channels and layers. The process involves:

  • Learning a probability distribution over the number of channels and layers, thereby eschewing the rigid, hand-crafted structural limitations typical of many pruning methods.
  • Using channel-wise interpolation to align feature maps of different widths during aggregation, allowing features from different configurations to be combined seamlessly.
  • Employing Gumbel-Softmax sampling to make the discrete architecture choices differentiable, enabling gradient-based optimization (a sketch follows this list).
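The following PyTorch sketch illustrates how these pieces fit together: fragments of a shared convolution are sampled via Gumbel-Softmax, aligned with channel-wise interpolation, and aggregated with probability weights so the loss can back-propagate into the width distribution. This is a minimal illustration rather than the paper's implementation; the names `TASConv` and `channel_wise_interpolation`, the candidate widths, and the linear-interpolation reading of CWI are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def channel_wise_interpolation(feat: torch.Tensor, out_channels: int) -> torch.Tensor:
    """Resize a feature map along the channel axis so fragments of different
    widths can be summed (a linear-interpolation reading of the paper's CWI)."""
    b, c, h, w = feat.shape
    # Treat the channels at each spatial position as a 1-D signal: (B*H*W, 1, C).
    flat = feat.permute(0, 2, 3, 1).reshape(-1, 1, c)
    resized = F.interpolate(flat, size=out_channels, mode="linear", align_corners=True)
    return resized.reshape(b, h, w, out_channels).permute(0, 3, 1, 2)


class TASConv(nn.Module):
    """A convolution whose effective width is searched over `candidates` (illustrative)."""

    def __init__(self, in_channels: int, candidates=(8, 16, 24, 32)):
        super().__init__()
        self.candidates = candidates
        # One shared convolution at maximum width; fragments are channel slices of it.
        self.conv = nn.Conv2d(in_channels, max(candidates), 3, padding=1)
        # alpha parameterizes the learnable distribution over channel counts.
        self.alpha = nn.Parameter(torch.zeros(len(candidates)))

    def forward(self, x: torch.Tensor, tau: float = 1.0, k: int = 2) -> torch.Tensor:
        full = self.conv(x)
        # Gumbel-Softmax keeps the distribution over widths differentiable.
        probs = F.gumbel_softmax(self.alpha, tau=tau)
        # Sample K candidate widths (indices only; gradients flow through probs[i]).
        idx = torch.multinomial(probs.detach(), k, replacement=False).tolist()
        target = max(self.candidates[i] for i in idx)
        out = torch.zeros_like(full[:, :target])
        for i in idx:
            frag = full[:, : self.candidates[i]]             # width-c_i fragment
            frag = channel_wise_interpolation(frag, target)  # align channel widths
            out = out + probs[i] * frag                      # probability-weighted sum
        return out
```

After the search converges, the width with the highest probability under each alpha is kept as the pruned layer's size, matching the abstract's statement that the maximum-probability size serves as the width and depth of the pruned network.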

Knowledge Transfer

Following the architecture search, the parameters of the pruned network are optimized through knowledge transfer methods such as knowledge distillation (KD), which transfers learned behavior from the large, unpruned network to its compact counterpart. This helps the reduced model retain the performance characteristics of the original architecture.
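As one concrete instance of such knowledge transfer, a standard distillation objective in the style of Hinton et al. can be written as below. The temperature T and mixing weight lam are illustrative hyper-parameters, not values taken from the paper.

```python
import torch
import torch.nn.functional as F


def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            labels: torch.Tensor,
            T: float = 4.0,
            lam: float = 0.9) -> torch.Tensor:
    """Soft-target distillation plus hard-label cross-entropy (a standard KD form)."""
    # KL divergence between temperature-softened teacher and student predictions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitudes stay comparable across temperatures
    hard = F.cross_entropy(student_logits, labels)
    return lam * soft + (1.0 - lam) * hard
```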

Experimental Results

The effectiveness of the TAS approach is validated through experiments on standard datasets such as CIFAR-10, CIFAR-100, and ImageNet. The results show that the method not only achieves a significant reduction in FLOPs (up to 50%) but also maintains, and in some cases improves, model accuracy relative to state-of-the-art pruning techniques. For instance, architectures derived from TAS achieved superior performance on CIFAR-100 and ImageNet, demonstrating versatility across different models and tasks.

Implications and Future Directions

The integration of NAS into the pruning process opens new avenues for automating and optimizing neural network designs beyond conventional manual approaches. By focusing on both network depth and width, the TAS method allows for tailored resource allocation, leading to models better suited for specific deployment environments.

Looking forward, the implications of this research extend to several practical and theoretical domains:

  • Practical Deployment: This approach holds promise for enhancing model deployment on hardware with limited computational capacity, such as mobile and embedded systems.
  • AI Model Design: TAS contributes to the ongoing evolution of AI model design automation, potentially impacting future research in NAS and model compression techniques.
  • Efficient Training: Given the computational demand of NAS, further research might explore more efficient search algorithms or hybrid approaches to optimize search times and resource use.

In summary, the proposed TAS framework represents an innovative stride in the domain of network pruning, offering a robust alternative to traditional methods. The approach not only reduces a model's computational footprint but also sets a new standard for exploring architecture adaptiveness in the pursuit of high-performance, resource-efficient AI models.

Authors (2)
  1. Xuanyi Dong
  2. Yi Yang
Citations (230)