
Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures (1607.03250v1)

Published 12 Jul 2016 in cs.NE, cs.CV, and cs.LG

Abstract: State-of-the-art neural networks are getting deeper and wider. While their performance increases with the number of layers and neurons, it is crucial to design efficient deep architectures that reduce computational and memory costs. Designing an efficient neural network, however, is labor intensive, requiring many experiments and rounds of fine-tuning. In this paper, we introduce network trimming, which iteratively optimizes a network by pruning unimportant neurons based on an analysis of their outputs on a large dataset. Our algorithm is inspired by the observation that the outputs of a significant portion of neurons in a large network are mostly zero, regardless of the inputs the network receives. These zero-activation neurons are redundant and can be removed without affecting the overall accuracy of the network. After pruning the zero-activation neurons, we retrain the network using the weights before pruning as initialization. We alternate pruning and retraining to further reduce zero activations in the network. Our experiments on LeNet and VGG-16 show that we can achieve a high compression ratio of parameters without losing accuracy, and in some cases even surpass the accuracy of the original network.

Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures

Overview

The paper "Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures" introduces a method for optimizing deep neural networks by pruning redundant neurons. The authors propose a technique that iteratively eliminates neurons with high zero activation rates, effectively reducing the number of parameters and computational load without deteriorating, and sometimes even improving, the overall network performance.

Methodology

The proposed approach is based on the observation that many neurons in large neural networks frequently have zero activations regardless of the input, indicating that these neurons contribute minimally to the output and can be pruned without negatively impacting the network’s accuracy. The method comprises a three-step process: identifying neurons with high zero activation percentages, pruning these neurons, and retraining the network to restore or even enhance its accuracy.

Definition and Calculation of Zero Activations

To quantify neuron utility, the paper introduces the Average Percentage of Zeros (APoZ) metric: the fraction of a neuron's post-ReLU activations that are zero, measured over a large set of inputs. Neurons whose APoZ exceeds a chosen threshold are considered candidates for pruning. The metric is computed on comprehensive datasets, such as the ImageNet validation set for VGG-16, ensuring a robust and reliable assessment.
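Concretely, the APoZ of the $c$-th neuron in the $i$-th layer can be written as follows (notation follows the paper: $f(\cdot) = 1$ when its argument is true and $0$ otherwise, $N$ is the number of validation examples, and $M$ is the dimension of the neuron's output feature map):

$$\mathrm{APoZ}_c^{(i)} = \frac{\sum_{k=1}^{N}\sum_{j=1}^{M} f\big(O_{c,j}^{(i)}(k) = 0\big)}{N \times M}$$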

Trimming and Retraining

The algorithm follows an iterative pruning-retraining loop. After initial training, neurons with high APoZ are pruned, and the network is retrained using the pruned model's weights for initialization. This iterative process continues until an optimal balance between neuron count and network performance is achieved.
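To make the loop concrete, here is a minimal, self-contained PyTorch sketch of one prune-retrain step on a toy MLP. The architecture, the 0.7 threshold, and the random stand-in data are illustrative assumptions, not the paper's setup, and the retraining step is only indicated in a comment:

```python
# Illustrative sketch of one APoZ-based pruning step (not the authors' code).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy network: 784 -> 256 -> 10, with the hidden ReLU layer as the pruning target.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
data = torch.randn(512, 784)  # stand-in for a validation set

# 1. Measure APoZ: fraction of zero post-ReLU activations per hidden neuron.
with torch.no_grad():
    hidden = model[1](model[0](data))         # post-ReLU activations, (512, 256)
    apoz = (hidden == 0).float().mean(dim=0)  # APoZ per neuron, in [0, 1]

# 2. Prune neurons whose APoZ exceeds a threshold; the rest survive.
keep = (apoz <= 0.7).nonzero(as_tuple=True)[0]
fc1, fc2 = model[0], model[2]

new_fc1 = nn.Linear(784, keep.numel())
new_fc1.weight.data = fc1.weight.data[keep]     # drop rows (pruned output neurons)
new_fc1.bias.data = fc1.bias.data[keep]

new_fc2 = nn.Linear(keep.numel(), 10)
new_fc2.weight.data = fc2.weight.data[:, keep]  # drop the matching input columns
new_fc2.bias.data = fc2.bias.data

pruned = nn.Sequential(new_fc1, nn.ReLU(), new_fc2)
# 3. Retrain `pruned` starting from these surviving weights, then repeat 1-3.
```

Note that the surviving weights carry over directly, which is why retraining starts from the pre-pruning solution rather than a random initialization.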

Experimental Validation

The efficacy of the network trimming approach was validated on two well-known networks: LeNet and VGG-16.

LeNet on MNIST

For the LeNet architecture, trimming was applied to the CONV2 and FC1 layers, achieving significant parameter reduction across multiple iterations without sacrificing accuracy. Notably, the network maintained or even improved its accuracy after each pruning iteration, demonstrating the robustness of the method. By the fourth iteration, a compression rate of over 3.85× was achieved while maintaining an accuracy of 99.26%.

VGG-16 on ImageNet

The VGG-16 architecture, characterized by its depth and large parameter count, underwent extensive neuron pruning in its later convolutional and fully connected layers (CONV4, CONV5, FC6, FC7). Successive pruning iterations yielded a compression rate of up to 2.59× while improving classification accuracy by up to 2-3% in both Top-1 and Top-5 metrics.

Comparison with Existing Methods

The authors compare their work with existing connection pruning methods, particularly highlighting distinctions from Han et al.'s weight pruning. One major advantage of network trimming is its natural fit for GPU implementations, as it eliminates entire neurons and their associated operations, rather than just individual connections. This leads to more substantial computational savings in practical deployment, especially for heavily parameterized layers.
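The distinction can be made concrete with a small NumPy sketch (an assumed illustration, not code from either paper): weight pruning zeroes individual entries but leaves the matrix shape, and hence the dense computation, unchanged unless sparse kernels are available, whereas trimming removes whole neurons and shrinks the matrix itself:

```python
# Assumed illustration of structured vs. unstructured pruning on one layer.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024))   # weight matrix of a dense layer

# Weight pruning (Han et al. style): same shape, needs sparse kernels to win.
mask = rng.random(W.shape) > 0.9        # keep ~10% of connections
W_sparse = W * mask                     # still a 1024 x 1024 dense array

# Neuron pruning (network trimming): drop whole output neurons -> smaller GEMM.
keep = np.arange(0, 1024, 2)            # e.g. keep half the neurons
W_trimmed = W[keep, :]                  # now 512 x 1024, still dense

x = rng.standard_normal(1024)
y = W_trimmed @ x                       # ordinary dense matmul, ~2x fewer FLOPs
```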

Practical and Theoretical Implications

The implications of this research are significant in both practical and theoretical contexts. Practically, it presents a less labor-intensive method for optimizing neural networks, which could be critical for deploying deep learning models on resource-constrained devices. From a theoretical perspective, the findings indicate that commonly used neural network architectures contain substantial redundancy, opening avenues for research into more efficient network designs.

Future Developments

While the paper demonstrates the efficacy of neuron pruning on specific networks, future research could extend these techniques to a wider variety of architectures and tasks. Additionally, integrating neuron pruning with other model compression techniques, such as quantization and distillation, could further enhance overall efficiency.

Overall, Network Trimming presents a compelling and practical approach to deep network optimization, contributing valuable insights into the efficient deployment of neural networks in both research and applied settings.

Authors (4)
  1. Hengyuan Hu
  2. Rui Peng
  3. Yu-Wing Tai
  4. Chi-Keung Tang
Citations (859)