
Channel-wise pruning of neural networks with tapering resource constraint (1812.07060v1)

Published 4 Dec 2018 in cs.CV, cs.LG, and stat.ML

Abstract: Neural network pruning is an important step in the design process of efficient neural networks for edge devices with limited computational power. Pruning is a form of knowledge transfer from the weights of the original network to a smaller target subnetwork. We propose a new method for compute-constrained structured channel-wise pruning of convolutional neural networks. The method iteratively fine-tunes the network while gradually tapering the computational resources available to the pruned network via a holonomic constraint within the framework of the method of Lagrange multipliers. Explicit and adaptive automatic control over the rate of tapering is provided. The trainable parameters of our pruning method are separate from the weights of the neural network, which allows us to avoid interference with the neural network solver (e.g. to avoid a direct dependence of pruning speed on neural network learning rates). Our method takes a "rigoristic" approach, applying constrained optimization directly, and avoids the pitfalls of ADMM-based methods, such as the need to define the target amount of resources for each pruning run and the direct dependence of pruning speed and pruning priority on the relative scale of weights between layers. For VGG-16 @ ILSVRC-2012, we achieve a reduction of 15.47 -> 3.87 GMAC with only a 1% reduction in top-1 accuracy (68.4% -> 67.4%). For AlexNet @ ILSVRC-2012, we achieve 0.724 -> 0.411 GMAC with a 1% reduction in top-1 accuracy (56.8% -> 55.8%).
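
To make the compute constraint concrete, the following is a minimal sketch, not the paper's implementation. It counts multiply-accumulate operations (MACs) for a dense convolution, shows how channel-wise pruning changes that count, and evaluates the residual of an equality constraint of the form "current MACs minus budget at this step". The function names, the layer shape, and the linear taper are assumptions made for illustration; the abstract describes an adaptive tapering rate and does not specify a linear schedule.

```python
def conv_macs(out_h, out_w, in_ch, out_ch, kernel):
    """MAC count of one dense 2D convolution with a square kernel (bias ignored)."""
    return out_h * out_w * in_ch * out_ch * kernel * kernel


def tapered_budget(step, total_steps, start_macs, final_macs):
    """Hypothetical linear taper of the compute budget (an assumption, not the paper's schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start_macs + frac * (final_macs - start_macs)


def constraint_residual(current_macs, step, total_steps, start_macs, final_macs):
    """Residual of the equality constraint macs - budget(step) = 0.

    In a Lagrange-multiplier formulation this residual is the term that a
    multiplier would scale and add to the training loss.
    """
    return current_macs - tapered_budget(step, total_steps, start_macs, final_macs)


# Illustrative layer: 3x3 convolution, 64 -> 64 channels, 224x224 output map.
full = conv_macs(224, 224, 64, 64, 3)
# Removing half of the input channels and half of the output channels.
pruned = conv_macs(224, 224, 32, 32, 3)
layer_ratio = full / pruned

# Whole-network reduction factors implied by the figures quoted in the abstract.
vgg_ratio = 15.47 / 3.87
alexnet_ratio = 0.724 / 0.411

# Constraint residual for an unpruned 15.47 GMAC network halfway through the taper.
mid_residual = constraint_residual(15.47, 50, 100, 15.47, 3.87)
```

Because a layer's MAC count is proportional to the product of its input and output channel counts, halving both cuts its cost by a factor of four. The figures in the abstract correspond to roughly a 4x reduction for VGG-16 and roughly a 1.76x reduction for AlexNet.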

Authors (1)
  1. Alexey Kruglov
Citations (1)