Progressive Skeletonization: Trimming more fat from a network at initialization (2006.09081v5)

Published 16 Jun 2020 in cs.CV and cs.LG

Abstract: Recent studies have shown that skeletonization (pruning parameters) of networks at initialization provides all the practical benefits of sparsity both at inference and training time, while only marginally degrading their performance. However, we observe that beyond a certain level of sparsity (approx 95%), these approaches fail to preserve the network performance, and to our surprise, in many cases perform even worse than trivial random pruning. To this end, we propose an objective to find a skeletonized network with maximum foresight connection sensitivity (FORCE), whereby the trainability, in terms of connection sensitivity, of a pruned network is taken into consideration. We then propose two approximate procedures to maximize our objective: (1) Iterative SNIP, which allows parameters that were unimportant at earlier stages of skeletonization to become important at later stages; and (2) FORCE, an iterative process that allows exploration by letting already pruned parameters resurrect at later stages of skeletonization. Empirical analyses on a large suite of experiments show that our approach, while providing at least as good a performance as other recent approaches at moderate pruning levels, provides remarkably improved performance at higher pruning levels (up to 99.5% of parameters can be removed while keeping the networks trainable). Code can be found at https://github.com/naver/force.

Review of Progressive Skeletonization: Trimming More Fat from a Network at Initialization

The paper "Progressive Skeletonization: Trimming More Fat from a Network at Initialization" presents novel methodologies to enhance the pruning process of neural networks by focusing on initialization, specifically targeting high sparsity levels beyond 95%. The primary objective is to identify trainable sub-networks at the time of initialization, thus maintaining efficient training dynamics without the necessity of a fully dense model.

The paper situates itself in the context of neural network pruning, a well-explored area that aims to reduce the computational load of neural networks by removing unimportant parameters. Despite this progress, conventional methods rely predominantly on expensive train-prune cycles. More recent approaches prune at initialization to overcome this limitation, with the seminal SNIP (Single-shot Network Pruning) algorithm leading the way by estimating connection sensitivity to identify crucial weights.
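
As a concrete illustration, a SNIP-style connection sensitivity can be computed from a single mini-batch at initialization by scoring each weight with the magnitude of the weight times its gradient, then keeping the top-scoring fraction globally. The sketch below assumes a PyTorch model; the function names, the single-batch scoring, and the restriction to weight tensors are illustrative choices, not the authors' released implementation.

```python
import torch

def connection_sensitivity(model, loss_fn, inputs, targets):
    """SNIP-style saliency: |w * dL/dw| computed from one mini-batch at initialization."""
    model.zero_grad()
    loss_fn(model(inputs), targets).backward()
    scores = {}
    for name, p in model.named_parameters():
        if p.grad is not None and p.dim() > 1:  # score weight tensors, skip biases
            scores[name] = (p.detach() * p.grad.detach()).abs()
    return scores

def keep_top_k(scores, sparsity):
    """Binary masks keeping the (1 - sparsity) fraction of highest-scoring weights globally."""
    flat = torch.cat([s.flatten() for s in scores.values()])
    k = max(int((1.0 - sparsity) * flat.numel()), 1)
    threshold = torch.topk(flat, k, largest=True).values.min()
    return {name: (s >= threshold).float() for name, s in scores.items()}
```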

However, the authors observe that methods like SNIP and GRASP, although effective at moderate pruning levels, underperform at higher sparsity, often doing worse than random pruning once sparsity exceeds 95%. To remedy this, the paper introduces foresight connection sensitivity (FORCE): a saliency measure evaluated on the already-pruned network, which the authors hypothesize is a better predictor of the pruned network's trainability.
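
Paraphrasing the paper's formulation, with initial weights $\theta \in \mathbb{R}^m$, a binary mask $\mathbf{c} \in \{0,1\}^m$ with at most $k$ ones, and training loss $L$, the skeletonization objective is to choose the mask that maximizes the connection sensitivity measured on the network it produces:

$$
\max_{\mathbf{c} \in \{0,1\}^m,\ \|\mathbf{c}\|_0 \le k} \ S(\theta, \mathbf{c})
\quad \text{where} \quad
S(\theta, \mathbf{c}) \;=\; \sum_{i \,:\, c_i = 1} \left| \, \theta_i \, \frac{\partial L(\theta \odot \mathbf{c})}{\partial \theta_i} \right| .
$$

The key difference from the original SNIP criterion is that the gradients are taken on the pruned network $\theta \odot \mathbf{c}$ rather than on the dense network, so the score reflects how sensitive the loss of the resulting sparse network is to its retained connections.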

The paper develops two iterative procedures, Iterative SNIP and FORCE, to progressively optimize this objective. Iterative SNIP reassesses weight importance over several pruning rounds, so parameters that seemed unimportant at earlier rounds can become important later, although weights that have been pruned stay pruned. FORCE additionally allows already-pruned weights to be recovered in later rounds, promoting exploration of sparse configurations and helping avoid the sharp performance degradation seen in single-shot methods; a sketch of both procedures follows below.
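
The sketch below, building on the helpers above, shows how such a progressive schedule might look. The exponential sparsity schedule, the single-batch scoring, and the allow_resurrection flag are illustrative simplifications of the two procedures, not the authors' released implementation.

```python
def foresight_scores(model, loss_fn, inputs, targets, init_weights, masks=None):
    """Score |theta_i * dL/dtheta_i| with the loss evaluated on the pruned network
    (theta * mask), while theta_i is taken from the dense initialization so that
    pruned weights can still receive a nonzero score."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            w = init_weights[name]
            p.copy_(w * masks[name] if masks is not None and name in masks else w)
    model.zero_grad()
    loss_fn(model(inputs), targets).backward()
    return {name: (init_weights[name] * p.grad.detach()).abs()
            for name, p in model.named_parameters()
            if p.grad is not None and p.dim() > 1}

def progressive_skeletonize(model, loss_fn, data_iter, final_sparsity,
                            num_steps=20, allow_resurrection=True):
    """Prune to `final_sparsity` over `num_steps` rounds.

    allow_resurrection=True  -> FORCE-like: all weights are rescored each round,
                                so previously pruned weights may come back.
    allow_resurrection=False -> Iterative-SNIP-like: pruned weights stay pruned.
    """
    init_weights = {n: p.detach().clone() for n, p in model.named_parameters()}
    masks = None
    for t in range(1, num_steps + 1):
        # exponential schedule: the kept fraction shrinks geometrically towards the target
        sparsity_t = 1.0 - (1.0 - final_sparsity) ** (t / num_steps)
        inputs, targets = next(data_iter)
        scores = foresight_scores(model, loss_fn, inputs, targets, init_weights, masks)
        if masks is not None and not allow_resurrection:
            scores = {n: s * masks[n] for n, s in scores.items()}
        masks = keep_top_k(scores, sparsity_t)  # global top-k, as in the sketch above
    return masks
```

After the final round, the resulting masks would typically be applied to the initial weights and the sparse network trained as usual.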

Empirical results from extensive experiments on standard datasets (CIFAR-10, CIFAR-100, Tiny ImageNet, and ImageNet) with architectures such as ResNet50 and VGG19 show that Iterative SNIP and FORCE deliver significant accuracy improvements at extreme sparsity. For instance, the pruned networks remain trainable with up to 99.5% of parameters removed, a regime in which SNIP and GRASP exhibit stark drops in performance.

The findings have important theoretical and practical implications. Theoretically, this work contributes to understanding the dynamics of pruning at initialization, reinforcing the idea that the structure of the sub-network is more critical than the particular weight values at initialization. Practically, by keeping models sparse from initialization, the approaches allow for resource-efficient training, addressing computational sustainability in machine learning, an area of growing importance given the environmental impact of extensive model training cycles.

Looking forward, the paper opens promising avenues for research, particularly in refining saliency criteria, designing more efficient progressive schemes, and probing the relationship between network topology and weight distribution. The interactions between sparsity level, network architecture, and initialization merit further investigation, which could drive advances in neural network compression and potentially encourage architecture designs with sparsity built in from the start.

In conclusion, this paper enhances the pruning landscape by devising algorithms that adaptively and progressively prune at initialization, presenting viable solutions to the challenges encountered at extreme sparsity, and overcoming limitations of seminal prior work in the domain.

Authors (6)
  1. Pau de Jorge (8 papers)
  2. Amartya Sanyal (35 papers)
  3. Harkirat S. Behl (1 paper)
  4. Philip H. S. Torr (219 papers)
  5. Gregory Rogez (36 papers)
  6. Puneet K. Dokania (44 papers)
Citations (89)