Pruning Neural Networks Without Data by Iteratively Conserving Synaptic Flow
The paper "Pruning Neural Networks Without Any Data by Iteratively Conserving Synaptic Flow" (Tanaka et al., NeurIPS 2020) tackles the challenge of pruning neural networks to reduce their computational footprint. Traditional pruning methods operate after training and rely on data-dependent, computationally expensive prune-retrain cycles. This paper proposes an alternative: pruning at initialization, without any data, through a theory-driven algorithm design.
Key Contributions
- Layer-Collapse Issue: The authors identify "layer-collapse," a failure mode in which a pruning algorithm prematurely prunes an entire layer, rendering the network untrainable. Avoiding this phenomenon is essential for any pruning method that aims to preserve trainability.
- Conservation Laws: The paper proves conservation laws for synaptic saliency, a class of gradient-based scoring metrics. Because each layer's total score is conserved, parameters in smaller layers receive higher average scores, so global single-shot masking disproportionately prunes the largest layers and drives layer-collapse.
- Iterative Synaptic Flow Pruning (SynFlow): This algorithm iteratively prunes networks while provably avoiding layer-collapse up to the maximal critical compression, all without data. By conserving synaptic flow across layers, SynFlow ensures every layer retains enough parameters to keep the network trainable (its objective and score are stated after this list).
- Empirical Validation: Across models (VGG, ResNet) and datasets (CIFAR-10/100, Tiny ImageNet), SynFlow performs on par with or better than state-of-the-art pruning-at-initialization methods without using any training data, with the gap widening in high-compression regimes.
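Concretely, for a feed-forward ReLU network with weight matrices $W^{[l]}$ (notation lightly adapted from the paper), SynFlow evaluates the absolute-valued network on an all-ones input and scores each parameter by its contribution to the resulting scalar:

```latex
% SynFlow's data-free objective: the all-ones vector pushed through the
% element-wise absolute-valued network, summed to a scalar, and the
% per-parameter synaptic saliency score derived from it
\mathcal{R}_{\mathrm{SF}}
  \;=\; \mathbf{1}^{\top}\left(\prod_{l=1}^{L}\left|W^{[l]}\right|\right)\mathbf{1},
\qquad
\mathcal{S}_{\mathrm{SF}}(\theta)
  \;=\; \frac{\partial \mathcal{R}_{\mathrm{SF}}}{\partial \theta}\odot\theta
```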
Methodological Insights
The SynFlow algorithm is notable for its data-agnostic approach, which scores each weight by the flow of synaptic strength passing through it rather than by its magnitude in isolation. Conserving this flow keeps scores balanced across layers and avoids the catastrophic pruning patterns that arise when inter-layer relationships are ignored.
- Pruning Strategy: SynFlow uses iterative scoring with scores that are positive and layer-wise conserved, two properties the authors prove are sufficient for Maximal Critical Compression: no layer is ever fully pruned while an alternative pruning of the same size would keep the network trainable. A minimal sketch of the loop follows this list.
- Algorithm Efficiency: The pruning process requires repeated forward and backward passes (100 iterations in the paper), but each pass runs on a single all-ones input, so the procedure remains cheap and never iterates over a training dataset.
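The loop below is a minimal PyTorch sketch of this procedure, not the authors' reference implementation: the helper names (`linearize`, `synflow_scores`, `synflow_prune`) and the toy ReLU MLP are illustrative assumptions, and stateful layers such as batch norm would need extra handling.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def linearize(model):
    """Replace every parameter with its absolute value so the all-ones
    forward pass stays positive; return the signs for later restoration."""
    signs = {}
    for name, param in model.state_dict().items():
        signs[name] = torch.sign(param)
        param.abs_()
    return signs

@torch.no_grad()
def restore(model, signs):
    """Undo linearize by multiplying the stored signs back in."""
    for name, param in model.state_dict().items():
        param.mul_(signs[name])

def synflow_scores(model, input_shape):
    """Data-free synaptic saliency: R = sum of the |theta|-network's outputs
    on an all-ones input; score = |dR/dtheta * theta| per parameter."""
    signs = linearize(model)
    ones = torch.ones(1, *input_shape)   # the only "input" ever used: no data
    R = model(ones).sum()
    R.backward()
    scores = {n: (p.grad * p).detach().abs() for n, p in model.named_parameters()}
    model.zero_grad()
    restore(model, signs)
    return scores

def synflow_prune(model, input_shape, rho, n_iters=100):
    """Iteratively prune to compression ratio rho (keep 1/rho of the weights),
    following the exponential schedule rho**(k / n_iters) from the paper.
    For simplicity this sketch masks every parameter tensor, biases included."""
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}
    for k in range(1, n_iters + 1):
        scores = synflow_scores(model, input_shape)
        flat = torch.cat([s.flatten() for s in scores.values()])
        num_keep = max(1, int(flat.numel() * rho ** (-k / n_iters)))
        threshold = torch.topk(flat, num_keep).values.min()
        with torch.no_grad():
            for n, p in model.named_parameters():
                masks[n] = masks[n] * (scores[n] >= threshold).float()
                p.mul_(masks[n])   # pruned weights stay exactly zero
    return masks

# Usage: prune a small ReLU MLP 100x at initialization, with no data at all.
model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
masks = synflow_prune(model, input_shape=(784,), rho=100.0)
```

The exponential schedule is what makes the iteration effective: early rounds prune conservatively and scores are re-evaluated after each round, which is precisely what prevents the single-shot layer-collapse described above.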
Theoretical Contributions
- Conservation Laws: By establishing neuron-wise and network-wise conservation laws of synaptic saliency, the paper lays a theoretical foundation for understanding pruning dynamics.
- Layer-Size and Score Relationship: Because each layer's total saliency is conserved, the mean score per parameter is inversely proportional to the layer's width, a non-obvious balancing mechanism across the architecture and the root cause of layer-collapse under single-shot global masking (stated formally below).
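In the paper's notation, where $\theta_{\mathrm{in}}$ and $\theta_{\mathrm{out}}$ are the parameters entering and leaving any hidden neuron with a homogeneous activation ($\phi(x) = \phi'(x)\,x$, e.g. ReLU), the neuron-wise law reads:

```latex
% Neuron-wise conservation: the synaptic saliency flowing into a hidden
% neuron equals the saliency flowing out of it
\left\langle \frac{\partial \mathcal{R}}{\partial \theta_{\mathrm{in}}},\; \theta_{\mathrm{in}} \right\rangle
  \;=\;
\left\langle \frac{\partial \mathcal{R}}{\partial \theta_{\mathrm{out}}},\; \theta_{\mathrm{out}} \right\rangle
```

Summing this identity over all neurons in a layer yields layer-wise conservation; dividing a layer's fixed total by its parameter count gives the inverse layer-size relationship above.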
Implications and Future Directions
- Practical Deployment: Because SynFlow prunes efficiently and independently of data, it could speed up deployment on edge devices and in other resource-constrained environments.
- Algorithm Enhancement: The analysis of conservation and iteration suggests refinements to other heuristic pruning algorithms that could make global pruning strategies more robust to layer-collapse.
- Broader Applications: While this research focuses on vision models, the principles of SynFlow could extend to broader AI domains, particularly those requiring adaptive architectures.
- Theoretical Exploration: Further investigation could explore how conservation principles interact with alternative activation functions or network settings, potentially refining neural network initialization or architecture design strategies.
In conclusion, the paper makes substantial contributions to understanding and implementing effective, data-agnostic neural network pruning. The SynFlow algorithm aligns theoretical insight with practical efficiency and sets a strong baseline for data-free neural network compression.