Magnificent Minified Models (2306.10177v1)
Abstract: This paper addresses the task of taking a large trained neural network and 'compressing' it by deleting parameters or entire neurons, with minimal loss of model accuracy. We compare several parameter- and neuron-selection methods: dropout-based neuron damage estimation, neuron merging, absolute-value based selection, random selection, and OBD (Optimal Brain Damage). We also evaluate a variation on the classic OBD method, which we call OBD-SD, that slightly outperformed all other parameter- and neuron-selection methods in our tests under substantial pruning. We compare these methods against quantization of parameters. Finally, we compare all of these techniques (applied to a trained network) with neural networks trained from scratch, with randomly initialized weights, on the corresponding pruned architectures. Our results are only barely consistent with the Lottery Ticket Hypothesis: fine-tuning a parameter-pruned model does slightly better than retraining a similarly pruned model from scratch with randomly initialized weights, whereas for neuron-level pruning, retraining from scratch did much better in our experiments.
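To make the simplest selection criterion mentioned in the abstract concrete, the sketch below illustrates absolute-value (magnitude) based parameter pruning in NumPy. This is not the authors' code: the function name `magnitude_prune`, the global (rather than per-layer) threshold, and the masking details are our own assumptions about one reasonable formulation. By contrast, OBD ranks each weight by the diagonal-Hessian saliency ½·H_ii·w_i² instead of |w_i|.

```python
# Illustrative sketch (not the paper's implementation): absolute-value
# ("magnitude") parameter pruning. Weights with the smallest |w| are
# assumed to matter least and are set to zero.
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a copy of `weights` with the smallest-|w| fraction zeroed.

    `sparsity` is the fraction of parameters to delete (0.0 keeps all,
    0.9 deletes the 90% of weights with the smallest absolute value).
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # Threshold at the k-th smallest absolute value; ties may prune slightly more.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Example: prune 90% of a random 256x128 dense layer's weights.
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 128))
W_pruned = magnitude_prune(W, sparsity=0.9)
print("nonzero fraction:", (W_pruned != 0).mean())
```

In practice such a mask is applied to each layer of a trained network, which is then fine-tuned; this "prune, then fine-tune" regime is what the abstract compares against retraining the pruned architecture from scratch.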
- Xin Dong, Shangyu Chen and Sinno Jialin Pan “Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon” In CoRR abs/1705.07565, 2017 arXiv: http://arxiv.org/abs/1705.07565
- Jonathan Frankle and Michael Carbin “The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks”, 2019 arXiv:1803.03635 [cs.LG]
- Song Han, Jeff Pool, John Tran and William J. Dally “Learning both Weights and Connections for Efficient Neural Networks” In CoRR abs/1506.02626, 2015 arXiv: http://arxiv.org/abs/1506.02626
- Babak Hassibi, David G. Stork and Gregory Wolff “Optimal Brain Surgeon: Extensions and performance comparisons” In Advances in Neural Information Processing Systems 6 Morgan-Kaufmann, 1994, pp. 263–270 URL: http://papers.nips.cc/paper/749-optimal-brain-surgeon-extensions-and-performance-comparisons.pdf
- Yann LeCun, John Denker and Sara Solla “Optimal Brain Damage” In Advances in Neural Information Processing Systems 2 Morgan-Kaufmann, 1990, pp. 598–605 URL: https://proceedings.neurips.cc/paper/1989/file/6c9882bbac1c7093bd25041881277658-Paper.pdf
- Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang and Trevor Darrell “Rethinking the Value of Network Pruning”, 2019 arXiv:1810.05270 [cs.LG]
- Ethan M. Rudd, Felipe N. Ducau, Cody Wild, Konstantin Berlin and Richard Harang “ALOHA: Auxiliary Loss Optimization for Hypothesis Augmentation” In CoRR abs/1903.05700, 2019 arXiv: http://arxiv.org/abs/1903.05700
- Hojjat Salehinejad and Shahrokh Valaee “EDropout: Energy-Based Dropout and Pruning of Deep Neural Networks”, 2020 arXiv:2006.04270 [cs.LG]
- Volker Tresp, Ralph Neuneier and Hans-Georg Zimmermann “Early Brain Damage” In Advances in Neural Information Processing Systems 9 MIT Press, 1997, pp. 669–675 URL: https://proceedings.neurips.cc/paper/1996/file/2ac2406e835bd49c70469acae337d292-Paper.pdf
- “Merging Similar Neurons for Deep Networks Compression” In Cognitive Computation 12, 2020 DOI: 10.1007/s12559-019-09703-6