Magnificent Minified Models (2306.10177v1)

Published 16 Jun 2023 in cs.LG

Abstract: This paper concerns the task of taking a large trained neural network and 'compressing' it, by deleting parameters or entire neurons, with minimal loss in model accuracy. We compare several methods of parameter and neuron selection: dropout-based neuron damage estimation, neuron merging, absolute-value-based selection, random selection, and OBD (Optimal Brain Damage). We also evaluate a variation on the classic OBD method, which we call OBD-SD, that slightly outperformed all other parameter and neuron selection methods in our tests with substantial pruning. We compare these methods against quantization of parameters. We also compare these techniques (all applied to a trained neural network) with neural networks trained from scratch (random weight initialization) on various pruned architectures. Our results are only barely consistent with the Lottery Ticket Hypothesis, in that fine-tuning a parameter-pruned model performed slightly better than retraining a similarly pruned model from scratch with randomly initialized weights. For neuron-level pruning, retraining from scratch performed much better in our experiments.
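
To make two of the selection criteria named in the abstract concrete, the sketch below illustrates absolute-value (magnitude) selection over a single weight matrix, and the classic OBD saliency s_i = 0.5 * h_ii * w_i^2 from LeCun et al. (1990), where h_ii is the diagonal of the Hessian of the loss with respect to weight w_i. The `h_diag` argument is a hypothetical placeholder for such a second-order estimate; this is an illustrative sketch, not the authors' implementation, and it does not cover the OBD-SD variant or neuron-level pruning.

```python
# Illustrative sketch only: magnitude-based pruning of one weight matrix,
# plus the OBD saliency score used to rank parameters for deletion.
import numpy as np


def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the fraction `sparsity` of weights with the smallest |w|."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return weights * (np.abs(weights) > threshold)


def obd_saliency(weights: np.ndarray, h_diag: np.ndarray) -> np.ndarray:
    """OBD saliency s_i = 0.5 * h_ii * w_i^2; low-saliency weights are pruned first.

    `h_diag` is assumed to be a diagonal-Hessian estimate with the same shape
    as `weights` (how it is obtained is outside the scope of this sketch).
    """
    return 0.5 * h_diag * weights ** 2


# Example: prune 90% of a random 256x128 layer by magnitude.
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 128))
W_pruned = magnitude_prune(W, sparsity=0.9)
print(f"nonzero weights remaining: {np.count_nonzero(W_pruned) / W.size:.1%}")
```

In the parameter-pruning comparisons described above, a mask like the one produced here would typically be held fixed while the surviving weights are fine-tuned, or, for the from-scratch baseline, reinitialized randomly and retrained on the same pruned architecture.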

References (10)
  1. Xin Dong, Shangyu Chen, and Sinno Jialin Pan. "Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon." arXiv:1705.07565, 2017. http://arxiv.org/abs/1705.07565
  2. "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks." arXiv:1803.03635 [cs.LG], 2019.
  3. "Learning both Weights and Connections for Efficient Neural Networks." arXiv:1506.02626, 2015. http://arxiv.org/abs/1506.02626
  4. Babak Hassibi, David G. Stork, and Gregory Wolff. "Optimal Brain Surgeon: Extensions and performance comparisons." In Advances in Neural Information Processing Systems 6, Morgan-Kaufmann, 1994, pp. 263–270. http://papers.nips.cc/paper/749-optimal-brain-surgeon-extensions-and-performance-comparisons.pdf
  5. Yann LeCun, John Denker, and Sara Solla. "Optimal Brain Damage." In Advances in Neural Information Processing Systems 2, Morgan-Kaufmann, 1990, pp. 598–605. https://proceedings.neurips.cc/paper/1989/file/6c9882bbac1c7093bd25041881277658-Paper.pdf
  6. "Rethinking the Value of Network Pruning." arXiv:1810.05270 [cs.LG], 2019.
  7. "ALOHA: Auxiliary Loss Optimization for Hypothesis Augmentation." arXiv:1903.05700, 2019. http://arxiv.org/abs/1903.05700
  8. "EDropout: Energy-Based Dropout and Pruning of Deep Neural Networks." arXiv:2006.04270 [cs.LG], 2020.
  9. Volker Tresp, Ralph Neuneier, and Hans-Georg Zimmermann. "Early Brain Damage." In Advances in Neural Information Processing Systems 9, MIT Press, 1997, pp. 669–675. https://proceedings.neurips.cc/paper/1996/file/2ac2406e835bd49c70469acae337d292-Paper.pdf
  10. "Merging Similar Neurons for Deep Networks Compression." Cognitive Computation 12, 2020. DOI: 10.1007/s12559-019-09703-6