OTOv2: Automatic, Generic, User-Friendly (2303.06862v2)
Abstract: Existing model compression methods based on structured pruning typically require complicated multi-stage procedures. Each stage demands substantial engineering effort and domain knowledge from end users, which prevents these methods from being applied to broader scenarios. We propose the second generation of Only-Train-Once (OTOv2), the first framework that automatically trains and compresses a general DNN only once from scratch to produce a more compact model with competitive performance, without fine-tuning. OTOv2 is automatic, pluggable into various deep learning applications, and requires minimal engineering effort from users. Methodologically, OTOv2 introduces two major improvements: (i) Autonomy: it automatically exploits the dependencies of general DNNs, partitions the trainable variables into Zero-Invariant Groups (ZIGs), and constructs the compressed model; and (ii) Dual Half-Space Projected Gradient (DHSPG): a novel optimizer that solves structured-sparsity problems more reliably. Numerically, we demonstrate the generality and autonomy of OTOv2 on a variety of model architectures such as VGG, ResNet, CARN, ConvNeXt, DenseNet and StackedUnets, most of which cannot be handled by other methods without extensive handcrafting. Together with benchmark datasets including CIFAR10/100, DIV2K, Fashion-MNIST, SVHN and ImageNet, its effectiveness is validated by performing competitively with, or even better than, the state of the art. The source code is available at https://github.com/tianyic/only_train_once.
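The Zero-Invariant Group mechanism behind point (i) can be made concrete with a small, self-contained PyTorch sketch. This is an illustration only, not the OTOv2 library API: a toy Conv-BN-ReLU-Conv block, the channel index `k`, and the hand-built `keep` list are all hypothetical. One output channel of the first convolution is grouped with its BatchNorm parameters and the matching input slice of the second convolution; once that group is entirely zero (the state DHSPG is designed to drive redundant groups toward during training), removing it yields a slimmer network with identical outputs.

```python
import torch
import torch.nn as nn

# Toy block: conv1 -> bn -> relu -> conv2 (hypothetical shapes for illustration).
torch.manual_seed(0)
conv1 = nn.Conv2d(3, 8, 3, padding=1)
bn    = nn.BatchNorm2d(8)
conv2 = nn.Conv2d(8, 4, 3, padding=1)
block = nn.Sequential(conv1, bn, nn.ReLU(), conv2).eval()

# One Zero-Invariant Group: output channel k of conv1, the matching BN
# parameters, and the incoming slice k of conv2. Set the whole group to zero,
# as a structured-sparsity optimizer would for a redundant group.
k = 5
with torch.no_grad():
    conv1.weight[k].zero_(); conv1.bias[k].zero_()
    bn.weight[k].zero_();    bn.bias[k].zero_()
    conv2.weight[:, k].zero_()

x = torch.randn(1, 3, 16, 16)
y_zeroed = block(x)

# Construct the compressed block by slicing the zeroed channel away.
keep = [i for i in range(8) if i != k]
s_conv1 = nn.Conv2d(3, 7, 3, padding=1)
s_bn    = nn.BatchNorm2d(7)
s_conv2 = nn.Conv2d(7, 4, 3, padding=1)
with torch.no_grad():
    s_conv1.weight.copy_(conv1.weight[keep]); s_conv1.bias.copy_(conv1.bias[keep])
    s_bn.weight.copy_(bn.weight[keep]);       s_bn.bias.copy_(bn.bias[keep])
    s_bn.running_mean.copy_(bn.running_mean[keep])
    s_bn.running_var.copy_(bn.running_var[keep])
    s_conv2.weight.copy_(conv2.weight[:, keep]); s_conv2.bias.copy_(conv2.bias)
small = nn.Sequential(s_conv1, s_bn, nn.ReLU(), s_conv2).eval()

# Because the group is zero-invariant, pruning it is lossless: prints True.
print(torch.allclose(small(x), y_zeroed, atol=1e-5))
```

OTOv2's Autonomy step automates exactly this dependency analysis and group construction for general architectures; the handcrafted `keep` list above stands in for what the framework derives automatically.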
- Ntire 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017.
- Fast, accurate, and lightweight super-resolution with cascading residual network. In Proceedings of the European conference on computer vision (ECCV), pp. 252–268, 2018.
- Arseny. Onnx2torch: an onnx to pytorch converter. https://github.com/ENOT-AutoDL/onnx2torch, 2022.
- Onnx: Open neural network exchange. https://github.com/onnx/onnx, 2019.
- Storage efficient and dynamic flexible runtime channel pruning via deep reinforcement learning. 2019.
- A reduced-space algorithm for minimizing ℓ1-regularized convex functions. SIAM Journal on Optimization, 27(3):1583–1610, 2017.
- Farsa for ℓ1-regularized convex optimization: local convergence and numerical experience. Optimization Methods and Software, 2018.
- Neural network compression via sparse optimization. arXiv preprint arXiv:2011.04868, 2020a.
- A half-space stochastic projected gradient method for group-sparsity regularization. arXiv preprint arXiv:2009.12078, 2020b.
- Orthant based proximal stochastic gradient method for ℓ1-regularized optimization. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2020, Ghent, Belgium, September 14–18, 2020, Proceedings, Part III, pp. 57–73. Springer, 2021a.
- Only train once: A one-shot neural network training and pruning framework. In Advances in Neural Information Processing Systems, 2021b.
- Towards automatic neural architecture search within general super-networks. arXiv preprint arXiv:2305.18030, 2023.
- An adaptive half-space projection method for stochastic optimization problems with group sparse regularization. Transactions on Machine Learning Research, 2023.
- Structured sparsity inducing adaptive optimizers for deep learning. arXiv preprint arXiv:2102.03869, 2021.
- Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255. IEEE, 2009.
- On the channel pruning using graph convolution network for convolutional neural network acceleration. 2022.
- Lossless cnn channel pruning via decoupling remembering and forgetting. In Proceedings of the IEEE International Conference on Computer Vision, 2021.
- Efficient multi-objective neural architecture search via lamarckian evolution. arXiv preprint arXiv:1804.09081, 2018.
- The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635, 2018.
- Stabilizing the lottery ticket hypothesis. arXiv preprint arXiv:1903.01611, 2019.
- The state of sparsity in deep neural networks. arXiv preprint arXiv:1902.09574, 2019.
- Highly efficient salient object detection with 100k parameters. In European Conference on Computer Vision, pp. 702–721. Springer, 2020.
- Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149, 2015.
- Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2016.
- Soft filter pruning for accelerating deep convolutional neural networks. arXiv preprint arXiv:1808.06866, 2018a.
- Channel pruning for accelerating very deep neural networks. In The IEEE International Conference on Computer Vision (ICCV), Oct 2017.
- Amc: Automl for model compression and acceleration on mobile devices. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 784–800, 2018b.
- Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. arXiv preprint arXiv:1607.03250, 2016.
- Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4700–4708, 2017.
- Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2015.
- Data-driven sparse structure selection for deep neural networks. In Proceedings of the European conference on computer vision (ECCV), pp. 304–320, 2018.
- Operation-aware soft channel pruning using differentiable masks. In International Conference on Machine Learning, pp. 5122–5131. PMLR, 2020.
- Deep feature synthesis: Towards automating data science endeavors. In 2015 IEEE international conference on data science and advanced analytics (DSAA), pp. 1–10. IEEE, 2015.
- Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Fast bayesian optimization of machine learning hyperparameters on large datasets. In Artificial intelligence and statistics, pp. 528–536. PMLR, 2017.
- A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto, 2009.
- Eagleeye: Fast sub-net evaluation for efficient neural network pruning. In European Conference on Computer Vision, pp. 639–654. Springer, 2020a.
- Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710, 2016.
- Group sparsity: The hinge between filter pruning and decomposition for network compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8018–8027, 2020b.
- Revisiting random channel pruning for neural network compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 191–201, 2022.
- Exploiting kernel sparsity and entropy for interpretable cnn compression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2800–2809, 2019.
- Toward compact convnets via structure-sparsity regularized filter pruning. IEEE transactions on neural networks and learning systems, 31(2):574–588, 2019.
- A convnet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11976–11986, 2022.
- Bayesian compression for deep learning. In Advances in neural information processing systems, pp. 3288–3298, 2017.
- Thinet: A filter level pruning method for deep neural network compression. In Proceedings of the IEEE international conference on computer vision, pp. 5058–5066, 2017.
- Prunetrain: fast neural network training by dynamic sparse model reconfiguration. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–13, 2019.
- A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, volume 2, pp. 416–423. IEEE, 2001.
- Pruning filter in filter. arXiv preprint arXiv:2009.14410, 2020.
- Learning pruning-friendly networks via frank-wolfe: One-shot, any-sparsity, and no retraining. In International Conference on Learning Representations, 2021.
- Structured bayesian pruning via log-normal multiplicative noise. In Advances in Neural Information Processing Systems, pp. 6775–6784, 2017.
- Reading digits in natural images with unsupervised feature learning. 2011.
- Attentive fine-grained structured sparsity for image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 17673–17682, 2022.
- Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32. 2019.
- Squad: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250, 2016.
- Comparing rewinding and fine-tuning in neural network pruning. arXiv preprint arXiv:2003.02389, 2020.
- U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pp. 234–241. Springer, 2015.
- N2nskip: Learning highly sparse networks using neuron-to-neuron skip connections. In BMVC, 2020.
- Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Bayesian bits: Unifying quantization and pruning. arXiv preprint arXiv:2005.07093, 2020.
- Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
- Learning structured sparsity in deep neural networks. arXiv preprint arXiv:1608.03665, 2016.
- Ross Wightman. Pytorch image models. https://github.com/rwightman/pytorch-image-models, 2019.
- Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms, 2017.
- A proximal stochastic gradient method with progressive variance reduction. SIAM Journal on Optimization, 24(4):2057–2075, 2014.
- Automatic neural network compression by sparsity-quantization joint learning: A constrained optimization-based approach. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2178–2188, 2020.
- Deephoyer: Learning sparser neural network with differentiable scale-invariant sparsity measures. arXiv preprint arXiv:1908.09979, 2019.
- Gate decorator: Global filter pruning method for accelerating deep convolutional neural networks. arXiv preprint arXiv:1909.08174, 2019.
- On single image scale-up using sparse-representations. In International conference on curves and surfaces. Springer, 2010.
- A systematic dnn weight pruning framework using alternating direction method of multipliers. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 184–199, 2018.
- Accelerate cnn via recursive bayesian pruning. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3306–3315, 2019.
- Neuron-level structured pruning using polarization regularizer. Advances in Neural Information Processing Systems, 33, 2020.