OTOv2: Automatic, Generic, User-Friendly (2303.06862v2)

Published 13 Mar 2023 in cs.CV and cs.AI

Abstract: The existing model compression methods via structured pruning typically require complicated multi-stage procedures. Each individual stage necessitates numerous engineering efforts and domain-knowledge from the end-users which prevent their wider applications onto broader scenarios. We propose the second generation of Only-Train-Once (OTOv2), which first automatically trains and compresses a general DNN only once from scratch to produce a more compact model with competitive performance without fine-tuning. OTOv2 is automatic and pluggable into various deep learning applications, and requires almost minimal engineering efforts from the users. Methodologically, OTOv2 proposes two major improvements: (i) Autonomy: automatically exploits the dependency of general DNNs, partitions the trainable variables into Zero-Invariant Groups (ZIGs), and constructs the compressed model; and (ii) Dual Half-Space Projected Gradient (DHSPG): a novel optimizer to more reliably solve structured-sparsity problems. Numerically, we demonstrate the generality and autonomy of OTOv2 on a variety of model architectures such as VGG, ResNet, CARN, ConvNeXt, DenseNet and StackedUnets, the majority of which cannot be handled by other methods without extensive handcrafting efforts. Together with benchmark datasets including CIFAR10/100, DIV2K, Fashion-MNIST, SVNH and ImageNet, its effectiveness is validated by performing competitively or even better than the state-of-the-arts. The source code is available at https://github.com/tianyic/only_train_once.

An Expert Overview of "OTOv2: Automatic, Generic, User-Friendly"

The paper introduces OTOv2, the second iteration of the Only-Train-Once framework, designed to automatically train and compress deep neural networks (DNNs) efficiently. The framework aims to construct more compact models with high performance without requiring pre-training or fine-tuning, which are common in typical structured pruning methods. OTOv2 presents two significant innovations: automatic construction of compressed models and a novel optimization method tailored for structured sparsity.

Methodological Innovations

  1. Automated Model Compression: OTOv2 automatically analyzes the dependencies among the components of a general DNN and partitions its trainable variables into Zero-Invariant Groups (ZIGs). A ZIG is a set of variables that can all be set to zero, and then removed, without changing the network's output, so the groups driven to zero directly determine how the slimmer model is constructed, with no manual intervention. This autonomy substantially reduces the engineering burden on end-users and extends model compression to a much wider range of users and scenarios. A minimal sketch of such a grouping for a Conv-BN block follows this list.
  2. Dual Half-Space Projected Gradient (DHSPG): The paper introduces DHSPG, a novel optimizer designed to solve structured-sparsity problems more reliably. Unlike prior group-sparsity optimizers, DHSPG adjusts regularization coefficients automatically and partitions the search space, so the desired group sparsity is reached without extensive hyperparameter tuning. It exploits a pair of half-space projections to drive redundant groups exactly to zero, yielding faster convergence and more reliable control over the target sparsity level; a toy illustration of the projection step appears after the ZIG sketch below.
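
To make the ZIG construction concrete, the following is a minimal PyTorch sketch, assuming a plain Conv-BatchNorm block; it is illustrative only and is not the OTOv2 library API (the helper conv_bn_zigs is hypothetical). Each group collects one output channel's convolution filter, its bias, and the matching BatchNorm scale and shift, so zeroing the entire group silences that channel and the channel can be removed without changing the network's output.

```python
# Illustrative ZIG grouping for a Conv-BN block (hypothetical helper, not the
# authors' implementation): one zero-invariant group per output channel.
import torch
import torch.nn as nn

def conv_bn_zigs(conv: nn.Conv2d, bn: nn.BatchNorm2d):
    """Return one parameter group (ZIG) per output channel of a Conv-BN pair."""
    groups = []
    for c in range(conv.out_channels):
        group = [conv.weight[c]]                 # filter producing channel c
        if conv.bias is not None:
            group.append(conv.bias[c])
        group += [bn.weight[c], bn.bias[c]]      # BN scale (gamma) and shift (beta)
        groups.append(group)
    return groups

conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)
bn = nn.BatchNorm2d(32)
print(len(conv_bn_zigs(conv, bn)))               # 32 groups, one per prunable channel

# Zero-invariance check: zeroing group c leaves that channel's output at zero,
# so the corresponding filter and BN entries can be dropped from the model.
c = 5
with torch.no_grad():
    conv.weight[c].zero_()
    conv.bias[c].zero_()
    bn.weight[c].zero_()
    bn.bias[c].zero_()
bn.eval()
y = bn(conv(torch.randn(1, 16, 8, 8)))
print(y[:, c].abs().max())                       # ~0: channel c is removable
```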

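The half-space projection at the core of DHSPG can be illustrated on individual groups. The toy sketch below is a simplified reading of the half-space idea rather than the actual DHSPG optimizer: it omits the automatic partitioning of groups and the adaptive regularization, and the threshold epsilon is a hypothetical choice. After an ordinary gradient step, any group whose trial point leaves the half-space defined by its current iterate is projected exactly to zero, which is what makes the resulting sparsity structured and directly prunable.

```python
# Toy half-space projection step on grouped variables (simplified sketch of the
# idea behind DHSPG, not the OTOv2 optimizer).
import torch

def half_space_step(groups, grads, lr=0.1, epsilon=0.0):
    """One gradient step per group, then a group-wise half-space projection."""
    new_groups = []
    for x, g in zip(groups, grads):
        trial = x - lr * g                        # plain (stochastic) gradient step
        # If the trial point falls outside the half-space
        # {y : <y, x> >= epsilon * ||x||^2}, send the whole group to zero.
        if torch.dot(trial, x) < epsilon * torch.dot(x, x):
            trial = torch.zeros_like(x)           # exactly zero -> structurally prunable
        new_groups.append(trial)
    return new_groups

# Two groups: the first stays active, the second collapses to zero.
groups = [torch.tensor([0.80, -0.50]), torch.tensor([0.001, -0.002])]
grads  = [torch.tensor([0.10,  0.10]), torch.tensor([0.050, -0.080])]
print(half_space_step(groups, grads))
# -> [tensor([ 0.7900, -0.5100]), tensor([0., 0.])]
```
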
Numerical Results and Claims

The paper substantiates its claims through experiments across a range of architectures, including VGG, ResNet, DenseNet, CARN, and more recent designs such as ConvNeXt and StackedUnets. Benchmark datasets including CIFAR10/100, DIV2K, Fashion-MNIST, SVHN, and ImageNet are used to validate the efficacy of OTOv2. The results show that OTOv2 consistently performs competitively with, or better than, existing state-of-the-art methods, delivering substantial reductions in FLOPs and parameter counts while maintaining accuracy across architectures and datasets.

Implications and Future Directions

The introduction of OTOv2 marks a significant step toward democratizing model compression. By reducing the dependency on user expertise and intricate engineering, it facilitates the deployment of high-performing, resource-efficient models, especially in constrained environments. The implications of this advancement are broad, impacting both practical application and theoretical research in deep learning model optimization.

Practically, OTOv2 aligns well with contemporary needs for deploying DNNs on resource-limited devices, making it particularly useful in mobile and edge computing scenarios. Theoretically, it underscores the viability of one-shot training pipelines that do not rely on iterative fine-tuning, challenging existing paradigms in model compression.

Looking forward, future research may focus on further enhancing the generality and applicability of such autonomous frameworks so they cover an even broader range of DNN architectures, potentially including emerging designs such as Transformers. Additionally, hybrid methods that combine ideas from OTOv2 with other paradigms such as neural architecture search could yield new techniques for discovering highly efficient neural networks without manual input.

In summary, OTOv2 delivers a substantive leap in simplifying DNN compression through innovative automated processes and an improved optimization algorithm, ultimately pushing the boundaries of what is achievable in one-shot neural network training.

Authors (5)
  1. Tianyi Chen
  2. Luming Liang
  3. Tianyu Ding
  4. Zhihui Zhu
  5. Ilya Zharkov