OTOv2: Automatic, Generic, User-Friendly (2303.06862v2)
Abstract: Existing model compression methods based on structured pruning typically require complicated multi-stage procedures. Each stage demands substantial engineering effort and domain knowledge from end users, which prevents these methods from being applied to broader scenarios. We propose the second generation of Only-Train-Once (OTOv2), the first framework that automatically trains and compresses a general DNN only once from scratch to produce a more compact model with competitive performance, without fine-tuning. OTOv2 is automatic, pluggable into various deep learning applications, and requires minimal engineering effort from users. Methodologically, OTOv2 introduces two major improvements: (i) Autonomy: it automatically exploits the dependencies of general DNNs, partitions the trainable variables into Zero-Invariant Groups (ZIGs), and constructs the compressed model; and (ii) Dual Half-Space Projected Gradient (DHSPG): a novel optimizer that solves structured-sparsity problems more reliably. Numerically, we demonstrate the generality and autonomy of OTOv2 on a variety of model architectures such as VGG, ResNet, CARN, ConvNeXt, DenseNet and StackedUnets, most of which cannot be handled by other methods without extensive handcrafting. Together with benchmark datasets including CIFAR10/100, DIV2K, Fashion-MNIST, SVHN and ImageNet, its effectiveness is validated by performing competitively with, or even better than, the state of the art. The source code is available at https://github.com/tianyic/only_train_once.
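The Zero-Invariant Group mechanism behind point (i) can be made concrete with a small, self-contained PyTorch sketch. This is an illustration only, not the OTOv2 library API: a toy Conv-BN-ReLU-Conv block, the channel index `k`, and the hand-built `keep` list are all hypothetical. One output channel of the first convolution is grouped with its BatchNorm parameters and the matching input slice of the second convolution; once that group is entirely zero (the state DHSPG is designed to drive redundant groups toward during training), removing it yields a slimmer network with identical outputs.

```python
import torch
import torch.nn as nn

# Toy block: conv1 -> bn -> relu -> conv2 (hypothetical shapes for illustration).
torch.manual_seed(0)
conv1 = nn.Conv2d(3, 8, 3, padding=1)
bn    = nn.BatchNorm2d(8)
conv2 = nn.Conv2d(8, 4, 3, padding=1)
block = nn.Sequential(conv1, bn, nn.ReLU(), conv2).eval()

# One Zero-Invariant Group: output channel k of conv1, the matching BN
# parameters, and the incoming slice k of conv2. Set the whole group to zero,
# as a structured-sparsity optimizer would for a redundant group.
k = 5
with torch.no_grad():
    conv1.weight[k].zero_(); conv1.bias[k].zero_()
    bn.weight[k].zero_();    bn.bias[k].zero_()
    conv2.weight[:, k].zero_()

x = torch.randn(1, 3, 16, 16)
y_zeroed = block(x)

# Construct the compressed block by slicing the zeroed channel away.
keep = [i for i in range(8) if i != k]
s_conv1 = nn.Conv2d(3, 7, 3, padding=1)
s_bn    = nn.BatchNorm2d(7)
s_conv2 = nn.Conv2d(7, 4, 3, padding=1)
with torch.no_grad():
    s_conv1.weight.copy_(conv1.weight[keep]); s_conv1.bias.copy_(conv1.bias[keep])
    s_bn.weight.copy_(bn.weight[keep]);       s_bn.bias.copy_(bn.bias[keep])
    s_bn.running_mean.copy_(bn.running_mean[keep])
    s_bn.running_var.copy_(bn.running_var[keep])
    s_conv2.weight.copy_(conv2.weight[:, keep]); s_conv2.bias.copy_(conv2.bias)
small = nn.Sequential(s_conv1, s_bn, nn.ReLU(), s_conv2).eval()

# Because the group is zero-invariant, pruning it is lossless: prints True.
print(torch.allclose(small(x), y_zeroed, atol=1e-5))
```

OTOv2's Autonomy step automates exactly this dependency analysis and group construction for general architectures; the handcrafted `keep` list above stands in for what the framework derives automatically.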
- Ntire 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017.
- Fast, accurate, and lightweight super-resolution with cascading residual network. In Proceedings of the European conference on computer vision (ECCV), pp. 252–268, 2018.
- Arseny. Onnx2torch: an onnx to pytorch converter. https://github.com/ENOT-AutoDL/onnx2torch, 2022.
- Onnx: Open neural network exchange. https://github.com/onnx/onnx, 2019.
- Storage efficient and dynamic flexible runtime channel pruning via deep reinforcement learning. 2019.
- A reduced-space algorithm for minimizing ℓ1-regularized convex functions. SIAM Journal on Optimization, 27(3):1583–1610, 2017.
- Farsa for ℓ1-regularized convex optimization: local convergence and numerical experience. Optimization Methods and Software, 2018.
- Neural network compression via sparse optimization. arXiv preprint arXiv:2011.04868, 2020a.
- A half-space stochastic projected gradient method for group-sparsity regularization. arXiv preprint arXiv:2009.12078, 2020b.
- Orthant based proximal stochastic gradient method for ℓ1-regularized optimization. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2020, Ghent, Belgium, September 14–18, 2020, Proceedings, Part III, pp. 57–73. Springer, 2021a.
- Only train once: A one-shot neural network training and pruning framework. In Advances in Neural Information Processing Systems, 2021b.
- Towards automatic neural architecture search within general super-networks. arXiv preprint arXiv:2305.18030, 2023.
- An adaptive half-space projection method for stochastic optimization problems with group sparse regularization. Transactions on Machine Learning Research, 2023.
- Structured sparsity inducing adaptive optimizers for deep learning. arXiv preprint arXiv:2102.03869, 2021.
- Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255. IEEE, 2009.
- On the channel pruning using graph convolution network for convolutional neural network acceleration. 2022.
- Lossless cnn channel pruning via decoupling remembering and forgetting. In Proceedings of the IEEE International Conference on Computer Vision, 2021.
- Efficient multi-objective neural architecture search via lamarckian evolution. arXiv preprint arXiv:1804.09081, 2018.
- The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635, 2018.
- Stabilizing the lottery ticket hypothesis. arXiv preprint arXiv:1903.01611, 2019.
- The state of sparsity in deep neural networks. arXiv preprint arXiv:1902.09574, 2019.
- Highly efficient salient object detection with 100k parameters. In European Conference on Computer Vision, pp. 702–721. Springer, 2020.
- Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149, 2015.
- Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2016.
- Soft filter pruning for accelerating deep convolutional neural networks. arXiv preprint arXiv:1808.06866, 2018a.
- Channel pruning for accelerating very deep neural networks. In The IEEE International Conference on Computer Vision (ICCV), Oct 2017.
- Amc: Automl for model compression and acceleration on mobile devices. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 784–800, 2018b.
- Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. arXiv preprint arXiv:1607.03250, 2016.
- Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4700–4708, 2017.
- Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2015.
- Data-driven sparse structure selection for deep neural networks. In Proceedings of the European conference on computer vision (ECCV), pp. 304–320, 2018.
- Operation-aware soft channel pruning using differentiable masks. In International Conference on Machine Learning, pp. 5122–5131. PMLR, 2020.
- Deep feature synthesis: Towards automating data science endeavors. In 2015 IEEE international conference on data science and advanced analytics (DSAA), pp. 1–10. IEEE, 2015.
- Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Fast bayesian optimization of machine learning hyperparameters on large datasets. In Artificial intelligence and statistics, pp. 528–536. PMLR, 2017.
- A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto, 2009.
- Eagleeye: Fast sub-net evaluation for efficient neural network pruning. In European Conference on Computer Vision, pp. 639–654. Springer, 2020a.
- Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710, 2016.
- Group sparsity: The hinge between filter pruning and decomposition for network compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8018–8027, 2020b.
- Revisiting random channel pruning for neural network compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 191–201, 2022.
- Exploiting kernel sparsity and entropy for interpretable cnn compression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2800–2809, 2019.
- Toward compact convnets via structure-sparsity regularized filter pruning. IEEE transactions on neural networks and learning systems, 31(2):574–588, 2019.
- A convnet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11976–11986, 2022.
- Bayesian compression for deep learning. In Advances in neural information processing systems, pp. 3288–3298, 2017.
- Thinet: A filter level pruning method for deep neural network compression. In Proceedings of the IEEE international conference on computer vision, pp. 5058–5066, 2017.
- Prunetrain: fast neural network training by dynamic sparse model reconfiguration. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–13, 2019.
- A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, volume 2, pp. 416–423. IEEE, 2001.
- Pruning filter in filter. arXiv preprint arXiv:2009.14410, 2020.
- Learning pruning-friendly networks via frank-wolfe: One-shot, any-sparsity, and no retraining. In International Conference on Learning Representations, 2021.
- Structured bayesian pruning via log-normal multiplicative noise. In Advances in Neural Information Processing Systems, pp. 6775–6784, 2017.
- Reading digits in natural images with unsupervised feature learning. 2011.
- Attentive fine-grained structured sparsity for image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 17673–17682, 2022.
- Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32. 2019.
- Squad: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250, 2016.
- Comparing rewinding and fine-tuning in neural network pruning. arXiv preprint arXiv:2003.02389, 2020.
- U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pp. 234–241. Springer, 2015.
- N2nskip: Learning highly sparse networks using neuron-to-neuron skip connections. In BMVC, 2020.
- Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Bayesian bits: Unifying quantization and pruning. arXiv preprint arXiv:2005.07093, 2020.
- Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
- Learning structured sparsity in deep neural networks. arXiv preprint arXiv:1608.03665, 2016.
- Ross Wightman. Pytorch image models. https://github.com/rwightman/pytorch-image-models, 2019.
- Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms, 2017.
- A proximal stochastic gradient method with progressive variance reduction. SIAM Journal on Optimization, 24(4):2057–2075, 2014.
- Automatic neural network compression by sparsity-quantization joint learning: A constrained optimization-based approach. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2178–2188, 2020.
- Deephoyer: Learning sparser neural network with differentiable scale-invariant sparsity measures. arXiv preprint arXiv:1908.09979, 2019.
- Gate decorator: Global filter pruning method for accelerating deep convolutional neural networks. arXiv preprint arXiv:1909.08174, 2019.
- On single image scale-up using sparse-representations. In International conference on curves and surfaces. Springer, 2010.
- A systematic dnn weight pruning framework using alternating direction method of multipliers. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 184–199, 2018.
- Accelerate cnn via recursive bayesian pruning. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3306–3315, 2019.
- Neuron-level structured pruning using polarization regularizer. Advances in Neural Information Processing Systems, 33, 2020.