Universally Slimmable Networks and Improved Training Techniques (1903.05134v2)

Published 12 Mar 2019 in cs.CV and cs.AI

Abstract: Slimmable networks are a family of neural networks that can instantly adjust the runtime width. The width can be chosen from a predefined widths set to adaptively optimize accuracy-efficiency trade-offs at runtime. In this work, we propose a systematic approach to train universally slimmable networks (US-Nets), extending slimmable networks to execute at arbitrary width, and generalizing to networks both with and without batch normalization layers. We further propose two improved training techniques for US-Nets, named the sandwich rule and inplace distillation, to enhance training process and boost testing accuracy. We show improved performance of universally slimmable MobileNet v1 and MobileNet v2 on ImageNet classification task, compared with individually trained ones and 4-switch slimmable network baselines. We also evaluate the proposed US-Nets and improved training techniques on tasks of image super-resolution and deep reinforcement learning. Extensive ablation experiments on these representative tasks demonstrate the effectiveness of our proposed methods. Our discovery opens up the possibility to directly evaluate FLOPs-Accuracy spectrum of network architectures. Code and models are available at: https://github.com/JiahuiYu/slimmable_networks

Citations (366)

Summary

  • The paper introduces universally slimmable networks that flexibly adjust model width to balance accuracy and efficiency.
  • The training method leverages the sandwich rule and inplace distillation to optimize performance across a continuum of network configurations.
  • Experimental results show enhanced accuracy on ImageNet and robust performance in image super-resolution and deep reinforcement learning tasks.

Overview of "Universally Slimmable Networks and Improved Training Techniques"

The paper "Universally Slimmable Networks and Improved Training Techniques" by Jiahui Yu and Thomas Huang introduces a new paradigm for neural network architectures that can operate efficiently at various computational demands, termed "universally slimmable networks" (US-Nets). This extension of slimmable networks presents a significant advancement by enabling neural networks to dynamically adjust to any desired width without the necessity of predefined configurations like batch normalization layers. The paper further introduces improved training methodologies, namely the sandwich rule and inplace distillation, to bolster the performance and accuracy of these universally adaptable networks.

Key Technical Innovations

  1. Universally Slimmable Networks (US-Nets): Expanding on slimmable networks, which support a fixed set of predefined widths, US-Nets can adjust to any width within a specified range. This allows a single model to manage the accuracy-efficiency trade-off dynamically based on operational constraints such as latency and available computational resources.
  2. Training Enhancements:
    • The Sandwich Rule: At every training iteration, the model is optimized at the smallest width, the largest width, and a few randomly sampled intermediate widths. The intuition is that performance at any width is bounded by that of the smallest and largest networks, so explicitly training both extremes improves convergence and overall accuracy (see the training-step sketch after this list).
    • Inplace Distillation: Inspired by knowledge distillation, this method transfers knowledge within the network from the full-width model to the sub-networks during training, using the predictions of the full-width model as soft labels for the sub-networks, with gradients stopped through the soft labels.
  3. Post-Training Batch Normalization Recalibration: A major obstacle for arbitrary-width execution is that batch normalization statistics are inconsistent across widths and cannot all be precomputed during training. The authors address this by recalibrating BN statistics after training, showing that a small random subset of training samples suffices for accurate calibration (see the recalibration sketch below).
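
The interplay of the sandwich rule and inplace distillation can be sketched as a single training step in PyTorch. This is a hedged sketch, not the authors' exact code: set_width_mult is a hypothetical helper that sets the active width on every slimmable layer, and the choice of two random intermediate widths follows the paper's default but is otherwise illustrative.

```python
import random
import torch
import torch.nn.functional as F

def set_width_mult(model, width_mult):
    # Hypothetical helper: propagate the active width to every slimmable layer.
    for m in model.modules():
        if hasattr(m, "width_mult"):
            m.width_mult = width_mult

def train_step(model, images, labels, optimizer,
               min_width=0.25, max_width=1.0, n_random=2):
    optimizer.zero_grad()

    # Sandwich rule, part 1: always train the largest width with ground truth.
    set_width_mult(model, max_width)
    full_logits = model(images)
    F.cross_entropy(full_logits, labels).backward()

    # Inplace distillation: the full network's predictions become soft labels
    # for the sub-networks; gradients are stopped through the teacher.
    soft_labels = full_logits.detach().softmax(dim=1)

    # Sandwich rule, part 2: the smallest width plus randomly sampled widths.
    widths = [min_width] + [random.uniform(min_width, max_width)
                            for _ in range(n_random)]
    for w in widths:
        set_width_mult(model, w)
        logits = model(images)
        # Cross-entropy against the soft labels of the full-width model.
        loss = (-soft_labels * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
        loss.backward()

    # Gradients from all sampled widths are accumulated before one update.
    optimizer.step()
```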
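
The BN recalibration in item 3 admits a similarly brief sketch: after training, running statistics are reset and re-accumulated for a chosen width with forward passes over a small calibration set. Function and loader names are illustrative; momentum=None is standard PyTorch behavior that switches BN to a cumulative moving average.

```python
import torch

@torch.no_grad()
def recalibrate_bn(model, calibration_loader, width_mult, device="cpu"):
    # Execute the network at the width whose statistics we want to rebuild
    # (set_width_mult is the hypothetical helper from the previous sketch).
    set_width_mult(model, width_mult)
    for m in model.modules():
        if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
            m.reset_running_stats()
            m.momentum = None  # cumulative moving average over all batches
    model.train()  # BN updates running stats only in training mode
    for images, _ in calibration_loader:
        model(images.to(device))
    model.eval()
```

In the paper's experiments a small random subset of training images is enough for accurate post-training statistics, so this recalibration adds negligible cost per deployed width.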

Experimental Evaluation

US-Nets were experimentally validated on tasks across various domains, including ImageNet classification, image super-resolution, and deep reinforcement learning, showing comparable or superior performance to individually trained and traditional slimmable models. Detailed results highlight:

  • ImageNet Classification: Improvements in accuracy over baselines, specifically with MobileNet v1 and v2 architectures, demonstrated the effectiveness of US-Nets in maintaining high performance over a continuum of computational budgets.
  • Image Super-Resolution: US-Nets achieved similar or slightly lower PSNR values compared to individual networks, confirming their applicability beyond classification tasks.
  • Deep Reinforcement Learning: US-Nets used in training AI agents in Atari games exhibited superior mean episode rewards relative to non-slimmable networks, illustrating robustness across different learning paradigms.

Implications and Future Directions

The implications of universally slimmable architectures are significant, with applications in scenarios where computational resources fluctuate, such as mobile and embedded devices. The ability to evaluate a single model across the entire FLOPs-Accuracy spectrum enables direct comparison of architectures along their full trade-off curves rather than at isolated operating points, paving the way for network optimization techniques such as one-shot architecture search.

Looking ahead, theoretical analysis of how nonlinear activations interact with the channel-aggregation view of network width could further ground the bounded-performance argument behind the sandwich rule. Additionally, exploring nonuniformly slimmable networks, in which different layers slim at different rates, could unlock architectures that optimize channel allocation across layers, a promising avenue for network slimming.

Overall, this work stands as a significant contribution toward flexible and efficient neural network deployment, signaling potential evolutions both in architecture design frameworks and in training methodology within neural network research.