- The paper introduces DS-Net, which adaptively adjusts network width at inference via a double-headed dynamic gate to improve hardware efficiency.
- It employs a two-stage training regime: a one-shot NAS-style supernet stage stabilized by In-place Ensemble Bootstrapping, followed by gate training with Sandwich Gate Sparsification.
- Experimental results on ImageNet show up to 5.9% accuracy gains and 2–4× computation reduction, emphasizing its potential for mobile and resource-constrained applications.
A Comprehensive Analysis of the Dynamic Slimmable Network
The paper introduces a novel approach termed the Dynamic Slimmable Network (DS-Net), which improves the hardware efficiency of neural networks by adaptively adjusting the number of active filters for each input at inference time. Unlike previous dynamic networks and dynamic pruning methods, DS-Net avoids the typical burdens of indexing, weight copying, and zero masking, thereby achieving actual acceleration on real-world hardware.
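To make this distinction concrete, the minimal sketch below (hypothetical PyTorch code, not the authors' implementation) contrasts a zero-masking convolution, which still computes every filter and only discards the results, with a sliced convolution whose cost genuinely shrinks because the retained filters are contiguous in memory.

```python
import torch
import torch.nn.functional as F

def masked_conv(x, weight, k):
    # Dynamic-pruning style: compute all filters, then zero out the inactive
    # ones. No FLOPs are actually saved on dense hardware.
    out = F.conv2d(x, weight, padding=1)
    mask = torch.zeros(weight.shape[0], device=x.device)
    mask[:k] = 1.0
    return out * mask.view(1, -1, 1, 1)

def sliced_conv(x, weight, k):
    # Dynamic-slicing style: filters are stored contiguously, so taking the
    # first k of them is a cheap view and the convolution itself gets smaller.
    return F.conv2d(x, weight[:k], padding=1)

x = torch.randn(1, 16, 32, 32)
w = torch.randn(64, 16, 3, 3)
print(masked_conv(x, w, 32).shape)  # torch.Size([1, 64, 32, 32]), half zeros
print(sliced_conv(x, w, 32).shape)  # torch.Size([1, 32, 32, 32])
```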
One of the paper's significant contributions is the introduction of a dynamic slicing technique combined with a double-headed dynamic gate. Filters are stored statically and contiguously in memory, avoiding the inefficiencies associated with dynamic sparse patterns. The gate, composed of an attention head and a slimming head, adjusts the network width per input at negligible computational cost. This architectural choice is pivotal: it reconciles the theoretical promise of dynamic pruning with practical hardware acceleration.
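A rough sketch of such a gate follows; the module name, hidden size, and candidate width ratios are illustrative assumptions rather than the paper's exact design, but the structure (shared pooled features feeding an SE-style attention head and a slimming head that selects a width) mirrors the description above.

```python
import torch
import torch.nn as nn

class DoubleHeadedGate(nn.Module):
    """Illustrative double-headed gate: a slimming head picks one of the
    candidate width ratios, an attention head reweights channels."""

    def __init__(self, in_channels, ratios=(0.25, 0.5, 0.75, 1.0), hidden=16):
        super().__init__()
        self.ratios = ratios
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.trunk = nn.Sequential(nn.Linear(in_channels, hidden), nn.ReLU(inplace=True))
        self.slim_head = nn.Linear(hidden, len(ratios))          # width choice
        self.attn_head = nn.Sequential(nn.Linear(hidden, in_channels), nn.Sigmoid())

    def forward(self, x):
        feat = self.trunk(self.pool(x).flatten(1))
        # Slimming head: hard width decision (argmax at inference). Per-sample
        # routing over a batch needs extra bookkeeping and is omitted here.
        ratio = self.ratios[self.slim_head(feat).argmax(dim=1)[0].item()]
        # Attention head: SE-style channel reweighting at negligible cost.
        attn = self.attn_head(feat).view(x.size(0), -1, 1, 1)
        return x * attn, ratio

gate = DoubleHeadedGate(in_channels=64)
y, ratio = gate(torch.randn(1, 64, 16, 16))
print(y.shape, ratio)  # torch.Size([1, 64, 16, 16]) and the chosen width ratio
```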
The training regime for DS-Net is split into two distinct stages. The first stage, inspired by one-shot Neural Architecture Search (NAS), trains the slimmable supernet with a novel method called In-place Ensemble Bootstrapping (IEB), which stabilizes training and improves performance by supervising sub-networks with ensembled targets from an exponential-moving-average model, addressing the convergence problems of traditional in-place distillation. The second stage trains the dynamic gate with Sandwich Gate Sparsification (SGS), which supervises the slimming head to route easy samples to slim sub-networks and hard samples to wide ones, improving both efficiency and accuracy.
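The sketch below illustrates the SGS idea under simplifying assumptions (the function names and the binary easy/hard rule are ours, reduced from the paper's description): a sample that the slimmest sub-network already classifies correctly is treated as easy and the gate is pushed toward the slimmest width, while every other sample is pushed toward the widest one.

```python
import torch
import torch.nn.functional as F

def sgs_targets(slim_logits, labels, num_choices):
    """Generate per-sample gate labels: easy samples (already correct at the
    minimum width) -> slimmest choice 0; hard samples -> widest choice."""
    easy = slim_logits.argmax(dim=1).eq(labels)
    targets = torch.full_like(labels, num_choices - 1)
    targets[easy] = 0
    return targets

def sgs_loss(gate_logits, slim_logits, labels):
    # Cross-entropy between the slimming head's output and the derived labels.
    targets = sgs_targets(slim_logits, labels, gate_logits.size(1))
    return F.cross_entropy(gate_logits, targets)

# Toy usage: 4 samples, 10 classes, 4 candidate widths.
slim_logits = torch.randn(4, 10)
gate_logits = torch.randn(4, 4)
labels = torch.randint(0, 10, (4,))
print(sgs_loss(gate_logits, slim_logits, labels))
```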
The experimental results presented in the paper underscore the effectiveness of DS-Net. Extensive evaluations on ImageNet show significant improvements over static and existing dynamic models, with accuracy gains of up to 5.9% and 2-4× computational reductions alongside measurable real-world speedups. These results are particularly significant for mobile and resource-constrained environments, where computational efficiency is paramount.
From a theoretical perspective, DS-Net contributes to the ongoing discourse on optimizing neural networks for efficiency without sacrificing performance. The disentangled two-stage training and the explicit supervision of architecture routing point to promising directions for future research on the adaptability and efficiency of deep learning models.
Looking forward, the implications of this research extend to more sophisticated dynamic inference techniques, potentially incorporating multi-dimensional adaptability and applications beyond standard image classification. The promising object detection results reported for DS-Net already point to this broader applicability and to further optimization across diverse machine learning tasks.
In conclusion, DS-Net represents a significant advance in the field of dynamic networks, offering both theoretical and practical benefits. Its strategy for balancing performance and efficiency can inspire further innovation in the design of flexible, adaptable neural networks suited to dynamic real-world environments. This work not only advances the state of the art in dynamic pruning but also sets a new benchmark for hardware-efficient deep learning models.