Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization (1902.05967v3)

Published 15 Feb 2019 in cs.LG and stat.ML

Abstract: Modern deep neural networks are typically highly overparameterized. Pruning techniques are able to remove a significant fraction of network parameters with little loss in accuracy. Recently, techniques based on dynamic reallocation of non-zero parameters have emerged, allowing direct training of sparse networks without having to pre-train a large dense model. Here we present a novel dynamic sparse reparameterization method that addresses the limitations of previous techniques such as high computational cost and the need for manual configuration of the number of free parameters allocated to each layer. We evaluate the performance of dynamic reallocation methods in training deep convolutional networks and show that our method outperforms previous static and dynamic reparameterization methods, yielding the best accuracy for a fixed parameter budget, on par with accuracies obtained by iteratively pruning a pre-trained dense model. We further investigated the mechanisms underlying the superior generalization performance of the resultant sparse networks. We found that neither the structure, nor the initialization of the non-zero parameters were sufficient to explain the superior performance. Rather, effective learning crucially depended on the continuous exploration of the sparse network structure space during training. Our work suggests that exploring structural degrees of freedom during training is more effective than adding extra parameters to the network.

Review of Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization

The paper "Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization" introduces a novel approach to training deep neural networks with a fixed parameter budget, which dynamically reallocates parameters across the network during the training process. This technique presents a significant advancement from traditional sparse reparameterization methods by effectively optimizing both the network structure and the parameter values, achieving performance on par with models that are iteratively pruned post-training.

Core Contributions

The authors propose a dynamic sparse reparameterization method that addresses key limitations of previous techniques, namely high computational cost and the need to manually configure per-layer sparsity. The central idea is to dynamically allocate and reallocate non-zero parameters during training, so that sparse networks can be trained directly without starting from a dense, overparameterized model.

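To make the fixed-parameter-budget setup concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' implementation): each layer carries a binary mask chosen at initialization, so the network is sparse from the first training step and the total number of non-zero weights never changes, while a reallocation hook can periodically move that budget to more useful positions. Names such as `MaskedLinear` and `train_sparse` are our own.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Linear layer kept sparse from the start: a fixed-size binary mask
    determines which weights are non-zero, so the parameter budget is set
    at initialization and never grows."""
    def __init__(self, in_features, out_features, density=0.1):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.register_buffer("mask", (torch.rand_like(self.weight) < density).float())
        with torch.no_grad():
            self.weight.mul_(self.mask)      # masked entries start (and stay) at zero

    def forward(self, x):
        return x @ (self.weight * self.mask).t() + self.bias


def train_sparse(model, steps=1000, reallocate_every=100, reallocate_fn=None):
    """Train under a fixed non-zero budget, periodically moving that budget around."""
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for step in range(1, steps + 1):
        x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))  # stand-in for real data
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
        if reallocate_fn is not None and step % reallocate_every == 0:
            reallocate_fn(model)  # prune small weights, regrow the freed budget elsewhere


model = nn.Sequential(MaskedLinear(784, 300), nn.ReLU(), MaskedLinear(300, 10))
budget = int(sum(m.mask.sum() for m in model if isinstance(m, MaskedLinear)))
# `reallocate_fn` can wrap the prune-and-regrow routine sketched after the
# Key Features list below; without it, the sparse structure stays static.
train_sparse(model, steps=200)
```
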
Key Features:

  • Dynamic Reallocation: Non-zero parameters are periodically pruned and regrown during training, enabling continual exploration of the sparse network's structure.
  • Adaptive Thresholding: A single, adaptively adjusted global magnitude threshold decides which weights to prune, avoiding per-layer threshold tuning and keeping the cost of each reallocation step low.
  • Weight Redistribution: The freed parameters are reallocated across layers by a simple heuristic, so per-layer sparsity levels are discovered automatically rather than set by hand (a sketch of one such prune-and-regrow step follows this list).

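The adaptive threshold and the redistribution heuristic can be sketched in a few lines. The routine below is an illustration in the spirit of the method, not the authors' exact algorithm: it prunes every active weight whose magnitude falls below a single global threshold, nudges that threshold up or down so that roughly a target number of weights is pruned per step, and then regrows the freed budget across layers in proportion to each layer's surviving non-zero count (one plausible heuristic), with new weights initialized to zero. The function name, the adjustment rule, and the growth heuristic are our own assumptions, and 2-D weight matrices are assumed for simplicity; adapted to pull weights and masks out of a model, it could serve as the `reallocate_fn` in the earlier sketch.

```python
import torch

def reallocate(weights, masks, threshold, target_prune, tol=0.1):
    """One prune-and-regrow step over per-layer weight tensors and binary masks.
    Returns the adjusted global threshold for the next step."""
    # 1. Prune: deactivate every non-zero weight whose magnitude falls below
    #    the single global threshold.
    pruned = 0
    for w, m in zip(weights, masks):
        drop = (w.abs() < threshold) & m.bool()
        m[drop] = 0.0
        w.data[drop] = 0.0
        pruned += int(drop.sum())

    # 2. Adapt: multiplicatively adjust the threshold so that roughly
    #    `target_prune` weights get pruned each time.
    if pruned < (1 - tol) * target_prune:
        threshold *= 2.0
    elif pruned > (1 + tol) * target_prune:
        threshold /= 2.0

    # 3. Regrow: hand the freed budget back to the layers in proportion to each
    #    layer's surviving non-zero count; new weights start at zero so the
    #    total number of non-zero parameters stays (approximately) fixed.
    survivors = torch.tensor([float(m.sum()) for m in masks])
    shares = (survivors / survivors.sum() * pruned).long()
    for w, m, grow in zip(weights, masks, shares.tolist()):
        zero_idx = (m == 0).nonzero(as_tuple=False)
        grow = min(grow, len(zero_idx))
        if grow == 0:
            continue
        pick = zero_idx[torch.randperm(len(zero_idx))[:grow]]
        m[pick[:, 0], pick[:, 1]] = 1.0       # activate new positions
        w.data[pick[:, 0], pick[:, 1]] = 0.0  # zero-initialize the new weights
    return threshold


# Toy usage on two random "layers" held at roughly 10% density.
weights = [0.01 * torch.randn(300, 784), 0.01 * torch.randn(10, 300)]
masks = [(torch.rand_like(w) < 0.1).float() for w in weights]
for w, m in zip(weights, masks):
    w *= m                                    # respect the budget from the start
threshold = 1e-3
target = int(0.01 * sum(float(m.sum()) for m in masks))
for _ in range(5):
    threshold = reallocate(weights, masks, threshold, target_prune=target)
```
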
Empirical Evaluation

The authors conduct comprehensive experiments across models and datasets, including WRN-28-2 on CIFAR-10 and ResNet-50 on ImageNet, comparing their method against a suite of static and dynamic reparameterization baselines. The proposed method consistently yields superior results at a given parameter budget, matching or exceeding the accuracy achieved by iteratively pruning a pre-trained dense network, the standard benchmark for sparse models.

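Comparisons at a "fixed parameter budget" are only meaningful if every method is held to the same number of non-zero weights. A small, hypothetical utility along the following lines (building on the `MaskedLinear` sketch above) makes both the global budget and the per-layer sparsity pattern explicit:

```python
def sparsity_report(model):
    """Print per-layer and global sparsity for modules carrying a `mask` buffer."""
    total_nnz, total_dense = 0, 0
    for name, module in model.named_modules():
        mask = getattr(module, "mask", None)
        if mask is None:
            continue
        nnz, dense = int(mask.sum()), mask.numel()
        total_nnz += nnz
        total_dense += dense
        print(f"{name}: {nnz}/{dense} non-zero ({1 - nnz / dense:.1%} sparse)")
    print(f"global: {total_nnz}/{total_dense} non-zero "
          f"({1 - total_nnz / total_dense:.1%} sparse)")
```

In the paper's experiments, the per-layer pattern that such a report would show is discovered automatically by the reallocation heuristic rather than specified up front.
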
Results Overview:

  • Performance Comparison: Outperforms static sparse and thin dense baselines trained under the same parameter budget.
  • Computational Efficiency: Incurs substantially lower computational overhead than other dynamic methods such as Deep Rewiring (DeepR).
  • Sparsity Patterns: Automatically discovers effective per-layer sparsity patterns, removing the need to hand-tune the allocation of parameters across layers.

Theoretical and Practical Implications

The findings in this paper suggest that dynamically exploring structural degrees of freedom during training is a viable, and often superior, alternative to simply adding parameters through overparameterization as a route to good generalization. This has implications both for the theoretical understanding of training dynamics in deep networks and for the practical deployment of neural architectures on resource-constrained devices.

Future Prospects

This research opens potential pathways for further refinement in both methodology and application:

  • Hardware Acceleration: Insights from this method encourage the design of hardware specifically optimized for sparse network operations.
  • Structured Sparsity: Exploring structured variations of this dynamic sparse reparameterization could yield further improvements in computational efficiency.
  • Broader Applications: Extending this approach to other neural architectures or domains where parameter efficiency is critical, such as mobile and embedded AI systems.

Overall, this dynamic sparse reparameterization technique presents a compelling alternative to traditional training methodologies, indicating that exploring a network's structural possibilities during training is an effective route to remarkably efficient and performant models.

Authors (2)
  1. Hesham Mostafa (26 papers)
  2. Xin Wang (1306 papers)
Citations (294)