
Compression-aware Training of Deep Networks (1711.02638v2)

Published 7 Nov 2017 in cs.CV

Abstract: In recent years, great progress has been made in a variety of application domains thanks to the development of increasingly deeper neural networks. Unfortunately, the huge number of units of these networks makes them expensive both computationally and memory-wise. To overcome this, exploiting the fact that deep networks are over-parametrized, several compression strategies have been proposed. These methods, however, typically start from a network that has been trained in a standard manner, without considering such a future compression. In this paper, we propose to explicitly account for compression in the training process. To this end, we introduce a regularizer that encourages the parameter matrix of each layer to have low rank during training. We show that accounting for compression during training allows us to learn much more compact, yet at least as effective, models than state-of-the-art compression techniques.

Authors (2)
  1. Jose M. Alvarez (90 papers)
  2. Mathieu Salzmann (185 papers)
Citations (169)

Summary

The paper "Compression-aware Training of Deep Networks," authored by Jose M. Alvarez and Mathieu Salzmann, presents a novel methodology for training deep neural networks with a focus on compression. The main objective is to mitigate the computational and memory costs associated with deep networks by explicitly considering compression during the training phase rather than as a post-hoc process.

Overview

Deep neural networks have shown substantial success across various domains by utilizing increasingly complex architectures. Despite their prowess, these models are typically over-parametrized, leading to inefficiencies during deployment, especially in environments with constrained hardware resources. Traditional compression techniques aim to reduce the size of pre-trained networks, often resulting in diminished prediction accuracy due to arbitrary truncation of parameters or units.

The paper introduces a compression-aware training strategy that adds a regularizer encouraging low rank in the parameter matrices of the network during training. This regularizer pushes the units within each layer to become correlated (i.e., nearly linearly dependent), so that redundant units can be pruned after training with little loss of information. The resulting objective is optimized with a proximal stochastic gradient descent scheme, ensuring that the eventual compressed model remains both effective and efficient.
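
To make the proximal optimization concrete, the sketch below shows one such update under simplifying assumptions: a plain nuclear-norm penalty on each layer's (reshaped) weight matrix, handled by singular-value soft-thresholding after an ordinary gradient step. The function names (`prox_nuclear`, `proximal_sgd_step`), hyperparameter values, and the PyTorch framing are illustrative, not the authors' implementation.

```python
import torch

def prox_nuclear(W, tau):
    """Proximal operator of tau * ||W||_*: soft-threshold the singular values.

    Small singular values are driven exactly to zero, which lowers the rank
    of W and lets the layer be factorized compactly after training."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    S = torch.clamp(S - tau, min=0.0)
    return U @ torch.diag(S) @ Vh

def proximal_sgd_step(weights, loss, lr=0.01, lam=1e-4):
    """One proximal SGD update: a gradient step on the data loss, followed by
    the nuclear-norm proximal step on each layer's weight matrix."""
    loss.backward()
    with torch.no_grad():
        for W in weights:
            W -= lr * W.grad                         # ordinary SGD step
            W2d = W.view(W.shape[0], -1)             # flatten conv kernels to a matrix
            W2d.copy_(prox_nuclear(W2d, lr * lam))   # low-rank proximal step
            W.grad.zero_()
```

After training, each low-rank matrix can be truncated and stored as two thin factors, which is where the memory and compute savings come from.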

Key Contributions

  1. Low-rank Regularization: The authors introduce a regularizer based on the nuclear norm (the sum of a matrix's singular values) that promotes low-rank parameter matrices during training, aligning the training objective with the low-rank decomposition applied for compression afterwards.
  2. Combination with Group Sparsity: Going beyond low rank, the paper combines this regularizer with a group sparsity penalty, specifically the sparse group Lasso, to prune entire units from the network (a sketch of the corresponding proximal step follows this list). This dual regularization yields significant reductions in the model’s memory footprint without substantial loss of predictive power.
  3. Increased Compression Rates: The compression-aware approach yields models with higher compression rates compared to traditional methods. The experimental results showcase over 90% compression rates with negligible losses in accuracy for DecomposeMe and ResNet architectures on datasets such as ImageNet and ICDAR.
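
As a rough illustration of the group-sparsity component in item 2, the snippet below applies the standard proximal operator of a sparse group Lasso penalty, treating each row of a layer's (reshaped) weight matrix as one unit. The grouping, penalty weights, and function name are assumptions made for this sketch; the paper defines the exact penalty and how it is weighted against the low-rank term.

```python
import torch

def prox_sparse_group_lasso(W, lam_l1, lam_group, lr):
    """Proximal step for a sparse group Lasso penalty.

    Each row of W (one output unit, with conv kernels flattened) is a group.
    The l1 part zeroes individual weights; the group part can zero an entire
    row, i.e. mark the whole unit for removal."""
    with torch.no_grad():
        # elementwise soft-thresholding (l1 term)
        W = torch.sign(W) * torch.clamp(W.abs() - lr * lam_l1, min=0.0)
        # group-wise soft-thresholding (l2-over-groups term)
        norms = W.norm(dim=1, keepdim=True)
        scale = torch.clamp(1.0 - lr * lam_group / (norms + 1e-12), min=0.0)
        return W * scale
```

In a compression-aware training loop, this step would be applied right after the gradient update, alongside the low-rank proximal step sketched earlier, so that whole units shrink to zero and can be removed outright.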

Implications

By systematically incorporating compression into the training paradigm, this research addresses the dual challenges of computational overhead and memory inefficiency in deep learning. The proposed methodology promises significant benefits for deploying deep networks on platforms with limited computational resources, which is increasingly pertinent given the growth of mobile and edge AI applications.

Future Directions

There are several avenues for future research based on the findings of this paper. Exploring regularizers corresponding to a broader array of compression mechanisms could diversify the applicability of compression-aware training strategies. Additionally, leveraging hardware-specific optimizations for such decomposed networks might enhance inference speed further, paving the way for real-time applications on resource-constrained devices.

The paper marks a crucial step towards more intelligent model design that inherently acknowledges the constraints imposed by deployment environments, offering a pragmatic solution to the AI community's ongoing struggle with scalability and efficiency.