Group Sparse Regularization for Deep Neural Networks
The paper presents an approach to optimizing deep neural network architectures through group sparse regularization. Specifically, it extends conventional Lasso penalties to group-level sparsity, with the aim of jointly optimizing the weights, the number of neurons per layer, and the set of active input features within a single deep learning framework. The authors introduce a regularization method that adapts the group Lasso penalty, traditionally used in linear regression, to enforce structured sparsity among neural connections.
Key Contributions
The paper's primary contribution is a unified approach that performs network pruning, neuron-level optimization, and feature selection simultaneously. By organizing weights into groups (input groups, hidden groups, and bias groups), the method reduces network complexity more efficiently while maintaining competitive performance. The approach promises significant reductions in computational and storage demands, making it particularly appealing for deployment in low-power and embedded applications.
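As a rough illustration of this grouping (not the authors' code), the three group types could be formed for two stacked fully connected layers as follows; the layer sizes and the PyTorch orientation conventions are assumptions made purely for this sketch.

```python
import torch.nn as nn

# Hedged sketch of the three group types, assuming two dense layers.
fc1 = nn.Linear(in_features=784, out_features=300)   # input -> hidden
fc2 = nn.Linear(in_features=300, out_features=10)    # hidden -> output

# Input groups: the outgoing connections of each input feature,
# i.e. the columns of fc1.weight (shape: out_features x in_features).
input_groups = [fc1.weight[:, j] for j in range(fc1.in_features)]

# Hidden groups: the outgoing connections of each hidden neuron,
# i.e. the columns of fc2.weight.
hidden_groups = [fc2.weight[:, i] for i in range(fc2.in_features)]

# Bias groups: each bias term forms a singleton group of its own.
bias_groups = [fc1.bias[i:i + 1] for i in range(fc1.out_features)]
```

Zeroing an entire input group removes a feature, and zeroing a hidden group effectively removes a neuron, which is how the method couples sparsity with architecture selection.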
Methodological Insights
The proposed group sparse regularization is articulated using the group Lasso penalty:
$R_{\ell_{2,1}}(\mathbf{w}) = \sum_{\mathbf{g} \in \mathcal{G}} \sqrt{|\mathbf{g}|} \, \|\mathbf{g}\|_2$
and its extension, the sparse group Lasso (SGL) penalty:
$R_{\text{SGL}}(\mathbf{w}) = R_{\ell_{2,1}}(\mathbf{w}) + R_{\ell_{1}}(\mathbf{w})$
Here $\mathcal{G}$ denotes the set of weight groups, $|\mathbf{g}|$ the number of weights in group $\mathbf{g}$, and $\|\cdot\|_2$ the Euclidean norm; the additional $\ell_1$ term in the SGL penalty also drives individual weights to zero. These formulations push entire groups of weights to zero, so that features and neurons are selected holistically, trimming unnecessary complexity without sacrificing accuracy.
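A minimal sketch of how these penalties might be added to a training loss is shown below, assuming PyTorch; the grouping by input columns, the helper names, and the regularization strength are illustrative choices for this sketch, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

def group_lasso_penalty(groups):
    """R_{l2,1}(w) = sum over groups of sqrt(|g|) * ||g||_2."""
    return sum(
        torch.sqrt(torch.tensor(float(g.numel()), device=g.device)) * g.norm(p=2)
        for g in groups
    )

def sparse_group_lasso_penalty(groups):
    """R_SGL(w) = R_{l2,1}(w) + R_{l1}(w), with the l1 term taken over
    the same weights that appear in the groups."""
    l21 = group_lasso_penalty(groups)
    l1 = sum(g.abs().sum() for g in groups)
    return l21 + l1

# Illustrative usage: group one dense layer's weights by input column and
# scale the penalty by an assumed regularization factor lam.
layer = nn.Linear(784, 300)
groups = [layer.weight[:, j] for j in range((layer.in_features))]
lam = 1e-4
reg = lam * sparse_group_lasso_penalty(groups)
# total_loss = task_loss + reg   # added to the usual cross-entropy loss
```

In practice the penalty would be accumulated over the groups of every layer and added to the task loss before backpropagation, so standard gradient-based training applies unchanged.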
Experimental Evaluation
The authors provide empirical validation on the MNIST, Sensorless Drive Diagnosis (SDD), and Forest Covertype datasets. The sparse group Lasso approach attains classification accuracy comparable to conventionally regularized networks while yielding much more compact models; notably, it reaches approximately 96% sparsity in the layer connections on MNIST, substantially reducing the network's resource consumption.
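As an illustration of how such connection-level sparsity might be measured after training, a simple helper is sketched below; the thresholding rule and function name are hypothetical, not the paper's evaluation code.

```python
import torch.nn as nn

def connection_sparsity(model: nn.Module, threshold: float = 1e-3) -> float:
    """Fraction of weight entries whose magnitude falls below a small
    threshold; near-zero groups can then be pruned outright."""
    total, zeroed = 0, 0
    for module in model.modules():
        if isinstance(module, nn.Linear):
            w = module.weight.detach()
            total += w.numel()
            zeroed += (w.abs() < threshold).sum().item()
    return zeroed / max(total, 1)
```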
Implications and Future Directions
This research marks a significant step forward in neural network optimization with respect to both computational cost and the interpretability of model design. It opens pathways for further work on more complex architectures such as convolutional and recurrent neural networks. Future explorations might also investigate non-convex regularizers and the iterative schemes needed to solve them, which could yield even more compact and capable models. Such developments would be particularly valuable for practical AI applications where computational resources are at a premium.
In summary, the paper provides a robust framework for group sparse regularization in neural networks, offering a practical tool for AI practitioners who must optimize model performance under tight resource constraints. The introduction of group and sparse group Lasso penalties is a meaningful step toward more sustainable and efficient deep learning models.