- The paper introduces Intrinsic Sparse Structures (ISS) in LSTM to systematically reduce component sizes while preserving dimension consistency.
- It uses Group Lasso regularization to achieve structured sparsity, enabling a 10.59x speedup in language modeling without losing accuracy.
- Its approach generalizes to various RNN architectures, offering practical benefits for deploying efficient models in resource-constrained environments.
Learning Intrinsic Sparse Structures within Long Short-Term Memory
The paper, "Learning Intrinsic Sparse Structures within Long Short-Term Memory" by Wei Wen et al., explores the critical area of model compression in Recurrent Neural Networks (RNNs), with a specific focus on Long Short-Term Memory (LSTM) units. The importance of this work lies in its ability to enhance the computational efficiency of RNNs while maintaining or even improving their performance in terms of perplexity—a crucial metric in LLMing tasks.
The primary contribution of this research is the introduction of Intrinsic Sparse Structures (ISS) within LSTM units. ISS strategically reduces the sizes of the basic components of an LSTM unit—input updates, gates, hidden states, cell states, and outputs—in lockstep, so that their dimensions remain consistent. This consistency is crucial: removing a hidden dimension in one component without removing it everywhere it appears would yield an invalid LSTM unit.
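To make the dimension-consistency requirement concrete, the sketch below gathers every weight that touches one hidden dimension of a standard LSTM layer; zeroing that whole group at once is what it means to remove one ISS component. This is a minimal illustration assuming PyTorch's `nn.LSTM` parameterization rather than the paper's own implementation, it omits the bias entries and the weights of subsequent layers that the paper also folds into each group, and the helper name `iss_group` is illustrative.

```python
import torch
import torch.nn as nn

# Illustrative sizes; the paper's experiments use much larger LSTM layers.
input_size, hidden_size = 64, 128
lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

def iss_group(lstm: nn.LSTM, k: int) -> torch.Tensor:
    """Gather every weight tied to hidden dimension k of a single nn.LSTM layer.

    PyTorch stacks the four gate blocks (i, f, g, o) along dim 0 of
    weight_ih_l0 / weight_hh_l0, so dimension k shows up as one row per gate
    block (producing gate element k) and as column k of the recurrent matrix
    (consuming h_{t-1}[k]).
    """
    h = lstm.hidden_size
    w_ih, w_hh = lstm.weight_ih_l0, lstm.weight_hh_l0
    rows = [w_ih[g * h + k] for g in range(4)] + [w_hh[g * h + k] for g in range(4)]
    return torch.cat([t.reshape(-1) for t in rows] + [w_hh[:, k]])

# Zeroing this whole group removes hidden unit k everywhere it appears,
# so the pruned LSTM keeps consistent dimensions and valid connectivity.
print(iss_group(lstm, k=3).numel())
```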
A significant technical innovation in the paper is the use of Group Lasso regularization. This technique selectively drives whole ISS components toward zero, yielding structurally sparse LSTMs that retain their original network connectivity but with reduced dimensions. This approach contrasts with traditional weight pruning methods, which often produce non-structured sparsity patterns that are inefficient on modern hardware.
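The regularizer itself is the standard Group Lasso term—a coefficient times the sum of the L2 norms of the groups—with one group per ISS component; groups whose norm is driven to zero during training can then be removed outright. A brief sketch follows, reusing the `lstm` module and `iss_group` helper from the previous snippet; the coefficient `lam` is illustrative, not a value from the paper.

```python
import torch

# Reuses the lstm module and the iss_group() helper defined in the sketch above.
def iss_group_lasso(lam: float = 1e-4) -> torch.Tensor:
    """Group Lasso term lam * sum_k ||w_k||_2 with one group per ISS component.

    Groups whose norm is pushed to zero can be physically removed, shrinking the
    hidden dimension while keeping the remaining LSTM structurally valid.
    """
    hidden = lstm.hidden_size
    return lam * sum(torch.linalg.norm(iss_group(lstm, k)) for k in range(hidden))

# During training (sketch), the penalty is simply added to the task loss:
# loss = cross_entropy(logits, targets) + iss_group_lasso()
# loss.backward()
```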
The paper presents strong numerical results supporting the efficacy of the proposed method. Using the Penn TreeBank dataset for language modeling, the authors achieved a 10.59x speedup without a loss in perplexity, demonstrating the approach's viability for reducing inference time without compromising accuracy. Additionally, the research extended ISS to Recurrent Highway Networks (RHNs) and evaluated it on the SQuAD dataset for machine Question Answering, showcasing its adaptability and effectiveness across RNN architectures.
Beyond immediate computational benefits, the implications of this research extend into the practical deployment of RNNs in resource-constrained environments, such as mobile devices or large-scale service clusters. The proposed ISS method allows for reduced model size, which directly translates to hardware and energy efficiency. This is increasingly important as AI models continue to integrate into everyday devices.
From a theoretical viewpoint, the paper contributes to the understanding of neural network structure optimization, proposing that carefully imposed structured sparsity can replace traditional dense designs without sacrificing computational integrity or output quality.
Looking forward, the research opens up avenues for exploring similar structural sparsity techniques in other neural network architectures beyond RNNs, such as transformers, and for advancing the hardware-software co-design space to accommodate these efficient architectures. The method's efficacy in balancing sparsity and performance is a promising direction for future AI developments, particularly in enhancing the deployability of sophisticated AI models in real-world applications.