- The paper demonstrates that highly sparse, high-dimensional representations provide inherent robustness to noise and interference in neural networks, because random inputs are exponentially unlikely to match stored sparse patterns.
- The authors empirically validate networks with sparse weights and sparse activations on MNIST and the Google Speech Commands dataset, showing competitive accuracy and superior robustness under noisy conditions.
- Sparse representations offer potential for substantial computational efficiency and reduced power usage, suggesting a need for hardware support and future research directions.
Sparse Representations in Neural Networks: A Robust Alternative to Dense Methods
The paper "How Can We Be So Dense? The Benefits of Using Highly Sparse Representations," authored by Subutai Ahmad and Luiz Scheinkman from Numenta, provides a comprehensive analysis of the benefits associated with using sparse representations in neural networks, as an alternative to the prevalent dense representations. The authors propose that while artificial networks primarily rely on dense representations, adopting highly sparse representations can offer several advantages, particularly in terms of robustness and computational efficiency.
Sparse representations, ubiquitous in biological networks, have long been of interest in machine learning. The authors investigate their suitability for artificial systems by focusing on robustness to noise and interference, a central issue in modern neural networks, which are susceptible to small input perturbations that can drastically alter their outputs. They argue that high-dimensional sparse representations mitigate this: a sparse vector matches only a tiny fraction of the possible pattern space, so the probability that a random input matches a stored pattern decreases exponentially with dimensionality.
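To make the scaling argument concrete, the sketch below computes the false-match probability for random sparse binary vectors using the standard sparse-distributed-representation overlap analysis. The function name, sparsity level, and overlap threshold are illustrative assumptions for this example; the paper's full treatment also extends the analysis to scalar sparse vectors.

```python
from math import comb

def match_probability(n: int, a: int, theta: int) -> float:
    """Probability that a random n-dimensional binary vector with `a` active
    bits overlaps a fixed vector (also with `a` active bits) in at least
    `theta` positions. Standard hypergeometric-tail overlap analysis;
    the paper's exact formulation may differ in details."""
    total = comb(n, a)  # number of possible sparse vectors
    matches = sum(
        comb(a, b) * comb(n - a, a - b)  # b overlapping bits, a-b elsewhere
        for b in range(theta, a + 1)
    )
    return matches / total

# The false-match probability falls off sharply as dimensionality grows.
for n in (128, 256, 512, 1024):
    a = n // 16        # ~6% sparsity (illustrative choice)
    theta = a // 2     # require half the active bits to overlap
    print(f"n={n:5d}  a={a:3d}  P(false match) = {match_probability(n, a, theta):.3e}")
```

Running this with the assumed parameters shows the probability of an accidental match dropping by many orders of magnitude as n increases, which is the intuition behind the robustness claim.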
To validate these theoretical insights empirically, Ahmad and Scheinkman develop computationally efficient sparse networks that use both sparse weights and sparse activations. Their experiments on MNIST and the Google Speech Commands dataset show that these sparse networks match, and in some cases surpass, the robustness and stability of dense networks while maintaining competitive accuracy. The key is the inherent noise robustness that high-dimensional sparse representations provide.
Key Contributions and Findings
- Robustness to Noise and Interference: The authors illustrate that highly sparse representations are naturally robust to input noise and interference. This robustness is attributed to the exponential decrease in random match probability as dimensionality increases, a phenomenon analyzed comprehensively for both binary and scalar sparse vectors.
- Implementation of Sparse Networks: A novel sparse network architecture is proposed in which sparse weights are convolved with sparse inputs. The usual activation function is replaced by a k-winners layer that keeps only the top-k units active, enforcing a fixed sparsity in the outputs (a minimal sketch of this layer follows the list).
- Empirical Evaluation and Results: Sparse networks demonstrated superior noise robustness on MNIST and the Google Speech Commands dataset. The performance gap under noisy conditions underscores the practical value of sparse representations in real-world scenarios where inputs are imperfect.
- Computational Efficiency: Sparsity in both parameters and activations reduces the number of non-zero computations during inference, suggesting substantial potential improvements in computational efficiency and power usage, though current deep learning frameworks do not yet adequately exploit this sparsity.
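As a rough illustration of the k-winners idea referenced above, here is a minimal PyTorch-style sketch. It is a simplified assumption-laden example, not the paper's implementation: it omits the boosting and duty-cycle mechanisms the authors describe, and the function name and tensor shapes are chosen for illustration.

```python
import torch

def k_winners(x: torch.Tensor, k: int) -> torch.Tensor:
    """Keep only the top-k activations per sample and zero out the rest.

    Simplified sketch of a k-winners layer (no boosting or duty-cycle
    tracking, which the paper's full version includes).
    """
    topk = torch.topk(x, k=k, dim=1)          # indices of the k largest units per row
    mask = torch.zeros_like(x)
    mask.scatter_(1, topk.indices, 1.0)       # 1.0 at winning positions, 0.0 elsewhere
    return x * mask

# Example: a batch of 4 samples with 100 units each, keeping 10 winners per sample.
x = torch.randn(4, 100)
y = k_winners(x, k=10)
print("non-zero activations per sample:", (y != 0).sum(dim=1).tolist())
```

The final count of non-zero activations also hints at the efficiency argument in the list above: with fixed output sparsity, only a small fraction of units contribute to downstream computation.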
Implications for Future Research
The work presented in this paper suggests several avenues for future research. First, it highlights the potential integration of sparse representations in a variety of other neural architectures, including RNNs. It also opens the door for exploration into more refined sparse networks that may incorporate methods like dropout, perhaps optimized differently for sparse systems. Moreover, the combination of sparse networks with pruning approaches could further enhance the advantages of both methodologies, potentially leading to highly efficient yet robust networks.
The practical transition of these models to real hardware also poses an exciting challenge. Given the computational gains achievable through sparse representations, dedicated hardware support for sparse computation could significantly advance the deployment of robust, power-efficient AI models.
In conclusion, the paper makes a substantial case for adopting sparsity in neural networks. By showcasing improved robustness and potential computational benefits, Ahmad and Scheinkman set a precedent for reconsidering the ubiquity of dense representations in favor of biologically inspired sparse alternatives, contributing to a deeper understanding and potential evolution of artificial neural system design.