- The paper demonstrates that reinforcement learning can be used to learn input-dependent unit-activation policies that significantly accelerate neural network computation.
- It introduces a deterministic dropout-like mechanism that selectively activates nodes to reduce computational load while maintaining accuracy.
- Empirical results on datasets such as CIFAR-10 show up to a 5.7× speed improvement, highlighting the method's effectiveness for limited-resource environments.
Conditional Computation in Neural Networks for Faster Models
This paper addresses the computational cost of large-scale neural networks, focusing on improving both training and inference speed. The authors present a conditional-computation method in which only parts of the network, selected according to the input, are activated, reducing computational load while maintaining prediction accuracy.
Methodology
The methodology formulates the problem as a reinforcement learning task: the goal is to learn policies that decide, for each input, which network units to activate or deactivate. The resulting selection mechanism is akin to dropout, but unlike standard dropout, where units are dropped at random, the drop decisions here are input-dependent, which is what yields the computational savings.
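To make the contrast with standard dropout concrete, here is a minimal NumPy sketch of a single layer whose units are gated by probabilities computed from the input rather than by a fixed random drop rate. The function and parameter names (`gated_layer`, `Wp`, `bp`) and the toy sizes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_layer(x, W, b, Wp, bp):
    """Sketch of an input-dependent gated layer (hypothetical names).

    W, b   -- parameters of the ordinary hidden layer.
    Wp, bp -- parameters of the gating policy, which maps the input to
              per-unit activation probabilities; standard dropout would
              instead drop units at a single fixed, input-independent rate.
    """
    h = np.maximum(0.0, x @ W + b)                   # ordinary ReLU units
    p = sigmoid(x @ Wp + bp)                         # input-dependent keep probabilities
    mask = (rng.random(p.shape) < p).astype(float)   # sample Bernoulli gates
    # The mask is applied after computing h here; a real implementation would
    # skip the computation of dropped units entirely to realize the speedup.
    return h * mask, p, mask

# Toy usage: a 4-dimensional input and a 6-unit hidden layer.
x = rng.normal(size=(1, 4))
W, b = 0.1 * rng.normal(size=(4, 6)), np.zeros(6)
Wp, bp = 0.1 * rng.normal(size=(4, 6)), np.zeros(6)
h, p, mask = gated_layer(x, W, b, Wp, bp)
print("keep probabilities:", np.round(p, 2))
print("sampled mask:      ", mask)
print("gated output:      ", np.round(h, 2))
```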
To optimize these activation policies, the paper uses a policy-gradient approach based on REINFORCE. Each node within a layer is assigned a Bernoulli probability of being activated, computed from the input, and regularization terms encourage the resulting activation probabilities to match a prescribed sparsity rate across the network.
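The sketch below illustrates what such a REINFORCE update for per-node Bernoulli gates can look like, together with one plausible quadratic penalty that pulls the mean activation probability toward a target sparsity rate. The paper's exact regularization terms and reward definition may differ, so treat this as an assumed simplification rather than the authors' implementation.

```python
import numpy as np

def reinforce_policy_grad(mask, p, reward, target_rate=0.3, reg_strength=0.1):
    """Ascent direction on the keep probabilities p for one example.

    The log-likelihood of a sampled mask under independent Bernoulli gates is
    sum_j [ m_j*log(p_j) + (1 - m_j)*log(1 - p_j) ]; REINFORCE scales its
    gradient by a scalar reward (here, the negative task loss). A quadratic
    penalty 0.5*reg_strength*(mean(p) - target_rate)**2 nudges the average
    keep probability toward the prescribed sparsity rate.
    """
    dlogp = mask / np.clip(p, 1e-6, None) - (1.0 - mask) / np.clip(1.0 - p, 1e-6, None)
    policy_grad = reward * dlogp                                  # ascend expected reward
    reg_grad = reg_strength * (p.mean() - target_rate) / p.size   # descend the sparsity penalty
    return policy_grad - reg_grad

# Toy usage: 6 gates with a sampled mask and a hypothetical reward of -0.42
# (in practice the reward would be the negative loss on the current input).
rng = np.random.default_rng(0)
p = rng.uniform(0.1, 0.9, size=6)
mask = (rng.random(6) < p).astype(float)
g = reinforce_policy_grad(mask, p, reward=-0.42)
# To update the policy weights that produced p, chain through the sigmoid:
# grad_Wp = x.T @ (g * p * (1 - p)), and similarly for the bias term.
print(np.round(g, 3))
```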
Numerical Results
The empirical results demonstrate that conditional computation can significantly improve the computational efficiency of neural networks without degrading accuracy. On standard datasets such as MNIST, CIFAR-10, and SVHN, models using conditional computation achieved substantial speedups over baseline networks, particularly in settings with limited computational resources such as single-core CPUs. For example, a network using the proposed approach achieved up to a 5.7× speedup on CIFAR-10 without sacrificing accuracy.
Implications
The implications of these findings are both practical and theoretical. Practically, this work opens the door for deploying complex neural networks in resource-constrained environments, such as mobile devices, where computational power and energy efficiency are critical. Theoretically, it contributes to a broader understanding of how dynamic neural network topology adjustments can benefit computational efficiency.
Future research could explore more efficient policy search algorithms to further reduce computation time, and could apply conditional-computation principles to other network architectures, such as convolutional networks. Additionally, coupling input-driven conditional computation with other forms of data-dependent processing, such as attention mechanisms, could yield further insights.
Overall, this paper provides a well-founded approach to accelerating neural network computations through conditional computation strategies and sets a precedent for future work in this domain. The contributions are grounded in sophisticated reinforcement learning techniques, showcasing their potential for optimizing computational pathways in deep learning models.