An Analytical Overview of Dropout Training in Convolutional Neural Networks
The paper "Towards Dropout Training for Convolutional Neural Networks" by Haibing Wu and Xiaodong Gu explores the nuanced application of dropout, specifically focusing on its implications within convolutional and pooling layers of deep convolutional neural networks (CNNs). While dropout has been widely acknowledged for its effectiveness in fully-connected layers, this paper explores the less studied area of its integration in other components of CNNs, such as max-pooling layers.
Key Contributions
The paper shows that, at training time, max-pooling dropout amounts to sampling the pooled activation from a multinomial distribution over the units in each pooling region. Building on this view, the authors propose probabilistic weighted pooling, a test-time model-averaging scheme that weights each activation in a region by the probability that it would be selected under dropout. Their experiments show that this scheme consistently outperforms conventional max-pooling and scaled max-pooling at test time.
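To make the mechanics concrete, here is a minimal NumPy sketch of both rules on a single pooling region, represented as a 1-D array of activations. The function names and the per-region formulation are illustrative assumptions rather than code from the paper: during training each unit survives with retaining probability p and the output is the maximum of the survivors; at test time the output is the expectation of that quantity, which weights the i-th smallest of n activations by p(1-p)^(n-i).

```python
import numpy as np

def max_pool_dropout_train(region, retain_p, rng):
    """Training-time max-pooling dropout on one pooling region.

    Each activation is kept with probability retain_p; the output is the
    maximum of the surviving activations, or 0 if every unit is dropped.
    """
    mask = rng.random(region.shape) < retain_p
    kept = region[mask]
    return float(kept.max()) if kept.size else 0.0


def prob_weighted_pool_test(region, retain_p):
    """Test-time probabilistic weighted pooling.

    With activations sorted ascending a_1 <= ... <= a_n, the i-th value is
    the maximum of the survivors with probability
    retain_p * (1 - retain_p) ** (n - i), so the weighted sum below is the
    expectation of training-time max-pooling dropout (the all-dropped
    event contributes 0).
    """
    a = np.sort(region)                                  # ascending order
    n = a.size
    q = 1.0 - retain_p
    probs = retain_p * q ** (n - np.arange(1, n + 1))    # q**(n-1), ..., q**0
    return float(np.dot(probs, a))


rng = np.random.default_rng(0)
region = np.array([0.2, 0.9, 0.4, 0.7])                  # one 2x2 region, flattened
print(max_pool_dropout_train(region, retain_p=0.5, rng=rng))
print(prob_weighted_pool_test(region, retain_p=0.5))     # 0.6875
```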
The authors also examine convolutional dropout and find that it plays a non-trivial role, even though convolutional layers have comparatively few parameters and are therefore naturally less prone to overfitting. Applying dropout to max-pooling and fully-connected layers yields state-of-the-art results on MNIST and competitive error rates on CIFAR-10 and CIFAR-100 without data augmentation. The paper further compares max-pooling dropout with stochastic pooling, both of which draw the pooled value from a multinomial distribution over each pooling region, and shows that max-pooling dropout usually performs better at typical retaining probabilities.
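For contrast, the following sketch shows the training-time rule of stochastic pooling under the same per-region representation as above (again an illustrative sketch, not the authors' code): the multinomial distribution comes from the activation values themselves rather than from a fixed retaining probability.

```python
import numpy as np

def stochastic_pool_train(region, rng):
    """Training-time stochastic pooling (Zeiler and Fergus): select one
    activation in the region with probability proportional to its value.
    Activations are assumed nonnegative, e.g. ReLU outputs."""
    total = region.sum()
    if total == 0:
        return 0.0                     # degenerate region: all activations are zero
    return float(rng.choice(region, p=region / total))


rng = np.random.default_rng(0)
region = np.array([0.2, 0.9, 0.4, 0.7])
print(stochastic_pool_train(region, rng))
```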
Empirical Findings
The paper reports experiments on MNIST, CIFAR-10, and CIFAR-100 using architectures of stacked convolutional and pooling layers followed by fully-connected layers, with dropout applied at different points in the network. The results show that probabilistic weighted pooling generalizes better to test data than scaled max-pooling. The evaluation also reveals a U-shaped relationship between the retaining probability used in max-pooling dropout and the resulting test error: very small and very large retaining probabilities both hurt, while moderate values give the best results.
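The difference between the test-time rules can be illustrated numerically. The snippet below reuses prob_weighted_pool_test from the earlier sketch and assumes that scaled max-pooling means multiplying the max by the retaining probability, in the spirit of standard dropout's weight scaling; both the values and that reading of the scaling rule are assumptions for illustration.

```python
import numpy as np

region = np.array([0.2, 0.9, 0.4, 0.7])
retain_p = 0.5

plain_max  = region.max()                               # ordinary max-pooling: 0.9
scaled_max = retain_p * region.max()                    # scaled max-pooling: 0.45
prob_pool  = prob_weighted_pool_test(region, retain_p)  # probabilistic weighted pooling: 0.6875
print(plain_max, scaled_max, prob_pool)
```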
The experiments further show that dropout in convolutional, max-pooling, and fully-connected layers each improves test-set performance. Applying these forms of dropout together reduces overfitting noticeably, although the rates must be calibrated carefully to avoid over-regularization. The combination of max-pooling dropout and fully-connected dropout proved particularly effective, achieving the lowest error rates in several settings; a sketch of where these dropout layers sit in a network is given below.
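As a rough illustration of these placements, the PyTorch-style sketch below uses an assumed toy architecture for 28x28 grayscale inputs, not the network from the paper, with dropout before each convolution, before each max-pooling operation, and around the fully-connected layers. Note that nn.Dropout takes the drop probability (1 minus the retaining probability) and uses inverted scaling, so at test time it is the identity; reproducing the paper's probabilistic weighted pooling would require replacing the test-time max-pool, which this sketch does not attempt.

```python
import torch.nn as nn

class DropoutCNN(nn.Module):
    """Toy CNN with dropout at the three locations studied in the paper:
    convolutional inputs, max-pooling inputs, and fully-connected inputs."""

    def __init__(self, conv_drop=0.1, pool_drop=0.3, fc_drop=0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Dropout(conv_drop),               # dropout on the conv layer's input
            nn.Conv2d(1, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Dropout(pool_drop),               # dropout on the pooling region's input
            nn.MaxPool2d(2),                     # -> training-time max-pooling dropout
            nn.Dropout(conv_drop),
            nn.Conv2d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Dropout(pool_drop),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(fc_drop),                 # fully-connected dropout
            nn.Linear(64 * 7 * 7, 256),          # 28x28 input halved twice -> 7x7
            nn.ReLU(),
            nn.Dropout(fc_drop),
            nn.Linear(256, 10),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```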
Theoretical and Practical Implications
The work contributes to theory by analyzing the stochastic behaviour of dropout in pooling operations, and it proposes pooling schemes that can be applied directly in CNN training pipelines. By demonstrating the benefit of probabilistic weighted pooling, the paper encourages a re-examination of standard test-time pooling choices. Practically, it offers a concrete regularization strategy for reducing overfitting and improving the robustness and generalization of CNN models.
Future Directions
The paper opens several avenues for future work. Adaptive methods that tune dropout rates dynamically for different applications and datasets are one natural extension; another is the study of inter-layer dropout strategies, which could yield further gains, especially in architectures beyond conventional CNNs.
In conclusion, the paper makes a significant contribution to understanding and optimizing dropout in CNNs, providing both theoretical grounding and practical evidence for extending its use beyond fully-connected layers.