EraseReLU: A Simple Way to Ease the Training of Deep Convolution Neural Networks (1709.07634v2)

Published 22 Sep 2017 in cs.CV

Abstract: In most state-of-the-art architectures, the Rectified Linear Unit (ReLU) has become a standard component attached to each layer. Although ReLU eases network training to an extent, its blocking of negative values may suppress the propagation of useful information and make very deep Convolutional Neural Networks (CNNs) difficult to optimize. Moreover, a stack of layers with nonlinear activations struggles to approximate the intrinsic linear transformations between feature representations. In this paper, we investigate the effect of erasing the ReLUs of certain layers, applying deterministic rules to various representative architectures. Doing so eases optimization and improves the generalization performance of very deep CNN models. We find two key factors essential to the performance improvement: 1) the location inside the basic module where the ReLU is erased; 2) the proportion of basic modules whose ReLU is erased. We show that erasing the last ReLU layer of every basic module in a network usually yields improved performance. In experiments, our approach successfully improves the performance of various representative architectures, and we report improved results on SVHN, CIFAR-10/100, and ImageNet. Moreover, we achieve a competitive single-model error rate of 16.53% on CIFAR-100 compared to the state of the art.
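
To make the "erase the last ReLU of each basic module" rule concrete, here is a minimal, hypothetical PyTorch sketch (not the authors' released code): a ResNet-style basic block whose final activation is swapped for an identity when the rule is applied. The class name `BasicBlock` and the `erase_last_relu` flag are illustrative assumptions, not names from the paper.

```python
# Minimal sketch: a ResNet-style basic block where the block's last ReLU
# can be "erased" (replaced by an identity), preserving negative values
# after the residual addition.
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, channels, erase_last_relu=True):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        # When erase_last_relu is True, the final activation is an identity,
        # so the block output keeps its negative values.
        self.last_act = nn.Identity() if erase_last_relu else nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + x                 # residual connection
        return self.last_act(out)     # last ReLU erased here when enabled

# Usage: a block with its last ReLU erased
block = BasicBlock(64, erase_last_relu=True)
x = torch.randn(2, 64, 32, 32)
y = block(x)  # y may contain negative values after the residual sum
```

In the paper's formulation this erasure is applied to a deterministic proportion of basic modules across the network (up to all of them), rather than to isolated blocks.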

Citations (15)
