- The paper observes that lower-layer CNN filters tend to form opposite-phase pairs and proposes CReLU, which captures both positive and negative phase information explicitly so that this redundancy need not be learned.
- The proposed CReLU activation significantly improves CNN expressiveness, boosting accuracy on CIFAR-10, CIFAR-100, and ImageNet datasets.
- Experimental evaluations reveal that CReLU models achieve comparable or better performance with fewer parameters and improved regularization.
Insights into CNNs Enhanced by Concatenated ReLUs
The paper "Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units" presents an analytical and methodological enhancement to Convolutional Neural Networks (CNNs) through the innovative use of Concatenated ReLUs (CReLU). This enhancement leverages an intriguing observation about traditional CNN architectures and proposes a novel activation scheme to improve their performance.
Observations and Hypothesis
The authors examine trained CNN models and find that filters in the lower convolutional layers tend to appear in negatively correlated pairs: for many filters there exists another filter that is roughly its negation. Based on this observation, they hypothesize that, because ReLU discards negative responses, the lower layers learn redundant filter pairs in order to capture both the positive and negative phase of the input. This redundancy points to an inefficiency that the proposed activation function is designed to remove.
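As a loose illustration of this kind of pairing analysis, the NumPy sketch below computes, for each flattened first-layer filter, the cosine similarity to its most negatively correlated counterpart; the function name and exact procedure are illustrative assumptions, not the authors' code.

```python
import numpy as np

def pairing_cosine_minima(filters):
    """filters: array of shape (num_filters, filter_dim), one flattened
    convolution kernel per row. Returns, for each filter, the cosine
    similarity to its most negatively correlated peer (its 'pairing filter').
    A distribution of these minima centred well below zero indicates that
    the layer is learning opposite-phase filter pairs."""
    normed = filters / np.linalg.norm(filters, axis=1, keepdims=True)
    cos = normed @ normed.T              # pairwise cosine similarities
    np.fill_diagonal(cos, np.inf)        # ignore each filter's self-similarity
    return cos.min(axis=1)
```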
Concatenated ReLU Activation
Motivated by this finding, the authors introduce the Concatenated Rectified Linear Unit (CReLU) activation scheme. Given the linear responses after a convolution, CReLU concatenates them with their negation along the channel dimension and then applies the standard ReLU non-linearity, i.e. CReLU(x) = [ReLU(x), ReLU(-x)]. The activation therefore preserves information from both the positive and negative input phases without saturating, removing the need for filters to be learned in redundant opposite-phase pairs.
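A minimal PyTorch sketch of such an activation, assuming channel-first tensors, could look like the module below; this is an illustrative re-implementation, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CReLU(nn.Module):
    """Concatenated ReLU: concatenates the pre-activation with its
    negation along the channel dimension and applies ReLU, so both
    the positive and the negative phase are preserved."""

    def forward(self, x):
        # The output has twice as many channels as the input.
        return F.relu(torch.cat([x, -x], dim=1))
```

Because the output doubles the channel count, one way to keep parameter counts comparable to a ReLU baseline is to halve the number of filters in the preceding convolution.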
Theoretical Analysis
The paper provides a theoretical analysis of the reconstruction property associated with CReLU. Because both phase directions are preserved, the pre-activation can be recovered exactly from a CReLU output, whereas a plain ReLU irreversibly discards every negative response. This information-preserving behavior supports the claim that CReLU features are more expressive and generalize better than their ReLU counterparts.
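The identity behind this property is x = ReLU(x) - ReLU(-x). The short check below, a sketch assuming PyTorch tensors, verifies that the pre-activation is recoverable from the two halves of a CReLU output.

```python
import torch

x = torch.randn(4, 8)                        # arbitrary pre-activation values
pos, neg = torch.relu(x), torch.relu(-x)     # the two halves of a CReLU output
# The input is exactly recoverable from the CReLU output, whereas a
# plain ReLU irreversibly zeroes out the negative responses.
assert torch.allclose(x, pos - neg)
```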
Experimental Evaluation
The authors integrate CReLU into widely used CNN architectures and evaluate them empirically on CIFAR-10, CIFAR-100, and ImageNet. CReLU shows notable improvements in recognition performance over baseline ReLU models: it achieves better accuracy on the CIFAR datasets and reduces parameter usage on ImageNet without compromising performance. The CReLU models also display an unexpected regularization effect, indicating less overfitting despite the increased model capacity.
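As an illustration of how such an integration might look, the sketch below uses half as many convolution filters as a ReLU baseline, since CReLU restores the channel count by concatenation; the block name and layout are hypothetical, not an architecture from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CReLUConvBlock(nn.Module):
    """Conv + CReLU block: the convolution learns out_channels // 2 filters,
    and the CReLU concatenation restores out_channels channels for the next
    layer, roughly halving this block's parameter count relative to a ReLU
    baseline with out_channels filters."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels // 2,
                              kernel_size=3, padding=1)

    def forward(self, x):
        y = self.conv(x)
        return F.relu(torch.cat([y, -y], dim=1))  # out_channels channels
```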
Numerical Results
On CIFAR-10/100, simply exchanging ReLU for CReLU increased accuracy noticeably. On CIFAR-100, CReLU models improved classification performance, and the variant with half as many filters per layer (and therefore roughly half the parameters) still performed comparably to the baseline. On ImageNet, the best-performing CReLU models exceeded the baseline, indicating strong potential for large-scale use.
Implications and Future Directions
Together, the theoretical and empirical findings suggest that CReLU exploits phase information effectively, improving both parameter efficiency and representational power. These insights point toward more compact and effective CNNs. Future work could extend CReLU, for example by combining it with other non-linearities or different architecture designs, to further exploit the phase redundancy identified in standard CNNs.
In summary, the introduction of Concatenated ReLUs signifies a noteworthy step towards refining CNN architectures, highlighting how nuanced architectural modifications can yield significant improvements in efficiency and performance.