
RePr: Improved Training of Convolutional Filters

Published 18 Nov 2018 in cs.CV and cs.LG (arXiv:1811.07275v3)

Abstract: A well-trained Convolutional Neural Network can easily be pruned without significant loss of performance. This is because of unnecessary overlap in the features captured by the network's filters. Innovations in network architecture such as skip/dense connections and Inception units have mitigated this problem to some extent, but these improvements come with increased computation and memory requirements at run-time. We attempt to address this problem from another angle - not by changing the network structure but by altering the training method. We show that by temporarily pruning and then restoring a subset of the model's filters, and repeating this process cyclically, overlap in the learned features is reduced, producing improved generalization. We show that the existing model-pruning criteria are not optimal for selecting filters to prune in this context and introduce inter-filter orthogonality as the ranking criteria to determine under-expressive filters. Our method is applicable both to vanilla convolutional networks and more complex modern architectures, and improves the performance across a variety of tasks, especially when applied to smaller networks.

Citations (55)

Summary

Improved Training of Filters in Convolutional Neural Networks

Recent advances in Convolutional Neural Networks (CNNs) have been driven by innovations in network architecture and optimization, yielding state-of-the-art performance on many image classification tasks. However, trained CNNs typically contain redundant filters, as evidenced by the widespread success of filter pruning for model compression with little loss of accuracy. The paper "RePr: Improved Training of Convolutional Filters" addresses this core inefficiency, overlapping filter functionality, not through structural changes to the network architecture but through a novel training methodology.

Summary of Proposed Method: RePr

The authors introduce a training technique named RePr, which periodically prunes and later reinitializes a subset of filters during training. The key hypothesis is that cyclically removing and restoring filters reduces redundancy among the learned features and improves generalization by encouraging the filters to span more orthogonal directions. This contrasts with static pruning for compression: a small network trained from scratch typically cannot match the accuracy of a larger model pruned down to the same size, which motivates treating pruning as a training-time regularizer rather than a one-off compression step.
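The cyclic prune-and-restore schedule can be sketched in a few lines. The following is a minimal numpy toy, not the authors' implementation: the "layer" is just a matrix of flattened filters, the training phases are elided comments, and the ranking function here is a simple norm placeholder (RePr itself ranks filters by inter-filter orthogonality). Names such as `rank_filters` and `prune_frac` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "layer": 8 filters, each flattened to 27 weights (e.g. 3x3x3 kernels).
W = rng.standard_normal((8, 27))

def rank_filters(W):
    # Placeholder criterion: filter L2 norm. RePr instead uses an
    # inter-filter orthogonality criterion to find redundant filters.
    return np.linalg.norm(W, axis=1)

prune_frac = 0.25                      # fraction of filters dropped per cycle
n_prune = int(len(W) * prune_frac)

for cycle in range(3):
    # ... S1 iterations of ordinary full-network training would go here ...
    drop = np.argsort(rank_filters(W))[:n_prune]   # lowest-ranked filters
    W[drop] = 0.0                                  # temporarily prune them
    # ... S2 iterations of training the reduced network would go here ...
    W[drop] = 0.1 * rng.standard_normal((n_prune, W.shape[1]))  # restore
```

After each cycle the network returns to full capacity, so pruning here acts as a repeated intervention during training rather than a permanent structural change.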

Inter-filter Orthogonality

Central to the method is inter-filter orthogonality, proposed as a new criterion for ranking filter importance. The paper argues that traditional pruning metrics, such as weight norms and Taylor-expansion approximations, are suboptimal when filters are pruned and reinitialized periodically. Inter-filter orthogonality instead identifies redundant filters more reliably, preserving model expressiveness and minimizing the overlapping feature representations that can lead to overfitting and poor generalization.
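One way to score redundancy along these lines is to normalize each flattened filter and measure how much its direction overlaps with the other filters in the same layer. The sketch below is an illustrative numpy rendering of that idea, not the paper's exact formulation; the function name and its reduction (sum of absolute off-diagonal entries) are assumptions made for clarity.

```python
import numpy as np

def orthogonality_scores(W):
    """Score each filter in a layer by its overlap with the others.

    W: (num_filters, fan_in) matrix of flattened filter weights.
    Rows are unit-normalized; for P = W_hat @ W_hat.T, the off-diagonal
    mass in a filter's row measures how far it is from being orthogonal
    to its siblings. Higher score = more redundant = prune first.
    """
    W_hat = W / np.linalg.norm(W, axis=1, keepdims=True)
    P = W_hat @ W_hat.T
    return np.abs(P - np.eye(W.shape[0])).sum(axis=1)

# A duplicated filter is flagged as most redundant:
W = np.array([[1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
scores = orthogonality_scores(W)   # the two identical rows score highest
```

A perfectly orthogonal filter set (e.g. rows of an identity matrix) scores zero under this measure, while duplicated filters score highest, matching the intuition that overlap, not small magnitude, signals redundancy.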

Empirical Results and Analysis

The RePr method demonstrates consistent performance improvements across architectures and tasks, most notably with smaller CNNs on datasets such as CIFAR-10 and CIFAR-100. Models trained with RePr exhibited substantial gains in test accuracy over vanilla training. Its effectiveness was validated on both plain feed-forward convolutional networks and more advanced architectures such as ResNet and DenseNet. The results highlight RePr's potential to deliver significant performance gains without any modification to the network architecture.

Practical Implications and Future Directions

On a practical level, the RePr method offers a promising avenue for enhancing the efficacy of CNNs, particularly in resource-constrained scenarios where model size and computational budget are primary concerns. The technique’s ability to increase model performance without inflating structure complexity is particularly appealing for deployment in environments where computational efficiency is paramount.

Looking forward, RePr holds intriguing possibilities for enhancing training regimens across other classes of neural networks beyond CNNs. While the primary application has focused on image classification tasks, the idea of periodically introducing randomness and forcing orthogonality in feature learning could benefit other domains, such as sequence-to-sequence models in text and speech processing. Furthermore, integration with existing and emerging techniques, including knowledge distillation and advanced regularization methods, may enhance the benefits of the proposed approach.

In conclusion, RePr represents a meaningful step toward better optimization of convolutional filters through innovative training strategies. The resulting models, which generalize better and contain less filter redundancy, underscore the value of rethinking traditional training paradigms.
