Filter Grafting for Deep Neural Networks (2001.05868v3)

Published 15 Jan 2020 in cs.CV and cs.LG

Abstract: This paper proposes a new learning paradigm called filter grafting, which aims to improve the representation capability of Deep Neural Networks (DNNs). The motivation is that DNNs have unimportant (invalid) filters (e.g., l1 norm close to 0). These filters limit the potential of DNNs since they are identified as having little effect on the network. While filter pruning removes these invalid filters for efficiency consideration, filter grafting re-activates them from an accuracy boosting perspective. The activation is processed by grafting external information (weights) into invalid filters. To better perform the grafting process, we develop an entropy-based criterion to measure the information of filters and an adaptive weighting strategy for balancing the grafted information among networks. After the grafting operation, the network has very few invalid filters compared with its untouched state, empowering the model with more representation capacity. We also perform extensive experiments on the classification and recognition tasks to show the superiority of our method. For example, the grafted MobileNetV2 outperforms the non-grafted MobileNetV2 by about 7 percent on the CIFAR-100 dataset. Code is available at https://github.com/fxmeng/filter-grafting.git.

Citations (31)

Summary

  • The paper presents a novel filter grafting paradigm that reactivates low-importance filters with external weights instead of pruning them.
  • It introduces an entropy-based criterion with adaptive weighting to optimally balance grafted information across the network.
  • Empirical tests on benchmarks like CIFAR-100 show up to 7% improvement in top-1 accuracy, enhancing representation without extra complexity.

Analysis of "Filter Grafting for Deep Neural Networks"

"Filter Grafting for Deep Neural Networks" introduces a novel approach to enhancing the representational capacity of deep neural networks through a technique known as filter grafting. The method targets unimportant (invalid) filters, which filter pruning traditionally removes to improve efficiency at minimal performance cost. Filter grafting instead reactivates these potentially useful filters by grafting in external information, improving accuracy and representation capacity without altering the network's architecture.
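The paper's notion of an invalid filter can be made concrete with a small sketch. The threshold value below is an illustrative assumption; the paper only specifies that invalid filters have l1 norm close to zero.

```python
import numpy as np

def invalid_filter_mask(conv_weights, threshold=1e-2):
    """Flag filters whose l1 norm falls below a threshold.

    conv_weights: array of shape (out_channels, in_channels, kH, kW),
    the standard layout of a convolutional layer's weights.
    threshold is a hypothetical cutoff for illustration.
    """
    l1_norms = np.abs(conv_weights).sum(axis=(1, 2, 3))  # one norm per filter
    return l1_norms < threshold
```

A masked filter is a candidate for grafting rather than removal, which is the core contrast with pruning.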

Key Contributions

  1. Filter Grafting Paradigm: The paper presents a paradigm where, rather than being pruned, invalid filters (identified by low l1 norms) are revitalized. This is accomplished by grafting weights from external models into these filters. The methodology leaves the model architecture unchanged and offers a complementary strengthening of its representational capacity.
  2. Entropy-Based Criterion and Adaptive Weighting: The researchers develop an entropy-based criterion to assess filter informativeness over the more conventional l1 norm, aiming to more accurately measure the value added by each filter. This is coupled with an adaptive weighting strategy that optimally balances the grafted information's influence across networks.
  3. Empirical Validation: Extensive experimental verification on classification benchmarks like CIFAR-10 and CIFAR-100 demonstrates the method's superiority, with gains as substantial as 7% in top-1 accuracy for MobileNetV2 on CIFAR-100. This is indicative of the grafted networks' enhanced representation capabilities.
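The entropy criterion and adaptive weighting in contribution 2 can be sketched as follows. The histogram-based entropy estimate and the sigmoid form of the blending coefficient follow the paper's general recipe, but the bin count and the constant c are illustrative choices, not the paper's exact hyperparameters.

```python
import numpy as np

def layer_entropy(weights, bins=10):
    """Histogram-based entropy of a layer's weight values (an approximation)."""
    hist, _ = np.histogram(weights, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]  # ignore empty bins so log is defined
    return -(p * np.log(p)).sum()

def graft(w_self, w_other, c=1.0):
    """Blend a peer network's weights into this layer.

    alpha favors the receiving network when its own entropy is higher,
    so informative layers are perturbed less by grafting.
    """
    h_self = layer_entropy(w_self)
    h_other = layer_entropy(w_other)
    alpha = 1.0 / (1.0 + np.exp(-c * (h_self - h_other)))
    return alpha * w_self + (1 - alpha) * w_other
```

When both layers carry equal entropy, alpha is 0.5 and the graft reduces to an even average; as the entropy gap grows, the more informative layer dominates.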

Insights and Implications

The implications of this method extend beyond raw accuracy gains. By encouraging parallel networks to share and graft information, the research suggests an innovative way for network ensembles to improve learning outcomes, challenging the traditional practice of treating each network's training in isolation. The performance improvements, realized not only in closed-set classification but also in open-set recognition tasks such as person re-identification, hint at broad applicability across domains where deep learning is prevalent.
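The parallel-network collaboration described above amounts to a mutual grafting exchange at epoch boundaries. The sketch below uses a fixed alpha for brevity; the paper computes alpha adaptively per layer from the entropy criterion.

```python
import numpy as np

def mutual_graft(layers_a, layers_b, alpha=0.8):
    """One grafting exchange between two peer networks' layer weights.

    Each network keeps a fraction alpha of its own weights and accepts
    (1 - alpha) from its peer. Layers are paired positionally, assuming
    both networks share the same architecture.
    """
    new_a = [alpha * a + (1 - alpha) * b for a, b in zip(layers_a, layers_b)]
    new_b = [alpha * b + (1 - alpha) * a for a, b in zip(layers_a, layers_b)]
    return new_a, new_b
```

In a full training loop this step would run after each epoch of independent training, so neither network's optimization is otherwise coupled to its peer's.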

Moreover, filter grafting, by enhancing network representation capabilities without increasing model complexity, provides a strategy that could be particularly advantageous in resource-constrained environments where deploying larger or more complex models is infeasible.

Future Directions

Considering the particular interest in optimizing computational resources and model efficiency, future exploration of filter grafting could focus on adapting and optimizing this technique for different network architectures, including transformer models and unsupervised learning frameworks. Another research path could involve exploring novel criteria beyond entropy or l1 norms to further refine and enhance the grafting process. Additionally, understanding the theoretical foundations and limits of filter grafting in terms of both convergence properties and computational complexity would be valuable enhancements to the existing work.

The research opens several trajectories for neural network training methodologies that leverage cross-model interactions to sustain higher representational capacity without prohibitive computational cost. This has promising implications for advancing the efficiency and reliability of machine learning systems, particularly in real-world applications demanding high accuracy and minimal latency.