
WeightNet: Revisiting the Design Space of Weight Networks (2007.11823v2)

Published 23 Jul 2020 in cs.CV

Abstract: We present a conceptually simple, flexible, and effective framework for weight-generating networks. Our approach is general: it unifies two currently distinct and highly effective modules, SENet and CondConv, within the same framework on weight space. The method, called WeightNet, generalizes the two methods by simply adding one more grouped fully-connected layer to the attention activation layer. We use WeightNet, composed entirely of (grouped) fully-connected layers, to directly output the convolutional weight. WeightNet is easy to train and memory-conserving, operating on the kernel space instead of the feature space. Owing to this flexibility, our method outperforms existing approaches on both ImageNet classification and COCO detection, achieving better Accuracy-FLOPs and Accuracy-Parameter trade-offs. The flexible weight-space framework has the potential to further improve performance. Code is available at https://github.com/megvii-model/WeightNet.

Citations (98)

Summary

  • The paper introduces WeightNet, a unified framework that fuses SENet and CondConv to enhance CNN design with superior accuracy and efficiency trade-offs.
  • It employs grouped fully-connected layers to generate convolution weights from learned attention vectors, offering flexible control through hyperparameters M and G.
  • Validated on ImageNet and COCO with ShuffleNetV2 and ResNet50 backbones, WeightNet improves top-1 accuracy by up to 5.7% while keeping FLOPs and parameter counts comparable.

An Analysis of "WeightNet: Revisiting the Design Space of Weight Networks"

The paper introduces a novel framework termed "WeightNet" aimed at enhancing the design flexibility and performance of convolutional neural networks (CNNs) through a unified approach. Specifically, it revisits the paradigm of weight-generating networks, positioning WeightNet to unify the distinct methodologies of SENet and CondConv within a cohesive framework. This overarching approach offers a structured perspective on optimizing the weight space of CNN architectures, promising improved trade-offs among accuracy, FLOPs, and parameter count.

Overview of Methods

WeightNet operates by generalizing the functionality of SENet and CondConv, two previously disparate methods known for their parameter efficiency and dynamic adaptability. SENet achieves its strong performance through channel-wise feature recalibration, using an attention mechanism that dynamically weights channel significance. CondConv, on the other hand, leverages multiple expert weights, combined according to sample-specific routing, to offer a conditional parametrization of weights. WeightNet synthesizes these methodologies by dissolving the distinction between them and extending the design space of weight networks.
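To make the two mechanisms concrete, here is a minimal PyTorch sketch contrasting SE-style channel recalibration with CondConv-style expert mixing. This is an illustrative reconstruction, not the authors' code; the class names, reduction ratio, expert count, and initialization are assumptions chosen for clarity.

```python
# Illustrative sketches of the two mechanisms WeightNet unifies (assumed names).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SEBlock(nn.Module):
    """SENet-style block: recalibrate channels with a learned attention vector."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        hidden = max(channels // reduction, 8)
        self.fc1 = nn.Linear(channels, hidden)
        self.fc2 = nn.Linear(hidden, channels)

    def forward(self, x):                                  # x: (B, C, H, W)
        s = x.mean(dim=(2, 3))                             # global average pooling
        s = torch.sigmoid(self.fc2(F.relu(self.fc1(s))))   # channel attention
        return x * s.view(*s.shape, 1, 1)                  # rescale the feature map

class CondConv2d(nn.Module):
    """CondConv-style layer: mix expert kernels with per-sample routing weights."""
    def __init__(self, in_ch, out_ch, k=3, num_experts=4):
        super().__init__()
        self.k = k
        self.experts = nn.Parameter(
            torch.randn(num_experts, out_ch, in_ch, k, k) * 0.01)
        self.route = nn.Linear(in_ch, num_experts)

    def forward(self, x):                                  # x: (B, C, H, W)
        b = x.size(0)
        r = torch.sigmoid(self.route(x.mean(dim=(2, 3))))  # routing scores: (B, K)
        w = torch.einsum('bk,koihw->boihw', r, self.experts)  # per-sample kernels
        # Fold the batch into groups so each sample is convolved with its own kernel.
        out = F.conv2d(x.reshape(1, -1, *x.shape[2:]),
                       w.reshape(-1, *w.shape[2:]),
                       padding=self.k // 2, groups=b)
        return out.reshape(b, -1, *out.shape[2:])
```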

The core of the WeightNet framework is a straightforward architecture composed entirely of (grouped) fully-connected layers. This design allows the network to generate convolution weights directly from a learned attention vector, bypassing the intermediate operations typical of SENet and CondConv. By incorporating an additional grouped fully-connected layer, WeightNet enables flexible control over the representation capacity and complexity of CNNs, introducing two new hyperparameters, M and G, that adjust the layer's input size and group number, respectively.
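The sketch below condenses this structure into a single layer: an attention activation (global pooling, a fully-connected layer, and a sigmoid) followed by the added grouped fully-connected layer that emits the full convolution kernel, with both fully-connected layers realized as 1×1 convolutions. It follows the structure described in the paper, but the channel-reduction ratio, the defaults M=2 and G=2, and all names are assumptions rather than the reference implementation.

```python
# A minimal sketch of a WeightNet layer, assuming per-sample kernel
# generation via 1x1 (grouped) convolutions; hyperparameter defaults are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightNetConv(nn.Module):
    """Sketch of a WeightNet layer: grouped FC layers generate the conv kernel."""
    def __init__(self, in_ch, out_ch, k=3, stride=1, M=2, G=2, reduction=16):
        super().__init__()
        assert M % G == 0 and (in_ch * k * k) % G == 0  # grouped-FC divisibility
        self.in_ch, self.out_ch, self.k, self.stride = in_ch, out_ch, k, stride
        red = max(in_ch // reduction, 16)               # reduced attention width (assumed)
        self.reduce = nn.Conv2d(in_ch, red, 1)          # channel reduction after pooling
        # Attention activation layer: FC + sigmoid, output length M * out_ch.
        self.fc1 = nn.Conv2d(red, M * out_ch, 1)
        # The added grouped FC layer: maps the attention vector to the kernel.
        self.fc2 = nn.Conv2d(M * out_ch, out_ch * in_ch * k * k, 1,
                             groups=G * out_ch, bias=False)

    def forward(self, x):                               # x: (B, C_in, H, W)
        b = x.size(0)
        a = x.mean(dim=(2, 3), keepdim=True)            # global average pooling
        a = torch.sigmoid(self.fc1(self.reduce(a)))     # attention: (B, M*C_out, 1, 1)
        w = self.fc2(a)                                 # kernel: (B, C_out*C_in*k*k, 1, 1)
        w = w.reshape(b * self.out_ch, self.in_ch, self.k, self.k)
        # Batch-as-groups trick: each sample is convolved with its own kernel.
        out = F.conv2d(x.reshape(1, b * self.in_ch, *x.shape[2:]), w,
                       stride=self.stride, padding=self.k // 2, groups=b)
        return out.reshape(b, self.out_ch, *out.shape[2:])

# Example usage:
# layer = WeightNetConv(32, 64)
# y = layer(torch.randn(8, 32, 56, 56))  # -> (8, 64, 56, 56)
```

The final grouped convolution with groups=b applies every sample's generated kernel in a single call, consistent with the paper's observation that operating on the kernel space rather than the feature space keeps training memory-conserving.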

Experimental Evaluation

The paper extensively evaluates WeightNet on classification tasks using the ImageNet dataset and object detection tasks on the COCO dataset, leveraging ShuffleNetV2 and ResNet50 architectures. WeightNet demonstrated superior performance in both tasks by enhancing the trade-off between accuracy and computational efficiency. Notably, it achieved substantial improvements in top-1 accuracy across varying model sizes while maintaining comparable or reduced FLOPs and parameter counts.

For instance, on ShuffleNetV2 (0.5×), WeightNet delivered a top-1 accuracy improvement of up to 5.7% without exceeding the baseline FLOPs. Further comparisons against existing attention mechanisms such as SE and CondConv showed a consistent advantage for WeightNet across performance metrics.

Implications and Future Directions

The implications of WeightNet are significant, suggesting that neural network design can explore broader weight spaces without substantial computational penalties. This capability is particularly relevant for applications demanding resource efficiency like mobile and embedded systems, where maximizing model accuracy under stringent hardware constraints is critical.

The theoretical underpinnings of WeightNet also invite further exploration. The paper posits that viewing SENet and CondConv as points on a continuum of weight-space designs opens the door to novel configurations that could further bolster CNN performance. Future research may investigate more complex structures within the WeightNet framework, such as sparsity and weight decomposition techniques, potentially optimizing both the training and inference phases.

In conclusion, WeightNet offers a compelling enhancement to the design space of CNN architectures. By bridging the gap between existing methodologies and introducing a more unified approach, the framework holds promise for significant advances in both the theoretical insights and practical applications of neural networks in resource-constrained environments.
