Revisiting Dynamic Convolution via Matrix Decomposition (2103.08756v1)

Published 15 Mar 2021 in cs.CV

Abstract: Recent research in dynamic convolution shows substantial performance boost for efficient CNNs, due to the adaptive aggregation of K static convolution kernels. It has two limitations: (a) it increases the number of convolutional weights by K-times, and (b) the joint optimization of dynamic attention and static convolution kernels is challenging. In this paper, we revisit it from a new perspective of matrix decomposition and reveal the key issue is that dynamic convolution applies dynamic attention over channel groups after projecting into a higher dimensional latent space. To address this issue, we propose dynamic channel fusion to replace dynamic attention over channel groups. Dynamic channel fusion not only enables significant dimension reduction of the latent space, but also mitigates the joint optimization difficulty. As a result, our method is easier to train and requires significantly fewer parameters without sacrificing accuracy. Source code is at https://github.com/liyunsheng13/dcd.

Citations (57)

Summary

  • The paper introduces a dynamic convolution decomposition framework that leverages matrix factorization to reduce parameters and simplify training.
  • The paper employs a three-step process—compress, fuse, and expand—to transform dynamic convolution into a compact and efficient model.
  • The paper demonstrates competitive accuracy on ResNet and MobileNetV2 while significantly easing training complexity.

Revisiting Dynamic Convolution via Matrix Decomposition

The paper "Revisiting Dynamic Convolution via Matrix Decomposition" presents a novel approach to improving the efficiency of dynamic convolution in Convolutional Neural Networks (CNNs). Dynamic convolution has gained attention for its ability to enhance CNN performance by adaptively aggregating multiple static convolution kernels based on the input. However, it encounters significant challenges, such as increased parameter count and the complexity of joint optimization of dynamic attention and static convolution kernels.

Key Contributions and Methodology

The authors address these limitations through a matrix decomposition lens, proposing a dynamic convolution decomposition (DCD) framework. This framework replaces the conventional dynamic attention over channel groups in a higher-dimensional latent space with a more efficient dynamic channel fusion approach. The matrix decomposition reformulation leverages Singular Value Decomposition (SVD) to express the dynamic convolution in terms of static kernels and residual matrices. The primary innovation lies in the use of a dynamic channel fusion mechanism, which significantly reduces the latent space dimensionality, thus making the model more compact and easier to train without compromising accuracy.
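The notation below paraphrases this reformulation as a sketch; the symbols follow the paper's general description rather than its exact equations. Vanilla dynamic convolution mixes K static kernels with input-dependent attention, whereas DCD builds a dynamic residual in a small latent space:

```latex
% Vanilla dynamic convolution: input-dependent mixture of K static kernels
W(x) \;=\; \sum_{k=1}^{K} \pi_k(x)\, W_k

% DCD (paraphrased): one static kernel plus a dynamic residual in a low-dimensional latent space
W(x) \;=\; W_0 \;+\; P\,\Phi(x)\,Q^{\top},
\qquad Q \in \mathbb{R}^{C\times L},\;
\Phi(x) \in \mathbb{R}^{L\times L},\;
P \in \mathbb{R}^{C\times L},\; L \ll C
```

Here Q^T compresses the C input channels into an L-dimensional latent space, the full dynamic matrix Φ(x) fuses those latent channels, and P expands the result back to the output space; this is the compress-fuse-expand pipeline detailed below.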

The proposed DCD involves a three-step process:

  1. Compressing the input into a lower-dimensional latent space using a static matrix.
  2. Applying a full dynamic channel fusion matrix to dynamically fuse the channels in this latent space.
  3. Expanding the fused representation back to the output space using another static matrix.

This workflow effectively mitigates the challenges posed by conventional dynamic convolution methods by reducing the number of parameters and improving training efficiency.
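A minimal PyTorch-style sketch of this compress-fuse-expand pipeline for a 1x1 convolution is shown below. The module name DCDConv1x1, the latent_dim argument, and the pooling-plus-linear head used to predict Φ(x) are illustrative assumptions for exposition, not the authors' reference implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn


class DCDConv1x1(nn.Module):
    """Sketch of dynamic convolution decomposition for a 1x1 conv:
    per-sample kernel W(x) = W_0 + P @ Phi(x) @ Q^T, applied as y = W(x) x."""

    def __init__(self, channels: int, latent_dim: int):
        super().__init__()
        self.C, self.L = channels, latent_dim
        # Static kernel W_0 plus static compress (Q) and expand (P) matrices.
        self.W0 = nn.Parameter(torch.randn(channels, channels) * 0.02)
        self.Q = nn.Parameter(torch.randn(channels, latent_dim) * 0.02)
        self.P = nn.Parameter(torch.randn(channels, latent_dim) * 0.02)
        # Lightweight branch predicting the L x L dynamic fusion matrix Phi(x)
        # from globally pooled features (an illustrative head, not the paper's exact design).
        self.phi_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, latent_dim * latent_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, C, H, W = x.shape
        phi = self.phi_head(x).view(B, self.L, self.L)        # per-sample fusion matrix
        # Compose the per-sample kernel: W(x) = W_0 + P Phi(x) Q^T, shape (B, C, C).
        dyn = self.P @ phi @ self.Q.transpose(0, 1)
        kernel = self.W0.unsqueeze(0) + dyn
        # Apply as a 1x1 convolution: flatten spatial dims and use a batched matmul.
        y = kernel @ x.reshape(B, C, H * W)
        return y.reshape(B, C, H, W)


# Usage sketch: a small latent dimension keeps the added parameters modest
# (the paper discusses how to choose L; the value here is illustrative).
layer = DCDConv1x1(channels=64, latent_dim=8)
out = layer(torch.randn(2, 64, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```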

Experimental Evaluation and Results

The experiments cover popular networks such as ResNet and MobileNetV2 across multiple configurations. DCD reduces the parameter count while achieving competitive or superior accuracy compared to vanilla dynamic convolution methods. For instance, DCD matches the performance of DY-Conv with fewer parameters and without tuning additional hyperparameters such as the softmax temperature, indicating simpler training dynamics.

Ablation studies further delineate the contribution of various DCD components, demonstrating that both dynamic channel fusion and channel-wise attention significantly bolster performance. Additionally, tests show that even a sparse dynamic residual configuration continues to outperform static baselines, emphasizing the potent representational capacity of dynamic channel fusion.

Implications and Future Directions

This work holds practical implications for the deployment of efficient yet powerful CNNs in resource-constrained environments by minimizing model complexity without detriment to performance. Theoretically, it paves the way for further exploration of matrix decomposition techniques within other dynamic neural network architectures. Future research might explore extending DCD concepts to other types of layers or network architectures and investigating the integration with additional dynamic elements such as activations or residual connections.

The paper's insights into latent space dimensionality and channel interaction dynamics also suggest possible avenues for synergistic interactions with other network optimization techniques, potentially leading to more robust, compact, and efficient CNN models.
