Dynamic Shuffle: An Efficient Channel Mixture Method (2310.02776v1)
Abstract: The redundancy of convolutional neural networks depends not only on the weights but also on the inputs. Shuffling is an efficient operation for mixing channel information, but the shuffle order is usually pre-defined. To reduce this data-dependent redundancy, we devise a dynamic shuffle module that generates data-dependent permutation matrices for shuffling. Since the size of a permutation matrix grows with the square of the number of input channels, we make the generation process efficient by dividing the channels into groups, generating two shared small permutation matrices for each group, and using the Kronecker product together with a cross-group shuffle to obtain the final permutation matrices. To make the generation process learnable, and guided by theoretical analysis, we employ softmax, orthogonal regularization, and binarization to asymptotically approximate a permutation matrix. Dynamic shuffle adaptively mixes channel information with negligible extra computation and memory overhead. Experimental results on the image classification benchmarks CIFAR-10, CIFAR-100, Tiny ImageNet, and ImageNet show that our method significantly improves the performance of ShuffleNets. By adding the dynamically generated matrix to a learnable static matrix, we further propose static-dynamic shuffle and show that it can serve as a lightweight replacement for ordinary pointwise convolution.
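
The channel-mixing idea described in the abstract can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration under stated assumptions, not the paper's implementation: a global-average-pooled feature vector is passed through a small fully connected layer to produce two small matrices, a row-wise softmax relaxes each into a soft permutation, and their Kronecker product yields a per-sample channel-mixing matrix. The class name `DynamicShuffle`, the `sub` factorization parameter, and the pooling-plus-linear generator are hypothetical choices for illustration; the paper's orthogonal regularization, binarization, and cross-group shuffle steps are omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicShuffle(nn.Module):
    """Minimal sketch of a data-dependent channel shuffle (illustrative only).

    Two small matrices are predicted from the input, relaxed into soft
    permutations with a row-wise softmax, combined via a Kronecker product,
    and used to mix the channel dimension.
    """

    def __init__(self, channels: int, sub: int):
        super().__init__()
        assert channels % sub == 0, "channels must factor as sub * (channels // sub)"
        self.sub = sub                # size of the first small matrix
        self.sub2 = channels // sub   # size of the second small matrix
        # Lightweight generator: global average pooling -> logits for both matrices.
        self.fc = nn.Linear(channels, sub * sub + self.sub2 * self.sub2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        logits = self.fc(x.mean(dim=(2, 3)))                       # (N, s*s + t*t)
        a = logits[:, : self.sub ** 2].view(n, self.sub, self.sub)
        b = logits[:, self.sub ** 2:].view(n, self.sub2, self.sub2)
        # Soft relaxation of permutation matrices via row-wise softmax.
        a = F.softmax(a, dim=-1)
        b = F.softmax(b, dim=-1)
        # Kronecker product of the two small matrices gives a (C x C) mixing matrix per sample.
        p = torch.einsum("nij,nkl->nikjl", a, b).reshape(n, c, c)
        # Mix channels: (N, C, C) x (N, C, H*W) -> (N, C, H*W).
        y = torch.bmm(p, x.reshape(n, c, h * w))
        return y.reshape(n, c, h, w)


if __name__ == "__main__":
    m = DynamicShuffle(channels=32, sub=4)
    out = m(torch.randn(2, 32, 16, 16))
    print(out.shape)  # torch.Size([2, 32, 16, 16])
```

Generating two matrices of size s and C/s instead of one C-by-C matrix keeps the generator's output dimension small, which is the efficiency argument made in the abstract; the static-dynamic variant would add a learnable input-independent matrix to the generated one before mixing.
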
- Batch-shaping for learning conditional channel gated networks, in: Proceedings of the International Conference on Learning Representations.
- Adaptive neural networks for efficient inference, in: Proceedings of the International Conference on Machine Learning, pp. 527–536.
- Dynamic ReLU, in: Proceedings of the European Conference on Computer Vision, pp. 351–367.
- A downsampled variant of ImageNet as an alternative to the CIFAR datasets. arXiv preprint arXiv:1707.08819.
- Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv preprint arXiv:1602.02830.
- ImageNet: A large-scale hierarchical image database, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255.
- Global sparse momentum SGD for pruning very deep neural networks, in: Proceedings of the Advances in Neural Information Processing Systems, pp. 6379–6391.
- ResRep: Lossless CNN pruning via decoupling remembering and forgetting, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4510–4520.
- Fire together wire together: A dynamic pruning approach with self-supervised mask prediction, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12454–12463.
- Learned step size quantization, in: Proceedings of the International Conference on Learning Representations.
- Optimal brain compression: A framework for accurate post-training quantization and pruning. arXiv preprint arXiv:2208.11580.
- Dynamic channel pruning: Feature boosting and suppression, in: Proceedings of the International Conference on Learning Representations.
- Dynamic neural networks: A survey. arXiv preprint arXiv:2102.04906.
- Data-free ensemble knowledge distillation for privacy-conscious multimedia model compression, in: Proceedings of the ACM International Conference on Multimedia, pp. 1803–1811.
- Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
- Filter pruning via feature discrimination in deep neural networks, in: Proceedings of the European Conference on Computer Vision, pp. 245–261.
- Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
- MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.
- Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. arXiv preprint arXiv:1607.03250.
- Squeeze-and-excitation networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141.
- Densely connected convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2261–2269.
- Speeding up convolutional neural networks with low rank expansions, in: Proceedings of the British Machine Vision Conference, pp. 1–13.
- Training CNNs with selective allocation of channels, in: Proceedings of the International Conference on Machine Learning, pp. 3080–3090.
- Pruning deep convolutional neural networks architectures with evolution strategy. Information Sciences 552, 29–47.
- Learning multiple layers of features from tiny images. Technical report, University of Toronto.
- SCWC: Structured channel weight sharing to compress convolutional neural networks. Information Sciences 587, 82–96.
- Pruning filters for efficient convnets, in: Proceedings of the International Conference on Learning Representations.
- Hard sample matters a lot in zero-shot quantization, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 24417–24426.
- Revisiting random channel pruning for neural network compression, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 191–201.
- MicroNet: Improving image recognition with extremely low FLOPs, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 468–477.
- Towards compact CNNs via collaborative compression, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6438–6447.
- Runtime neural pruning, in: Proceedings of the Advances in Neural Information Processing Systems, pp. 2178–2188.
- HRank: Filter pruning using high-rank feature map, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1526–1535.
- Discrimination-aware network pruning for deep model compression. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 4035–4051.
- AutoShuffleNet: Learning permutation matrices via an exact Lipschitz continuous penalty in deep convolutional neural networks, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 608–616.
- WeightNet: Revisiting the design space of weight networks, in: Proceedings of the European Conference on Computer Vision, pp. 776–792.
- Activate or not: Learning customized activation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8028–8038.
- ShuffleNet V2: Practical guidelines for efficient CNN architecture design, in: Proceedings of the European Conference on Computer Vision, pp. 116–131.
- Dynamic group convolution for accelerating convolutional neural networks, in: Proceedings of the European Conference on Computer Vision, pp. 138–155.
- Customizing a teacher for feature distillation. Information Sciences 640, 119024.
- Cross-modal hash retrieval based on semantic multiple similarity learning and interactive projection matrix learning. Information Sciences, 119571.
- Manifold regularized dynamic network pruning, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5018–5028.
- Neural network quantization in federated learning at the edge. Information Sciences 575, 417–436.
- ECA-Net: Efficient channel attention for deep convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 11531–11539.
- Objective-hierarchy based large-scale evolutionary algorithm for improving joint sparsity-compression of neural network. Information Sciences 640, 119095.
- Fully learnable group convolution for acceleration of deep neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9041–9050.
- SkipNet: Learning dynamic routing in convolutional networks, in: Proceedings of the European Conference on Computer Vision, pp. 409–424.
- Aggregated residual transformations for deep neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5987–5995.
- TRP: Trained rank pruning for efficient deep neural networks, in: Proceedings of the International Joint Conference on Artificial Intelligence, pp. 977–983.
- CondConv: Conditionally parameterized convolutions for efficient inference, in: Proceedings of the Advances in Neural Information Processing Systems, pp. 1307–1318.
- Learning low-rank deep neural networks via singular vector orthogonality regularization and singular value sparsification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshop, pp. 2899–2908.
- ZeroQuant: Efficient and affordable post-training quantization for large-scale transformers, in: Proceedings of the Advances in Neural Information Processing Systems, pp. 27168–27183.
- Understanding straight-through estimator in training activation quantized neural nets, in: Proceedings of the International Conference on Learning Representations.
- Data-free knowledge distillation via feature exchange and activation region constraint, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 24266–24275.
- On compressing deep models by low rank and sparse decomposition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 67–76.
- ShuffleNet: An extremely efficient convolutional neural network for mobile devices, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856.
- Differentiable learning-to-group channels via groupable convolutional neural networks, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3541–3550.
- Learning deep features for discriminative localization, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929.
- DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160.