DCNN: Dual Cross-current Neural Networks Realized Using An Interactive Deep Learning Discriminator for Fine-grained Objects
Abstract: Accurate classification of fine-grained images remains a challenge for backbones based on convolutional operations or self-attention mechanisms. This study proposes novel dual cross-current neural networks (DCNN), which combine the advantages of convolutional operations and self-attention mechanisms to improve the accuracy of fine-grained image classification. The main design features of DCNN, a weakly supervised learning backbone, are (a) extracting heterogeneous data, (b) keeping the feature map resolution unchanged, (c) expanding the receptive field, and (d) fusing global representations and local features. Experimental results demonstrated that using DCNN as the backbone network for classifying certain fine-grained benchmark datasets achieved performance improvements of 13.5–19.5% and 2.2–12.9% compared with other advanced convolution-based and attention-based fine-grained backbones, respectively.
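Design features (b) and (c) can look contradictory at first, since receptive fields are often enlarged by downsampling. The abstract does not say which mechanism DCNN uses, so the following is only a minimal sketch under one common assumption: stride-1 dilated convolutions with "same" padding. The layer stack, the feature map size of 56, and all function names are hypothetical and chosen for illustration; they are not taken from the paper.

```python
def conv_output_size(size, kernel, stride=1, padding=0, dilation=1):
    """Spatial output size of a convolution (standard formula)."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


def same_padding(kernel, dilation=1):
    """Padding that keeps the size unchanged for stride 1 and odd kernels."""
    return dilation * (kernel - 1) // 2


def receptive_field(layers):
    """Receptive field of a stack of (kernel, stride, dilation) layers."""
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        rf += dilation * (kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical stack (an assumption, not DCNN's actual layers):
# three 3x3 stride-1 convolutions with growing dilation.
layers = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]
plain_layers = [(3, 1, 1)] * 3  # same depth, no dilation, for comparison

size = 56  # hypothetical feature map side length
for kernel, stride, dilation in layers:
    padding = same_padding(kernel, dilation)
    size = conv_output_size(size, kernel, stride, padding, dilation)

print(size)                           # resolution after the dilated stack
print(receptive_field(layers))        # receptive field of the dilated stack
print(receptive_field(plain_layers))  # receptive field without dilation
```

Under these assumptions the feature map side length stays at 56 while the receptive field grows from 7 input pixels (no dilation) to 15, which shows that the two properties are compatible in principle.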