CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective (2403.06676v1)
Abstract: Recently, convolutional neural networks (CNNs) with large kernels have attracted much attention in the computer vision field, following the success of Vision Transformers. Large kernel CNNs have been reported to perform well not only in classification but also in downstream vision tasks. Their strong downstream performance has been attributed to the large effective receptive field (ERF) produced by large kernels, but this view has not been fully tested. We therefore revisit the performance of large kernel CNNs in downstream tasks, focusing on weakly supervised object localization (WSOL). WSOL is a difficult downstream task that is not fully supervised, so it offers a new angle from which to explore the capabilities of large kernel CNNs. Our study compares the modern large kernel CNNs ConvNeXt, RepLKNet, and SLaK to test the validity of the naive expectation that a larger ERF improves downstream task performance. Our analysis of the factors behind their high performance points to a different explanation, in which the main factor is improvement of the feature maps. Furthermore, we find that modern CNNs are robust to a long-discussed problem of CAM in WSOL, namely that only local regions of an object are activated. CAM is the most classic WSOL method, but because of this problem it is often used merely as a baseline for comparison. However, experiments on the CUB-200-2011 dataset show that simply combining a large kernel CNN, CAM, and simple data augmentation methods can achieve performance (90.99% MaxBoxAcc) comparable to the latest CNN-based WSOL method, which requires special training or complex post-processing. The code is available at https://github.com/snskysk/CAM-Back-Again.
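To make the CAM-plus-MaxBoxAcc pipeline from the abstract concrete, here is a minimal NumPy sketch. It is an illustration, not the authors' code. It assumes a backbone that ends in global average pooling followed by a single linear classifier, as in the classic CAM formulation. The CAM for class c is then the classifier-weighted sum of the final feature maps, M_c(x, y) = Σ_k w_k^c f_k(x, y). The map is min-max normalized, thresholded at τ, and turned into a box. MaxBoxAcc is taken as the best fraction of images, over all thresholds τ, whose predicted box reaches IoU ≥ δ = 0.5 with the ground truth. The function names (`compute_cam`, `box_from_cam`, `iou`, `max_box_acc`) are hypothetical. The box extraction is deliberately simplified: it takes the tight box around all above-threshold pixels, while the reference evaluation code may extract boxes differently. The sketch also omits resizing the CAM to the input resolution.

```python
import numpy as np


def compute_cam(features: np.ndarray, fc_weights: np.ndarray, class_idx: int) -> np.ndarray:
    """Class activation map for one class.

    features:   (C, H, W) output of the last conv stage, before global average pooling.
    fc_weights: (num_classes, C) weights of the linear classifier applied after GAP.
    """
    # M_c(x, y) = sum_k w_k^c * f_k(x, y)  -> shape (H, W)
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    # Min-max normalize to [0, 1] so one threshold sweep applies to every image.
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def box_from_cam(cam: np.ndarray, tau: float) -> tuple:
    """Tight box (x0, y0, x1, y1), end-exclusive, around pixels with cam >= tau.

    Simplification (assumption): all foreground pixels are used, with no
    connected-component or contour selection.
    """
    ys, xs = np.nonzero(cam >= tau)
    if ys.size == 0:
        return (0, 0, 0, 0)  # empty box, which yields IoU 0
    return (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)


def iou(a: tuple, b: tuple) -> float:
    """Intersection over union of two end-exclusive boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def max_box_acc(cams, gt_boxes, delta: float = 0.5, thresholds=None) -> float:
    """Best box accuracy over CAM thresholds at IoU >= delta."""
    if thresholds is None:
        thresholds = np.linspace(0.01, 1.0, 100)
    best = 0.0
    for tau in thresholds:
        hits = [iou(box_from_cam(c, tau), g) >= delta for c, g in zip(cams, gt_boxes)]
        best = max(best, float(np.mean(hits)))
    return best


# Toy example: channel 0 fires on a 3x4 patch, channel 1 is a flat background.
feats = np.zeros((2, 8, 8))
feats[0, 2:5, 3:7] = 1.0
feats[1] = 0.1
weights = np.array([[1.0, 0.0], [0.0, 1.0]])  # class 0 uses channel 0, class 1 uses channel 1
cam0 = compute_cam(feats, weights, class_idx=0)
cam1 = compute_cam(feats, weights, class_idx=1)
gt = (3, 2, 7, 5)
print(box_from_cam(cam0, 0.5), max_box_acc([cam0], [gt]), max_box_acc([cam1], [gt]))
```

In the toy example, the class-0 CAM recovers the active patch exactly, so its MaxBoxAcc is 1.0. The flat class-1 map normalizes to all zeros and never produces a box. Sweeping τ and keeping the best value means MaxBoxAcc does not depend on one hand-picked threshold. This matters for CAM, whose activation spread, and therefore the box it yields, can vary a lot from backbone to backbone.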