Improving Shift Invariance in Convolutional Neural Networks with Translation Invariant Polyphase Sampling (2404.07410v2)
Abstract: Downsampling operators break the shift invariance of convolutional neural networks (CNNs), which degrades the robustness of the learned features to even small pixel-level shifts. Through a large-scale correlation analysis framework, we study the shift invariance of CNNs by characterizing existing downsampling operators in terms of their maximum-sampling bias (MSB), and find that MSB is negatively correlated with shift invariance. Based on this insight, we propose a learnable pooling operator called Translation Invariant Polyphase Sampling (TIPS), together with two regularizations on its intermediate feature maps, to reduce MSB and learn translation-invariant representations. TIPS can be integrated into any CNN and trained end-to-end with marginal computational overhead. Our experiments show that TIPS yields consistent gains in accuracy, shift consistency, and shift fidelity over previous methods on multiple image classification and semantic segmentation benchmarks, and also improves adversarial and distributional robustness. TIPS attains the lowest MSB of all methods evaluated, which explains these strong empirical results.
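The abstract does not spell out the TIPS operator, but the polyphase-sampling idea it builds on can be sketched. The module below decomposes a feature map into its stride² polyphase components and selects among them by content rather than by fixed position, which is what restores stability under shifts; the 1×1 scoring head, the soft selection, and the name `PolyphasePool` are illustrative assumptions for this sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolyphasePool(nn.Module):
    """Minimal sketch of polyphase downsampling with a learnable phase
    selector, in the spirit of TIPS. The scoring head and the soft
    selection are assumptions, not the paper's implementation."""

    def __init__(self, channels: int, stride: int = 2):
        super().__init__()
        self.stride = stride
        # Tiny learnable head that scores each polyphase component.
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.stride  # assumes H and W are divisible by s
        # A circular shift of x only permutes its s*s polyphase
        # components, so choosing among them by content (instead of
        # always taking phase (0, 0), as plain strided pooling does)
        # makes the output stable under shifts.
        phases = torch.stack(
            [x[:, :, i::s, j::s] for i in range(s) for j in range(s)], dim=1
        )  # (B, s*s, C, H/s, W/s)

        b, p, c, h, w = phases.shape
        logits = self.score(phases.reshape(b * p, c, h, w))  # (B*p, 1, h, w)
        logits = logits.view(b, p, -1).mean(dim=-1)          # (B, p)

        # Soft selection keeps the operator differentiable end to end;
        # a hard argmax over phases could replace it at inference time.
        weights = F.softmax(logits, dim=1).view(b, p, 1, 1, 1)
        return (weights * phases).sum(dim=1)                 # (B, C, H/s, W/s)

# Example: downsample a 64-channel feature map by 2 in each dimension.
pool = PolyphasePool(channels=64, stride=2)
y = pool(torch.randn(8, 64, 32, 32))  # y.shape == (8, 64, 16, 16)
```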