Does resistance to style-transfer equal Global Shape Bias? Measuring network sensitivity to global shape configuration (2310.07555v3)
Abstract: Deep learning models are known to exhibit a strong texture bias, while humans tend to rely heavily on global shape structure for object recognition. The current benchmark for evaluating a model's global shape bias is a set of style-transferred images, built on the assumption that resistance to the attack of style transfer reflects the development of global structure sensitivity in the model. In this work, we show that networks trained with style-transferred images do learn to ignore style, but their shape bias arises primarily from local detail. We provide the **Disrupted Structure Testbench (DiST)** as a direct measurement of global structure sensitivity. Our test includes 2,400 original images from ImageNet-1K, each accompanied by two images in which the global shape of the original is disrupted while its texture is preserved via a texture synthesis program. We find that (1) models that performed well on the previous cue-conflict dataset do not fare well on the proposed DiST; (2) supervised-trained Vision Transformers (ViTs) lose the global spatial information carried by their positional embeddings, leading to no significant advantage over Convolutional Neural Networks (CNNs) on DiST, whereas self-supervised learning, especially masked autoencoding, significantly improves the global structure sensitivity of ViTs; and (3) improving global structure sensitivity is orthogonal to resistance to style transfer, indicating that the relationship between global shape structure and local texture detail is not either/or. Training with DiST images and training with style-transferred images are complementary, and the two can be combined to enhance both a network's global shape sensitivity and the robustness of its local features. Our code is hosted on GitHub: https://github.com/leelabcnbc/DiST
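The measurement the abstract describes — comparing a model's response to an original image against responses to texture-preserving, shape-disrupted variants — can be sketched as a simple embedding-similarity probe. This is a minimal illustrative sketch, not the paper's actual DiST protocol (which is defined in the linked repository); the function name and the use of cosine similarity over feature embeddings are assumptions for illustration.

```python
import numpy as np

def shape_disruption_similarity(embed_orig: np.ndarray,
                                embed_disrupted: np.ndarray) -> float:
    """Cosine similarity between a model's embedding of an original image
    and its embedding of a shape-disrupted, texture-preserving variant.

    A representation that encodes global shape should assign these two
    images dissimilar embeddings (low similarity); a purely texture-driven
    representation should assign them similar embeddings (high similarity).
    """
    a = embed_orig / np.linalg.norm(embed_orig)
    b = embed_disrupted / np.linalg.norm(embed_disrupted)
    return float(a @ b)
```

Under this reading, averaging such similarities over the 2,400 image triplets would yield a scalar global-structure sensitivity score per model, with lower average similarity indicating stronger sensitivity to global shape configuration.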