Making Vision Transformers Truly Shift-Equivariant (2305.16316v2)

Published 25 May 2023 in cs.CV

Abstract: For computer vision, Vision Transformers (ViTs) have become one of the go-to deep net architectures. Despite being inspired by Convolutional Neural Networks (CNNs), the output of ViTs remains sensitive to small spatial shifts in the input, i.e., they are not shift-invariant. To address this shortcoming, we introduce novel data-adaptive designs for each of the modules in ViTs: tokenization, self-attention, patch merging, and positional encoding. With the proposed modules, we achieve true shift-equivariance on four well-established ViTs, namely Swin, SwinV2, CvT, and MViTv2. Empirically, we evaluate the adaptive models on image classification and semantic segmentation tasks. These models achieve competitive performance across three different datasets while maintaining 100% shift consistency.
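The "shift consistency" figure quoted in the abstract is typically measured as the fraction of inputs whose predicted class is unchanged under a small circular spatial shift. The sketch below is a minimal, hypothetical version of such a metric (function names and the toy model are illustrative, not from the paper); a globally pooled toy classifier is shift-invariant by construction, so it scores 100%.

```python
import numpy as np

def shift_consistency(predict, images, shift=(1, 0)):
    """Fraction of images whose predicted class is unchanged under a
    circular spatial shift. A common proxy for shift invariance;
    the paper reports 100% for its adaptive ViTs on this kind of test."""
    dy, dx = shift
    base = np.array([predict(img) for img in images])
    shifted = np.array([predict(np.roll(img, (dy, dx), axis=(0, 1)))
                        for img in images])
    return float(np.mean(base == shifted))

# Toy "model": argmax over per-channel spatial means.
# Global averaging commutes with circular shifts, so this toy
# classifier is shift-invariant by construction.
def toy_predict(img):
    return int(np.argmax(img.mean(axis=(0, 1))))

rng = np.random.default_rng(0)
imgs = [rng.standard_normal((8, 8, 3)) for _ in range(16)]
print(shift_consistency(toy_predict, imgs))  # -> 1.0
```

A non-invariant network (e.g., a standard ViT with strided patch tokenization) would generally score below 1.0 under the same test, which is the gap the paper's data-adaptive modules close.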
