Learning to Transform for Generalizable Instance-wise Invariance (2309.16672v3)
Abstract: Computer vision research has long aimed to build systems that are robust to the spatial transformations found in natural data. Traditionally, this is achieved through data augmentation or by hard-coding invariances into the architecture. However, too much or too little invariance can hurt, and the correct amount is unknown a priori and instance-dependent. Ideally, the appropriate invariance would be learned from data and inferred at test time. We treat invariance as a prediction problem: given any image, we use a normalizing flow to predict a distribution over transformations and average the predictions over them. Since this distribution depends only on the instance, we can align instances before classifying them and generalize invariance across classes. The same distribution can also be used to adapt to out-of-distribution poses. The flow is trained end-to-end and can learn a much larger range of transformations than Augerino and InstaAug. When used as data augmentation, our method yields accuracy and robustness gains on CIFAR-10, CIFAR-10-LT, and TinyImageNet.
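The test-time averaging described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `predict_logits` is a stand-in classifier, the per-instance Gaussian over rotation angles stands in for the paper's normalizing flow over a richer transformation family, and the coarse 90-degree rotation stands in for a differentiable spatial transformer.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_logits(image):
    # Stand-in classifier: any function mapping an image to class logits.
    return np.array([image.mean(), image.std(), image.max()])

def rotate(image, angle_deg):
    # Coarse stand-in for a differentiable spatial transformer:
    # snap the sampled angle to the nearest multiple of 90 degrees.
    k = int(round(angle_deg / 90.0)) % 4
    return np.rot90(image, k)

def averaged_prediction(image, angle_mu, angle_sigma, n_samples=8):
    """Average classifier outputs over transformations sampled from a
    per-instance distribution (here a Gaussian over rotation angles;
    the paper instead predicts this distribution with a normalizing flow)."""
    logits = np.zeros(3)
    for _ in range(n_samples):
        angle = rng.normal(angle_mu, angle_sigma)
        logits += predict_logits(rotate(image, angle))
    return logits / n_samples

image = rng.random((32, 32))
logits = averaged_prediction(image, angle_mu=0.0, angle_sigma=45.0)
```

Because the averaged prediction is a Monte Carlo estimate, a wider predicted distribution (larger `angle_sigma` here) trades sharper single-pose predictions for invariance over a broader range of poses, which is the per-instance trade-off the paper learns from data.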
- Mental rotation and orientation-dependence in shape recognition. Cognitive Psychology, 21(2):233–282, 1989.
- Learning from one example through shared densities on transforms. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, pages 464–471, 2000.
- Mental rotation and visual familiarity. Perception & Psychophysics, 37(5):429–439, 1985.
- Mental rotation of three-dimensional objects. Science, 171(3972):701–703, 1971.
- Small in-distribution changes in 3D perspective and lighting fool both CNNs and transformers, 2021.
- Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
- Do image classifiers generalize across time? In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9661–9669, 2021.
- National Transportation Safety Board. Collision between vehicle controlled by developmental automated driving system and pedestrian. NTSB, Washington, DC, USA, HAR19-03, 2019.
- Geometric deep learning: Grids, groups, graphs, geodesics, and gauges, 2021.
- Object recognition with gradient-based learning. In Shape, contour and grouping in computer vision, pages 319–345. Springer, 1999.
- Kunihiko Fukushima. Neocognitron: A hierarchical neural network capable of visual pattern recognition. Neural Networks, 1(2):119–130, 1988.
- Grounding inductive biases in natural images: invariance stems from variations in data. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 19566–19579. Curran Associates, Inc., 2021.
- Do deep networks transfer invariances across classes? In International Conference on Learning Representations, 2022.
- Learning invariances in neural networks. In Advances in Neural Information Processing Systems, 2020.
- Instance-specific augmentation: Capturing local invariances, 2022.
- What should not be contrastive in contrastive learning. In International Conference on Learning Representations, 2021.
- Invariance learning in deep neural networks with differentiable Laplace approximations, 2022.
- Chronometric studies of the rotation of mental images. In Visual information processing, pages 75–176. Elsevier, 1973.
- Mental rotation and perceptual uprightness. Perception & Psychophysics, 24(6):529–533, 1978.
- When and how convolutional neural networks generalize to out-of-distribution category–viewpoint combinations. Nature Machine Intelligence, 4(2):146–153, Feb 2022.
- Pulmonary nodule detection in CT scans with equivariant CNNs. Medical Image Analysis, 55:15–26, 2019.
- Roto-translation equivariant convolutional networks: Application to histopathology image analysis. Medical Image Analysis, 68:101849, 2021.
- Dense steerable filter CNNs for exploiting rotational symmetry in histology images. IEEE Transactions on Medical Imaging, 39(12):4124–4136, 2020.
- Exploiting cyclic symmetry in convolutional neural networks. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, pages 1889–1898, 2016.
- Deepsphere: Efficient spherical convolutional neural network with healpix sampling for cosmological applications. Astronomy and Computing, 27:130–146, 2019.
- Cormorant: Covariant molecular neural networks. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 14510–14519, 2019.
- Equivariant message passing for the prediction of tensorial properties and molecular spectra. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, pages 9377–9388, 2021.
- Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873):583–589, 2021.
- On the generalization of equivariance and convolution in neural networks to the action of compact groups. In International Conference on Machine Learning, ICML, 2018.
- A general theory of equivariant CNNs on homogeneous spaces. In Advances in Neural Information Processing Systems, pages 9142–9153, 2019.
- Generalizing convolutional neural networks for equivariance to lie groups on arbitrary continuous data. arXiv preprint arXiv:2002.12880, 2020.
- Residual pathway priors for soft equivariance constraints. In Advances in Neural Information Processing Systems, volume 34, 2021.
- Learning augmentation distributions using transformed risk minimization. arXiv preprint arXiv:2111.08190, 2021.
- A kernel theory of modern data augmentation. ICML, 97:1528–1537, 2019.
- Equivariance with learned canonicalization functions. In NeurIPS 2022 Workshop on Symmetry and Geometry in Neural Representations, 2022.
- Spatial transformer networks. In Advances in Neural Information Processing Systems, volume 28, 2015.
- Probabilistic spatial transformer networks. In The 38th Conference on Uncertainty in Artificial Intelligence, 2022.
- Normalizing flows for probabilistic modeling and inference. Journal of Machine Learning Research, 22(57):1–64, 2021.
- Density estimation using Real NVP. In International Conference on Learning Representations, 2017.
- Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 770–778, 2016.
- normflows: A PyTorch Package for Normalizing Flows. arXiv preprint arXiv:2302.12014, 2023.
- Categorical reparameterization with Gumbel-Softmax. In International Conference on Learning Representations, 2017.
- Fixing weight decay regularization in Adam. arXiv preprint arXiv:1711.05101, 2017.