Equivariant Single View Pose Prediction Via Induced and Restricted Representations (2307.03704v1)
Abstract: Learning about the three-dimensional world from two-dimensional images is a fundamental problem in computer vision. An ideal neural network architecture for such tasks would exploit the fact that objects can be rotated and translated in three dimensions to make predictions about novel images. However, imposing SO(3)-equivariance on two-dimensional inputs is difficult because the group of three-dimensional rotations has no natural action on the two-dimensional plane: an element of SO(3) can rotate an image out of the plane. We show that an algorithm that learns a three-dimensional representation of the world from two-dimensional images must satisfy certain geometric consistency properties, which we formulate as SO(2)-equivariance constraints. We use the induced and restricted representations between SO(2) and SO(3) to construct and classify architectures that satisfy these constraints, and we prove that any architecture respecting them can be realized as an instance of our construction. We show that three previously proposed neural architectures for 3D pose prediction are special cases of this construction, and we propose a new algorithm that is a learnable generalization of previously considered methods. We test our architecture on three pose prediction tasks and achieve state-of-the-art (SOTA) results on both the PASCAL3D+ and SYMSOL pose estimation benchmarks.
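The geometric point behind the SO(2)-equivariance constraints, namely that only rotations about the camera axis act on the image plane, can be checked numerically. The following is a minimal sketch and not the paper's construction: it assumes an orthographic camera looking down the z-axis, and every function name in it (`rot_z`, `rot_x`, `project`, `preserves_image_fibers`, `restricted_frequencies`) is hypothetical. It checks three things: an in-plane rotation about the optical axis commutes with projection and becomes an ordinary 2D image rotation; an out-of-plane rotation sends two points that share a pixel to different pixels, so it cannot be written as any map on images; and restricting the standard three-dimensional (l = 1) representation of SO(3) to this SO(2) subgroup splits it into the frequencies m = -1, 0, 1.

```python
import numpy as np


def rot_z(theta):
    """Rotation about the camera (optical) z-axis: the embedded copy of SO(2) in SO(3)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rot_x(theta):
    """Rotation about the x-axis: an out-of-plane rotation."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_2d(theta):
    """In-plane rotation of image coordinates."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def project(points):
    """Assumed orthographic camera looking down z: keep (x, y), drop depth.
    points: (N, 3) array -> (N, 2) image-plane coordinates."""
    return points[:, :2]


def in_plane_consistent(points, theta):
    """Check project(R_z(theta) X) == rot_2d(theta) project(X)."""
    lhs = project(points @ rot_z(theta).T)  # rotate in 3D, then project
    rhs = project(points) @ rot_2d(theta).T  # project, then rotate the image
    return np.allclose(lhs, rhs)


def preserves_image_fibers(R):
    """A 3D rotation can induce a well-defined map on images only if points
    that land on the same pixel still land on a common pixel afterwards."""
    a = np.array([[0.3, -0.2, 0.0]])
    b = np.array([[0.3, -0.2, 1.0]])  # same (x, y), different depth
    assert np.allclose(project(a), project(b))
    return np.allclose(project(a @ R.T), project(b @ R.T))


def restricted_frequencies(theta):
    """Restrict the l = 1 representation of SO(3) to SO(2): the eigenvalues of
    R_z(theta) are exp(i m theta); recover the integer frequencies m.
    Assumes 0 < |theta| < pi so that angle / theta is unambiguous."""
    eigvals = np.linalg.eigvals(rot_z(theta))
    return sorted(int(round(np.angle(ev) / theta)) for ev in eigvals)


if __name__ == "__main__":
    X = np.random.default_rng(0).normal(size=(5, 3))
    print(in_plane_consistent(X, 0.7))  # True
    print(preserves_image_fibers(rot_z(0.7)))  # True
    print(preserves_image_fibers(rot_x(np.pi / 2)))  # False
    print(restricted_frequencies(0.7))  # [-1, 0, 1]
```

More generally, restricting the degree-l irreducible representation of SO(3) to SO(2) contains each frequency m = -l, ..., l exactly once; the l = 1 case above is the smallest nontrivial instance.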