Learning with 3D rotations, a hitchhiker's guide to SO(3) (2404.11735v2)

Published 17 Apr 2024 in cs.LG, cs.CV, and cs.RO

Abstract: Many settings in machine learning require the selection of a rotation representation. However, choosing a suitable representation from the many available options is challenging. This paper acts as a survey and guide through rotation representations. We walk through their properties that harm or benefit deep learning with gradient-based optimization. By consolidating insights from rotation-based learning, we provide a comprehensive overview of learning functions with rotation representations. We provide guidance on selecting representations based on whether rotations are in the model's input or output and whether the data primarily comprises small angles.


Summary

  • The paper demonstrates that high-dimensional rotation representations significantly enhance gradient-based learning in 3D machine learning tasks.
  • It evaluates methods like Euler angles, quaternions, and SVD, highlighting issues such as gimbal lock, singularities, and computational trade-offs.
  • Empirical tests in point cloud rotation and 6D pose estimation underscore the potential of tailored and hybrid approaches for robust ML performance.

Learning with 3D Rotations: A Guide to SO(3) Representations for Machine Learning

Introduction to Rotation Representations in Machine Learning

This paper provides a rigorous examination of various rotation representations in the context of machine learning applications involving 3D rotations. The focus is on how these representations affect the performance of learning models, particularly those using gradient-based optimization techniques. The authors review several rotation representations, delineating scenarios where each can be applied beneficially, and underline theoretical as well as empirical considerations.

Overview of Common Rotation Representations

  • Rotation Representations: The paper surveys the main parameterizations of SO(3):
    • Angle-based representations (e.g., Euler angles) suffer from gimbal lock, and their discontinuities and singularities make them poorly suited to gradient-based learning.
    • Quaternions and exponential coordinates are computationally efficient and less susceptible to singularities, but quaternions face the double-cover problem: two antipodal quaternion values represent the same rotation, making naive regression targets ambiguous.
    • Higher-dimensional representations, such as the 6D Gram-Schmidt parameterization or the 9D representation projected via singular value decomposition (SVD), offer continuous mappings and avoid many pitfalls of lower-dimensional approaches.
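As a concrete illustration of the higher-dimensional approach, the 6D representation maps an unconstrained 6-vector to a valid rotation matrix via Gram-Schmidt orthogonalization. A minimal NumPy sketch (the function name is illustrative, not from the paper):

```python
import numpy as np

def gram_schmidt_to_rotation(x6):
    """Map an unconstrained 6D vector (two stacked 3D vectors) to a
    rotation matrix by Gram-Schmidt orthogonalization."""
    a, b = x6[:3], x6[3:]
    r1 = a / np.linalg.norm(a)            # first column: normalize a
    b = b - np.dot(r1, b) * r1            # remove component of b along r1
    r2 = b / np.linalg.norm(b)            # second column: normalize residual
    r3 = np.cross(r1, r2)                 # third column: right-handed completion
    return np.stack([r1, r2, r3], axis=1)

R = gram_schmidt_to_rotation(np.array([1.0, 0.2, -0.3, 0.5, 1.0, 0.1]))
assert np.allclose(R.T @ R, np.eye(3))    # orthonormal columns
assert np.isclose(np.linalg.det(R), 1.0)  # proper rotation, det = +1
```

Because the mapping is defined (and smooth) for almost all 6-vectors, a network can regress the 6D output freely and still produce a valid rotation.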

Implications for Machine Learning

  • Gradient-Based Learning: Representations that involve discontinuities or singularities can severely impact the performance of gradient-based learning algorithms. In cases where rotations are part of the model inputs or outputs, the choice of representation becomes crucial.
  • Dimensionality and Continuity: Higher-dimensional representations, despite their increased computational overhead, provide smoother landscapes for optimization and are less prone to local minima associated with discontinuities in rotation space.
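The quaternion double cover gives a concrete example of why representation choice matters for loss landscapes: two antipodal unit quaternions encode the identical rotation, yet a naive Euclidean loss between them is maximal. A short sketch, assuming SciPy is available:

```python
import numpy as np
from scipy.spatial.transform import Rotation as Rot

# A unit quaternion in SciPy's [x, y, z, w] convention.
q = Rot.from_euler('z', 170, degrees=True).as_quat()

# q and -q represent exactly the same rotation (double cover)...
assert np.allclose(Rot.from_quat(q).as_matrix(), Rot.from_quat(-q).as_matrix())

# ...yet their Euclidean distance is the maximum possible for unit
# quaternions, so a naive L2 loss heavily penalizes a perfect prediction.
assert np.isclose(np.linalg.norm(q - (-q)), 2.0)
```

Losses that respect the rotation geometry (e.g., comparing rotation matrices, or taking min over ±q) avoid this spurious penalty.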

Experimental Validation

Through rigorous experiments, the paper validates the theoretical insights concerning the efficacy of various rotation representations:

  • Rotation Estimation from Point Clouds: Higher-dimensional representations, particularly using SVD, outperformed others by providing continuous mappings and supporting more effective gradient propagation.
  • 6D Object Pose Estimation from Images: Similar improvements were observed with SVD and other high-dimensional methods, which also showed notable robustness to the practical variations found in image-based datasets.
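The SVD-based approach lets a network output an unconstrained 3×3 matrix that is then projected onto the nearest rotation (in the Frobenius norm). A minimal NumPy sketch of that projection (the helper name is illustrative):

```python
import numpy as np

def svd_orthogonalize(m):
    """Project an arbitrary 3x3 matrix onto SO(3): the nearest rotation
    matrix in the Frobenius norm, via singular value decomposition."""
    u, _, vt = np.linalg.svd(m)
    d = np.sign(np.linalg.det(u @ vt))            # correct a possible reflection
    return u @ np.diag([1.0, 1.0, d]) @ vt        # force det = +1

rng = np.random.default_rng(0)
noisy = np.eye(3) + 0.1 * rng.standard_normal((3, 3))  # e.g., a raw network output
R = svd_orthogonalize(noisy)
assert np.allclose(R.T @ R, np.eye(3))    # orthonormal
assert np.isclose(np.linalg.det(R), 1.0)  # proper rotation
```

Since the projection is differentiable almost everywhere, it can be placed at the end of a network and trained end to end with gradient descent.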

Conclusions and Future Directions

The exploration of rotation representations in this paper highlights several key points:

  • Prefer High-Dimensional Representations: For tasks involving 3D rotations within machine learning frameworks, especially those dependent on gradient-based optimization, high-dimensional representations prove to be more advantageous.
  • Need for Tailored Solutions: Depending on the specific requirements of a task (such as sensitivity to computational overhead or real-time constraints), different rotation representations may be preferred. The empirical insights provided can serve as a guideline for making these choices.
  • Potential for Hybrid Approaches: In scenarios where no single representation offers a clear advantage, hybrid methods that combine multiple representations may be worth exploring.

Given the depth of analysis and empirical evidence presented, this paper serves as a comprehensive guide for researchers and practitioners working with 3D rotations in machine learning, offering a clear framework for selecting and implementing suitable rotation representations.