Spherical CNNs

Published 30 Jan 2018 in cs.LG and stat.ML | (1801.10130v3)

Abstract: Convolutional Neural Networks (CNNs) have become the method of choice for learning problems involving 2D planar images. However, a number of problems of recent interest have created a demand for models that can analyze spherical images. Examples include omnidirectional vision for drones, robots, and autonomous cars, molecular regression problems, and global weather and climate modelling. A naive application of convolutional networks to a planar projection of the spherical signal is destined to fail, because the space-varying distortions introduced by such a projection will make translational weight sharing ineffective. In this paper we introduce the building blocks for constructing spherical CNNs. We propose a definition for the spherical cross-correlation that is both expressive and rotation-equivariant. The spherical correlation satisfies a generalized Fourier theorem, which allows us to compute it efficiently using a generalized (non-commutative) Fast Fourier Transform (FFT) algorithm. We demonstrate the computational efficiency, numerical accuracy, and effectiveness of spherical CNNs applied to 3D model recognition and atomization energy regression.


Citations (852)


Summary

The paper proposes spherical CNNs that enable rotation-equivariant learning through novel spherical convolution operations.
It overcomes computational challenges using generalized FFT and non-commutative harmonic analysis to handle spherical signals efficiently.
The framework demonstrates robust performance in tasks like rotated digit classification, 3D shape recognition, and molecular energy regression.

Overview of Spherical CNNs

The paper "Spherical CNNs" by Taco S. Cohen, Mario Geiger, Jonas Köhler, and Max Welling presents a framework and implementation for Convolutional Neural Networks (CNNs) tailored to spherical data. The motivation is that planar CNNs handle spherical signals poorly: any projection of the sphere onto a plane introduces space-varying distortions, so translational weight sharing no longer matches the symmetry of the data. Applications for spherical CNNs include omnidirectional vision for autonomous systems, molecular regression problems, and climate modeling.

Spherical Convolutional Networks

The authors introduce spherical convolutional networks (Spherical CNNs), built from rotation-equivariant operations. Where a planar convolution slides a filter across the image by translation, a spherical convolution moves the filter over the sphere by 3D rotation. This requires a new definition of cross-correlation for spherical signals, chosen so that rotating the input produces a correspondingly rotated output.
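The correlation described above can be written out explicitly. The following is a sketch in the paper's notation, where $f$ and $\psi$ are a $K$-channel signal and filter on the sphere $S^2$, $R \in SO(3)$ is a rotation, and a rotated filter is $[L_R \psi](x) = \psi(R^{-1}x)$; the channel index $k$ is the only symbol not otherwise used in this summary.

```latex
\[
[\psi \star f](R) \;=\; \langle L_R \psi, f \rangle
\;=\; \int_{S^2} \sum_{k=1}^{K} \psi_k(R^{-1} x)\, f_k(x)\, dx
\]
% The output is indexed by R, so it is a function on SO(3), not on S^2.
% Later layers therefore correlate signals and filters that both live on SO(3):
\[
[\psi \star f](R) \;=\; \int_{SO(3)} \sum_{k=1}^{K} \psi_k(R^{-1} Q)\, f_k(Q)\, dQ
\]
```

Compared with planar correlation, the only structural change is that the shift $x - t$ is replaced by the rotation $R^{-1}x$.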

Key distinctions are highlighted:

Transformation and Group Theory: Planar CNNs exploit translations of the plane, and the set of translations is itself a plane. The analogous moves on the sphere are 3D rotations, which form the group $SO(3)$, a three-dimensional manifold distinct from the sphere. As a result, the output of a spherical correlation is a function on $SO(3)$ rather than on $S^2$.
Spherical Grids and Computational Challenges: No perfectly symmetric pixel grid exists on the sphere, so rotating a filter by an arbitrary rotation requires interpolation. Moreover, because $SO(3)$ is three-dimensional, a naive implementation of the $SO(3)$ correlation costs $O(n^6)$.
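The first distinction can be made concrete with a few lines of NumPy. This is an illustrative sketch, not code from the paper: the helper names `rot_z` and `rot_y` are hypothetical, and the example only shows that planar translations commute while 3D rotations generally do not, which is why the commutative Fourier machinery of planar CNNs does not carry over directly.

```python
import numpy as np

def rot_z(angle):
    """Rotation about the z-axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(angle):
    """Rotation about the y-axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Translations of the plane commute: the order of two shifts does not matter.
t1, t2 = np.array([1.0, 2.0]), np.array([-3.0, 0.5])
p = np.array([0.25, 0.75])
translations_commute = bool(np.allclose((p + t1) + t2, (p + t2) + t1))

# Rotations in SO(3) do not: z-then-y differs from y-then-z.
a, b = rot_z(np.pi / 2), rot_y(np.pi / 2)
rotations_commute = bool(np.allclose(a @ b, b @ a))

# The product is still a rotation (orthogonal with determinant +1).
product = a @ b
is_rotation = bool(
    np.allclose(product.T @ product, np.eye(3))
    and np.isclose(np.linalg.det(product), 1.0)
)

print(translations_commute, rotations_commute, is_rotation)  # True False True
```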

The authors address this cost with techniques from non-commutative harmonic analysis. The spherical and $SO(3)$ correlations satisfy a generalized Fourier theorem, so they can be computed in the spectral domain with a generalized Fast Fourier Transform (FFT) algorithm instead of by direct summation over rotations.
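The idea behind the spectral approach is easiest to see in the commutative case. The sketch below is not the paper's generalized FFT: it uses the ordinary FFT on a discretized circle (cyclic shifts instead of 3D rotations) to show a correlation theorem of the same shape, where the transform of the correlation is the transform of the signal times the conjugate transform of the filter. The function names are hypothetical.

```python
import numpy as np

def circular_correlation_naive(psi, f):
    """out[r] = sum_x psi[(x - r) mod n] * f[x], computed directly in O(n^2)."""
    n = len(f)
    out = np.zeros(n)
    for r in range(n):
        for x in range(n):
            out[r] += psi[(x - r) % n] * f[x]
    return out

def circular_correlation_fft(psi, f):
    """Same quantity via the correlation theorem, in O(n log n)."""
    spectrum = np.fft.fft(f) * np.conj(np.fft.fft(psi))
    return np.real(np.fft.ifft(spectrum))

rng = np.random.default_rng(0)
psi = rng.normal(size=16)  # real-valued filter on the circle
f = rng.normal(size=16)    # real-valued signal on the circle

naive = circular_correlation_naive(psi, f)
fast = circular_correlation_fft(psi, f)
print(np.allclose(naive, fast))
```

On the sphere and on $SO(3)$ the scalar Fourier coefficients become vectors and matrices per frequency, and the pointwise product becomes a matrix product, but the division of labor is the same: transform, multiply, transform back.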

Mathematical Framework

The paper derives the spherical correlation by analogy with classical planar correlation. The definition hinges on rotating the filter rather than translating it:

Unit Sphere ($S^2$) and Rotations: Functions on the sphere are parameterized using spherical coordinates, and functions on $SO(3)$ are parameterized using ZYZ-Euler angles.
Rotation Operators: A rotation $R$ acts on a signal $f$ through the operator $L_R$, defined by $[L_R f](x) = f(R^{-1}x)$. The inverse makes the operators compose in the same order as the rotations: $L_{RR'} = L_R L_{R'}$.
Correlation Equivariance: The authors prove that correlation commutes with rotation, $\psi \star [L_Q f] = L_Q[\psi \star f]$, so rotating the input rotates the output in a predictable way instead of changing it arbitrarily.
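The three ingredients above can be checked numerically. The following is a minimal sketch under stated assumptions, not the paper's implementation: the signal `f` and filter `psi` are hand-picked low-degree polynomials (so a small Gauss-Legendre grid integrates their products exactly), and the helper names `zyz`, `sphere_quadrature`, `rotate`, and `correlate` are hypothetical.

```python
import numpy as np

def rot_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def zyz(alpha, beta, gamma):
    """Rotation matrix from ZYZ Euler angles."""
    return rot_z(alpha) @ rot_y(beta) @ rot_z(gamma)

def sphere_quadrature(n_theta=8, n_phi=16):
    """Gauss-Legendre nodes in cos(theta) times equispaced phi.

    Exact for polynomials of low degree in (x, y, z) restricted to the sphere.
    """
    u, w = np.polynomial.legendre.leggauss(n_theta)
    phi = 2.0 * np.pi * np.arange(n_phi) / n_phi
    uu, pp = np.meshgrid(u, phi, indexing="ij")
    s = np.sqrt(1.0 - uu ** 2)
    points = np.stack([s * np.cos(pp), s * np.sin(pp), uu], axis=-1).reshape(-1, 3)
    weights = np.repeat(w, n_phi) * (2.0 * np.pi / n_phi)
    return points, weights

def rotate(func, rotation):
    """[L_R func](x) = func(R^{-1} x); with points stored as rows this is x @ R."""
    return lambda x: func(x @ rotation)

def correlate(psi, f, rotation, points, weights):
    """[psi * f](R) = <L_R psi, f>, evaluated by quadrature."""
    return np.sum(weights * rotate(psi, rotation)(points) * f(points))

# Illustrative single-channel signal and filter (assumptions, not from the paper).
f = lambda x: x[:, 0] * x[:, 2] + 0.5 * x[:, 1]
psi = lambda x: x[:, 2] ** 2 + x[:, 0]

points, weights = sphere_quadrature()
R = zyz(0.3, 1.1, -0.7)
Q = zyz(-1.2, 0.4, 2.0)

# Left side: correlate with the rotated input, [psi * L_Q f](R).
lhs = correlate(psi, rotate(f, Q), R, points, weights)
# Right side: rotate the output, [L_Q (psi * f)](R) = [psi * f](Q^{-1} R).
rhs = correlate(psi, f, Q.T @ R, points, weights)

print(np.isclose(lhs, rhs))
```

The check works because the integral over the sphere is unchanged by the substitution $x \to Qx$, which is the same step the equivariance proof relies on.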

Implementation and Experiments

The authors implement the generalized FFT for spherical and $SO(3)$ signals in PyTorch, with attention to both speed and memory use.

Numerical Stability and Equivariance

The paper evaluates how well the discretized spherical CNN layers preserve the equivariance that holds exactly in the continuous theory. In empirical benchmarks the equivariance error $\Delta$ stays small across the tested resolutions and depths, which indicates that the implementation remains stable even for deeper networks.
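An error of this kind can be measured by comparing "transform, then apply the layer" against "apply the layer, then transform". The sketch below assumes a relative error of the form $\mathrm{std}(T\,\Phi(f) - \Phi(T f)) / \mathrm{std}(\Phi(f))$ averaged over samples; that exact form is an assumption here and should be checked against the paper. For brevity it uses cyclic shifts on a circle instead of rotations on the sphere, and all function names are hypothetical.

```python
import numpy as np

def equivariance_error(layer, transform, signals):
    """Mean over signals of std(T(layer(f)) - layer(T(f))) / std(layer(f))."""
    errors = []
    for f in signals:
        out = layer(f)
        diff = transform(out) - layer(transform(f))
        errors.append(np.std(diff) / np.std(out))
    return float(np.mean(errors))

rng = np.random.default_rng(0)
kernel = rng.normal(size=32)
ramp = np.linspace(0.0, 1.0, 32)

def equivariant_layer(f):
    # Circular correlation (via FFT) followed by ReLU commutes with cyclic shifts.
    corr = np.real(np.fft.ifft(np.fft.fft(f) * np.conj(np.fft.fft(kernel))))
    return np.maximum(corr, 0.0)

def broken_layer(f):
    # A position-dependent gain breaks shift equivariance on purpose.
    return equivariant_layer(f) * ramp

shift = lambda f: np.roll(f, 5)
signals = rng.normal(size=(10, 32))

delta_good = equivariance_error(equivariant_layer, shift, signals)
delta_bad = equivariance_error(broken_layer, shift, signals)
print(delta_good, delta_bad)
```

An exactly equivariant layer gives an error at the level of floating-point rounding, while the deliberately broken layer gives an error of order one; a discretized spherical layer would be expected to land in between.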

Applications

The presented Spherical CNNs demonstrate their efficacy across three primary tasks:

Rotated MNIST on the Sphere: The spherical CNN delivers superior performance over traditional planar CNNs in handling rotated digit classification, showcasing its robustness to rotational variations.
3D Shape Recognition: Applied to the SHREC17 dataset, spherical CNNs perform competitively with specialized task-specific methods, indicating their generalization capability in understanding 3D shapes subjected to arbitrary rotations.
Molecular Energy Regression: On the QM7 dataset, spherical CNNs take as input a spherical generalization of the Coulomb matrix representation and predict molecular atomization energies. They outperform the kernel-based baselines reported in the paper, although not every neural-network baseline.

Implications and Future Work

Practical Implications:

Omnidirectional Vision: The research opens doors to more accurate omnidirectional vision systems in autonomous vehicles, drones, and robotics, where 360-degree sensor data is increasingly prevalent.
Scientific Computing: Because spherical CNNs handle rotational symmetries by construction, they may improve accuracy in molecular modeling and climate modeling.

Theoretical Implications:

Equivariant Neural Networks: The work extends group-equivariant neural networks to the continuous, non-commutative group $SO(3)$.
Extensions: Future research could extend the framework to volumetric tasks by generalizing from $SO(3)$ to the roto-translation group $SE(3)$, or develop steerable CNNs for vector fields on the sphere, such as wind directions in global weather modeling.

Conclusion

This paper is a significant contribution to deep learning on non-Euclidean domains, especially spherical signals. Its theoretical framework, backed by a practical implementation and empirical validation, establishes spherical CNNs not just as a mathematical novelty but as a viable tool for real-world applications that require rotational equivariance or invariance. Future work should aim to improve computational efficiency further and to explore broader applications.