Generalizing Convolutional Neural Networks for Equivariance to Lie Groups on Arbitrary Continuous Data
This paper addresses what it takes to extend convolutional neural networks (CNNs) beyond their built-in translation equivariance to equivariance under any specified Lie group. The question is particularly relevant for non-image data that naturally obeys specific symmetries, such as rotations or more complex continuous transformations.
Approach and Methodology
The researchers propose a new type of convolutional layer, termed "LieConv," which is equivariant to transformations from any Lie group with a surjective exponential map. Within this framework, equipping a CNN with equivariance to a new group on arbitrary continuous data requires implementing only the group's exponential and logarithm maps. The framework is also versatile: a single model architecture is applied across images, molecular structures, and Hamiltonian systems, each of which benefits from distinct symmetry properties.
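To make this requirement concrete, the sketch below (a minimal illustration, not the authors' code) implements the two group-specific ingredients for SO(2), the group of planar rotations, where both maps have closed forms; the function names are hypothetical.

```python
# Minimal sketch of the only group-specific ingredients LieConv needs:
# the exponential map, sending a Lie-algebra element to a group element,
# and its inverse, the logarithm map. Shown for SO(2), where both have
# closed forms. Function names are illustrative, not the paper's API.
import numpy as np

def exp_so2(theta: float) -> np.ndarray:
    """Map an so(2) element (an angle) to a 2x2 rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def log_so2(R: np.ndarray) -> float:
    """Map a 2x2 rotation matrix back to its angle in so(2)."""
    return np.arctan2(R[1, 0], R[0, 0])

R = exp_so2(0.3)
assert np.isclose(log_so2(R), 0.3)  # log inverts exp on SO(2)
```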
The key insight is that, for a wide spectrum of continuous groups known as Lie groups, equivariance can be achieved by parameterizing the convolutional kernel as a neural network on Lie-algebra coordinates obtained via the logarithm map. This generalizes group convolutions to any group with an analytic or numerically tractable exponential map.
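The sketch below illustrates this idea in PyTorch: a small MLP maps the Lie-algebra coordinates log(v⁻¹u) between pairs of group elements to kernel weights, and the convolution is estimated as a Monte Carlo average over sampled elements. The class name, shapes, and interface are assumptions for illustration, not the paper's exact API.

```python
# Hedged sketch of the core LieConv idea: instead of a kernel indexed by
# pixel offsets, the kernel is a neural network k_theta evaluated on
# Lie-algebra coordinates log(v^-1 u) between pairs of group elements.
import torch
import torch.nn as nn

class LieGroupConv(nn.Module):
    def __init__(self, c_in: int, c_out: int, algebra_dim: int):
        super().__init__()
        # k_theta: Lie-algebra coordinates -> (c_out x c_in) kernel values
        self.kernel = nn.Sequential(
            nn.Linear(algebra_dim, 32), nn.ReLU(),
            nn.Linear(32, c_out * c_in),
        )
        self.c_in, self.c_out = c_in, c_out

    def forward(self, log_pairs: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # log_pairs: (N, N, algebra_dim), entry (i, j) holds log(v_j^-1 u_i)
        # feats:     (N, c_in) features attached to the N group elements
        W = self.kernel(log_pairs).view(*log_pairs.shape[:2], self.c_out, self.c_in)
        # Monte Carlo estimate of the group convolution: average over samples
        return torch.einsum('ijoc,jc->io', W, feats) / feats.shape[0]
```

In the paper, the sum runs over a local neighborhood of sampled group elements rather than all pairs, which keeps the computational cost manageable on large inputs.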
Numerical Results and Strong Claims
The LieConv model achieves competitive or state-of-the-art performance across datasets characterized by different symmetries. On the rotated-image classification benchmark RotMNIST, for example, LieConv attains a test error competitive with contemporary equivariant methods. On the QM9 molecular property benchmark, LieConv achieves lower mean absolute error than existing techniques, such as SchNet, on several regression targets.
A noteworthy application is modeling the Hamiltonian dynamics of physical systems: because the symmetries encoded into the convolutional layers imply the corresponding conservation laws (Noether's theorem), the LieConv model conserves quantities such as energy and momentum up to numerical integration error. This built-in conservation yields more accurate predictive models of dynamical systems than alternatives such as Hamiltonian ODE graph networks (HOGN).
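As a concrete illustration of the Hamiltonian setup (a minimal sketch under standard assumptions, not the authors' implementation): given a learned scalar Hamiltonian H(z) on state z = (q, p), the predicted dynamics follow from the symplectic gradient, and any continuous symmetry of H yields a conserved quantity.

```python
# Sketch: dynamics from a learned Hamiltonian H via the symplectic
# gradient, dz/dt = (dH/dp, -dH/dq). If H is invariant under a symmetry
# group, Noether's theorem gives a conserved quantity for free.
import torch

def hamiltonian_dynamics(H, z: torch.Tensor) -> torch.Tensor:
    """Return dz/dt for state z = (q, p) concatenated along the last dim."""
    z = z.requires_grad_(True)
    grad = torch.autograd.grad(H(z).sum(), z, create_graph=True)[0]
    dq, dp = grad.chunk(2, dim=-1)       # grad = (dH/dq, dH/dp)
    return torch.cat([dp, -dq], dim=-1)  # (dq/dt, dp/dt) = (dH/dp, -dH/dq)

# Example: unit-mass harmonic oscillator, H = (p^2 + q^2) / 2
H = lambda z: 0.5 * (z ** 2).sum(-1)
z0 = torch.tensor([[1.0, 0.0]])          # q = 1, p = 0
print(hamiltonian_dynamics(H, z0))       # -> [[0., -1.]]
```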
Implications and Future Developments
The implications of this research stretch beyond performance benchmarks. By building a dataset's intrinsic symmetries directly into the architecture, the LieConv framework offers a way to construct data-efficient, well-generalizing models across diverse fields, from computer vision to quantum chemistry and computational physics. The architecture also serves as a robust template for future equivariant models on more complex input domains.
Looking forward, flexible equivariant architectures of this kind are likely to see growing adoption, easing the rapid prototyping of models that must conform to the symmetry constraints of their data. The work also points toward networks that learn symmetries from data rather than having them hand-specified, broadening the practical tools available for AI-driven analysis of complex systems. Building on the theoretical framework established here, future research could explore and exploit symmetries beyond classical applications and in emerging data modalities.