
On the Universality of Invariant Networks (1901.09342v4)

Published 27 Jan 2019 in cs.LG and stat.ML

Abstract: Constraining linear layers in neural networks to respect symmetry transformations from a group $G$ is a common design principle for invariant networks that has found many applications in machine learning. In this paper, we consider a fundamental question that has received little attention to date: Can these networks approximate any (continuous) invariant function? We tackle the rather general case where $G\leq S_n$ (an arbitrary subgroup of the symmetric group) that acts on $\mathbb{R}^n$ by permuting coordinates. This setting includes several recent popular invariant networks. We present two main results: First, $G$-invariant networks are universal if high-order tensors are allowed. Second, there are groups $G$ for which higher-order tensors are unavoidable for obtaining universality. $G$-invariant networks consisting of only first-order tensors are of special interest due to their practical value. We conclude the paper by proving a necessary condition for the universality of $G$-invariant networks that incorporate only first-order tensors.

Citations (226)

Summary

  • The paper demonstrates that G-invariant networks are universal when leveraging high-order tensors to approximate continuous G-invariant functions.
  • It reveals that specific groups, such as the alternating group A_n, require high-order tensors to achieve universality.
  • The study offers practical insights for designing computationally efficient models that adhere to complex symmetry constraints.

Universality of Invariant Networks: A Formal Overview

This paper addresses a foundational question within machine learning regarding the universality of invariant networks: can $G$-invariant networks approximate any continuous $G$-invariant function? The authors focus on neural networks whose linear layers are constrained by the symmetry transformations of a group $G \leq S_n$ (an arbitrary subgroup of the symmetric group) acting on $\mathbb{R}^n$ by permuting coordinates.

Summary of Main Results

The authors present two pivotal results:

  1. Universality with High-Order Tensors: The paper demonstrates that $G$-invariant networks are universal when high-order tensors are allowed. Specifically, using tensor orders that depend on the group $G$, the authors show that any continuous $G$-invariant function can be approximated to arbitrary precision by a $G$-invariant network.
  2. Necessity of High-Order Tensors for Specific Groups: Conversely, the authors establish that certain groups necessarily require high-order tensors for universality. For the alternating group $A_n$, for example, networks built from first-order tensors alone cannot be universal. (A group-averaging construction of the equivariant linear layers that such networks stack is sketched after this list.)
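
To make the building block of these results concrete: the $G$-equivariant linear layers that a $G$-invariant network stacks can be obtained by group averaging (the Reynolds operator), which projects an arbitrary weight matrix onto the space of maps commuting with the permutation representation. The following is a minimal NumPy sketch, not the paper's implementation; the choice of the cyclic group $C_4$ and all variable names are illustrative assumptions.

```python
import numpy as np

# Build a G-equivariant linear layer on R^n by group averaging:
#   W_G = (1/|G|) * sum_g P_g^T @ W @ P_g,
# where P_g is the permutation matrix of g. The averaged W_G commutes
# with every P_g, i.e. the layer is G-equivariant.

n = 4
# Illustrative group: the cyclic group C_4 acting on R^4 by cyclic shifts.
group = [tuple((i + s) % n for i in range(n)) for s in range(n)]

def perm_matrix(p):
    """Permutation matrix P with (P x)[i] = x[p[i]]."""
    P = np.zeros((len(p), len(p)))
    for i, j in enumerate(p):
        P[i, j] = 1.0
    return P

def project_equivariant(W, group):
    """Average W over the group so that P_g @ W_G == W_G @ P_g for all g."""
    mats = [perm_matrix(g) for g in group]
    return sum(P.T @ W @ P for P in mats) / len(mats)

rng = np.random.default_rng(0)
W_G = project_equivariant(rng.standard_normal((n, n)), group)

# Sanity check of equivariance: f(P_g x) = P_g f(x) for every g in G.
x = rng.standard_normal(n)
for g in group:
    P = perm_matrix(g)
    assert np.allclose(W_G @ (P @ x), P @ (W_G @ x))
```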

The paper also examines when first-order tensors suffice for universality. The authors prove a necessary condition phrased in terms of $k$-classes and give examples of permutation groups for which first-order tensor networks fail to be universal. A minimal first-order example for the symmetric group is sketched below.
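
For intuition on the first-order case: when $G = S_n$, a first-order invariant network amounts to a DeepSets-style architecture, i.e. per-coordinate feature maps followed by symmetric pooling. Below is a minimal sketch under that assumption; the names `phi`/`rho` and the layer sizes are illustrative, not from the paper.

```python
import numpy as np

# First-order S_n-invariant network in the DeepSets style:
#   f(x) = rho( sum_i phi(x_i) ).
# Sum pooling makes f invariant to any permutation of the coordinates.

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((1, 16)), np.zeros(16)   # phi: R -> R^16
W2, b2 = rng.standard_normal((16, 1)), np.zeros(1)    # rho: R^16 -> R

def f(x):
    h = np.maximum(x[:, None] @ W1 + b1, 0.0)  # phi applied per coordinate
    pooled = h.sum(axis=0)                     # S_n-invariant sum pooling
    return (pooled @ W2 + b2).item()           # rho on the pooled features

x = rng.standard_normal(5)
assert np.isclose(f(x), f(rng.permutation(x)))  # permutation invariance
```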

Implications and Future Directions

The results hold substantial implications for the design of invariant networks:

  • Model Design: For practical neural network design, universality matters across domains such as computer vision and graph neural networks, where the functions to be learned must be invariant to input transformations.
  • Computational Efficiency: While high-order tensors guarantee universality, they come at a steep computational cost, since an order-$k$ tensor over $\mathbb{R}^n$ has $n^k$ entries (see the sketch after this list). Understanding when high-order tensors are necessary, or can be avoided, is crucial for building efficient models.
  • Broader Applicability: The findings extend invariant networks to tasks with complex symmetries beyond the usual image and point-cloud data.
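
To quantify the burden mentioned above: an order-$k$ tensor over $\mathbb{R}^n$ has $n^k$ entries, so activation memory grows exponentially with the tensor order. A quick back-of-the-envelope check (illustrative numbers only, not from the paper):

```python
# Entry counts for order-k tensor features on R^n; memory and compute for
# the corresponding equivariant layers scale with these counts.
for n in (10, 100, 1000):
    for k in (1, 2, 3):
        print(f"n={n:<5d} order k={k}: {n**k:>13,} entries")
```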

In terms of future research, there are significant open questions regarding the classification of 2-closed groups, which could fully clarify the conditions under which first-order invariant networks achieve universality. Additionally, translating these theoretical insights into efficient implementations of higher-order tensor operations warrants further exploration.

Conclusion

This paper fundamentally advances the theoretical understanding of $G$-invariant networks in machine learning, detailing both the power and the limitations of such models as a function of the underlying group symmetries. The contribution sharpens our ability to deploy machine learning models in domains where symmetry constraints govern the structure of the learned functions, and its dual focus on universality and computational cost keeps these insights practically relevant.