
Are High-Degree Representations Really Unnecessary in Equivariant Graph Neural Networks? (2410.11443v3)

Published 15 Oct 2024 in cs.LG

Abstract: Equivariant Graph Neural Networks (GNNs) that incorporate E(3) symmetry have achieved significant success in various scientific applications. As one of the most successful models, EGNN leverages a simple scalarization technique to perform equivariant message passing over only Cartesian vectors (i.e., 1st-degree steerable vectors), enjoying greater efficiency and efficacy compared to equivariant GNNs using higher-degree steerable vectors. This success suggests that higher-degree representations might be unnecessary. In this paper, we disprove this hypothesis by exploring the expressivity of equivariant GNNs on symmetric structures, including $k$-fold rotations and regular polyhedra. We theoretically demonstrate that equivariant GNNs will always degenerate to a zero function if the degree of the output representations is fixed to 1 or other specific values. Based on this theoretical insight, we propose HEGNN, a high-degree version of EGNN to increase the expressivity by incorporating high-degree steerable vectors while maintaining EGNN's efficiency through the scalarization trick. Our extensive experiments demonstrate that HEGNN not only aligns with our theoretical analyses on toy datasets consisting of symmetric structures, but also shows substantial improvements on more complicated datasets such as $N$-body and MD17. Our theoretical findings and empirical results potentially open up new possibilities for the research of equivariant GNNs.

Summary

  • The paper reveals that 1st-degree equivariant GNNs degenerate on symmetric graphs, limiting their ability to distinguish orientations.
  • It introduces HEGNN, a model that initializes high-degree steerable features with spherical harmonics and computes cross-degree invariant messages.
  • Experimental results on synthetic symmetric structures and real-world physical simulations confirm HEGNN’s improved expressivity and efficiency over baseline models.

This paper, "Are High-Degree Representations Really Unnecessary in Equivariant Graph Neural Networks?" (2410.11443), investigates the expressivity of Equivariant Graph Neural Networks (GNNs) that process 3D geometric graph data while respecting E(3) or SE(3) symmetry. A prevailing hypothesis, inspired by the success of models like EGNN, which primarily uses 1st-degree steerable vectors (Cartesian vectors), suggested that higher-degree representations might be unnecessary. The authors challenge this hypothesis.

The core theoretical finding is that standard equivariant GNNs, particularly those whose output steerable representations are fixed to degree 1 (like the position output in many EGNN variants), inevitably degenerate to a zero function when applied to certain symmetric graph structures. This means they lose the ability to distinguish orientations for graphs with symmetries such as $k$-fold rotations or regular polyhedra. The paper provides a proof based on group representation theory: for a graph $\mathcal{G}$ with symmetry group $\mathfrak{H}$, an equivariant function $f^{(l)}$ with output degree $l$ satisfies $f^{(l)}(\mathcal{G}) = \rho^{(l)}(\mathfrak{H})\, f^{(l)}(\mathcal{G})$, where $\rho^{(l)}(\mathfrak{H})$ is the group average of the degree-$l$ representation matrices over $\mathfrak{H}$. If the matrix $I - \rho^{(l)}(\mathfrak{H})$ is non-singular, the only solution is $f^{(l)}(\mathcal{G}) = \mathbf{0}$. The paper shows that for $l = 1$ this degeneration occurs for several common symmetry groups (e.g., those of $k$-fold structures and regular polyhedra), indicating a fundamental limitation of 1st-degree-only models on such inputs. Table 1 of the paper summarizes the specific degrees $l$ for which the function degenerates to zero for each symmetric graph.
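This criterion is easy to probe numerically. The sketch below (illustrative, not code from the paper) averages the degree-$l$ Wigner-D matrices over a finite rotation group, here the rotations of a planar 5-fold structure chosen for illustration, and tests whether $I - \rho^{(l)}(\mathfrak{H})$ is non-singular using e3nn:

```python
# Numerical check of the degeneration criterion (illustrative): average the
# degree-l Wigner-D matrices over a finite rotation group H and test whether
# I - rho_bar is non-singular. If it is, every H-equivariant degree-l output
# must be the zero vector.
import math
import torch
from e3nn import o3

k = 5  # rotations of a planar 5-fold structure, chosen for illustration
group = []
for j in range(k):
    Rz = o3.matrix_z(torch.tensor(2 * math.pi * j / k))  # rotation about z
    flip = o3.matrix_x(torch.tensor(math.pi))            # in-plane flip
    group += [Rz, Rz @ flip]

for l in range(1, 7):
    D = [o3.Irrep(l, 1).D_from_matrix(R) for R in group]  # degree-l Wigner-D
    rho_bar = torch.stack(D).mean(dim=0)
    smallest = torch.linalg.svdvals(torch.eye(2 * l + 1) - rho_bar).min()
    print(f"l={l}: degree-{l} output forced to zero: {smallest.item() > 1e-6}")
```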

Based on this theoretical insight, the authors propose the High-Degree Equivariant Graph Neural Network (HEGNN), an extension of EGNN that incorporates higher-degree steerable vectors. HEGNN aims to overcome the expressivity limitations while retaining EGNN's efficiency advantages, in particular its use of a scalarization trick instead of the more computationally expensive Clebsch-Gordan tensor products used in traditional high-degree models such as Tensor Field Networks (TFN).

The HEGNN architecture consists of three main components:

  1. Initialization of high-degree steerable features: Beyond the initial node features (type-0 scalars) and coordinates (type-1 vectors), HEGNN initializes steerable features $\{\tilde{\mathbf{v}}_i^{(l)}\}_{l=0}^{L}$ for degrees up to $L$ using spherical harmonics $Y^{(l)}$ evaluated on normalized relative coordinates between neighboring nodes. An invariant scalar message modulated by an MLP serves as the weighting factor in this aggregation.
  2. Calculation of cross-degree invariant messages: This is where HEGNN generalizes EGNN's scalarization. It computes invariant scalars $z_{ij}^{(l)} = \langle\tilde{\mathbf{v}}_i^{(l)}, \tilde{\mathbf{v}}_j^{(l)}\rangle$ via the inner product at each degree $l$. These degree-specific invariant scalars, along with other invariant quantities such as squared distances and node/edge features, are concatenated and passed through an MLP ($\varphi_{\mathbf{m}}$) to produce an overall invariant message $\mathbf{m}_{ij}$ between nodes $i$ and $j$.
  3. Aggregation and update: The invariant message $\mathbf{m}_{ij}$ is used to calculate updates (residues) for node scalar features ($\Delta\mathbf{h}_i$), coordinates ($\Delta\vec{\mathbf{x}}_i$), and steerable features of each degree ($\Delta\tilde{\mathbf{v}}_i^{(l)}$). The coordinate update $\Delta\vec{\mathbf{x}}_i$ is computed by summing scaled relative coordinates $\varphi_{\vec{\mathbf{x}}}(\mathbf{m}_{ij})\cdot(\vec{\mathbf{x}}_i-\vec{\mathbf{x}}_j)$. The steerable feature update $\Delta\tilde{\mathbf{v}}_i^{(l)}$ for degree $l$ is computed by summing scaled differences $\varphi_{\tilde{\mathbf{v}}}^{(l)}(\mathbf{m}_{ij})\cdot\left(\tilde{\mathbf{v}}_i^{(l)}-\tilde{\mathbf{v}}_j^{(l)}\right)$. Note that $\varphi_{\vec{\mathbf{x}}}$ and $\varphi_{\tilde{\mathbf{v}}}^{(l)}$ output scalars. These updates are then added to the current node features, coordinates, and steerable features. The authors note that this update structure across degrees can be viewed as a Clebsch-Gordan tensor product with scalar weights, which can be implemented using libraries like e3nn.o3.FullyConnectedTensorProduct (a condensed sketch of one such layer follows this list).
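To make the message passing concrete, here is a condensed, self-contained PyTorch sketch of one HEGNN-style layer. It is a simplification under assumptions of our own (dense all-pairs messages, a single feature channel per degree, arbitrary MLP shapes), not the authors' implementation:

```python
# Condensed sketch of one HEGNN-style layer (illustrative, not the authors'
# code): cross-degree inner products -> invariant message -> scalar-weighted
# residual updates of coordinates and per-degree steerable features.
import torch
import torch.nn as nn

class HEGNNLayerSketch(nn.Module):
    def __init__(self, hidden_dim: int, max_degree: int):
        super().__init__()
        self.L = max_degree
        in_dim = 2 * hidden_dim + 1 + (max_degree + 1)  # h_i, h_j, |x_ij|^2, z_ij^(l)
        self.phi_m = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.SiLU(),
                                   nn.Linear(hidden_dim, hidden_dim))
        self.phi_h = nn.Linear(2 * hidden_dim, hidden_dim)
        self.phi_x = nn.Linear(hidden_dim, 1, bias=False)               # scalar weight
        self.phi_v = nn.Linear(hidden_dim, max_degree + 1, bias=False)  # one per degree

    def forward(self, h, x, v):
        # h: (N, H) invariant features; x: (N, 3) coordinates;
        # v: list of L+1 tensors, v[l] of shape (N, 2l+1), one channel per degree
        N = h.shape[0]
        rel = x[:, None, :] - x[None, :, :]                    # (N, N, 3)
        d2 = (rel ** 2).sum(-1, keepdim=True)                  # (N, N, 1)
        # cross-degree invariants z_ij^(l) = <v_i^(l), v_j^(l)>
        z = torch.stack([(v[l][:, None, :] * v[l][None, :, :]).sum(-1)
                         for l in range(self.L + 1)], dim=-1)  # (N, N, L+1)
        feats = torch.cat([h[:, None].expand(-1, N, -1),
                           h[None, :].expand(N, -1, -1), d2, z], dim=-1)
        m = self.phi_m(feats) * (1 - torch.eye(N))[..., None]  # mask self-messages
        # residual updates with scalar weights (EGNN-style scalarization)
        x = x + (self.phi_x(m) * rel).sum(dim=1)
        w = self.phi_v(m)                                      # (N, N, L+1)
        v = [v[l] + (w[..., l:l + 1] * (v[l][:, None] - v[l][None, :])).sum(dim=1)
             for l in range(self.L + 1)]
        h = h + self.phi_h(torch.cat([h, m.sum(dim=1)], dim=-1))
        return h, x, v
```

In the full model, multi-channel irreps, edge features, and neighborhood-restricted message passing replace the dense all-pairs form used in this sketch.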

The paper theoretically shows that HEGNN, by incorporating higher-degree features, can recover the information of all angles between pairs of edges through the inner products $z_{ij}^{(l)}$ across sufficiently many degrees $l$, thus avoiding the expressivity issues faced by 1st-degree-only models on symmetric graphs.
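The mechanism behind this is the classical addition theorem for spherical harmonics: for unit vectors $\mathbf{u}$ and $\mathbf{v}$, $\sum_{m} Y_l^m(\mathbf{u})\,\overline{Y_l^m(\mathbf{v})} = \frac{2l+1}{4\pi} P_l(\mathbf{u}\cdot\mathbf{v})$, so the inner products across degrees determine Legendre polynomials of the angle cosines between edge directions. A quick numerical verification (a sketch using SciPy's complex spherical harmonics, not code from the paper):

```python
# Numerical check of the addition theorem: the degree-l inner product of
# spherical harmonics at two unit vectors equals (2l+1)/(4*pi) * P_l(u . v),
# so the z_ij^(l) across degrees jointly encode edge angles.
import numpy as np
from scipy.special import sph_harm, eval_legendre

rng = np.random.default_rng(0)
u, v = rng.normal(size=(2, 3))
u, v = u / np.linalg.norm(u), v / np.linalg.norm(v)

def sph_angles(w):
    # SciPy convention: theta = azimuth in [0, 2*pi), phi = polar angle
    return np.arctan2(w[1], w[0]) % (2 * np.pi), np.arccos(w[2])

(tu, pu), (tv, pv) = sph_angles(u), sph_angles(v)
for l in range(6):
    m = np.arange(-l, l + 1)
    lhs = np.sum(sph_harm(m, l, tu, pu) * np.conj(sph_harm(m, l, tv, pv))).real
    rhs = (2 * l + 1) / (4 * np.pi) * eval_legendre(l, u @ v)
    print(l, np.isclose(lhs, rhs))  # True for every degree
```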

Experimental results are presented on both toy datasets of symmetric structures and real-world physical simulation datasets (N-body and MD17).

  • Symmetric Graphs: Experiments on $k$-fold structures and regular polyhedra confirm the theoretical predictions. Models using only 1st-degree representations (EGNN, GVP-GNN, HEGNN$_{l=1}$) fail to distinguish rotated copies of these symmetric graphs. HEGNN$_{l=L}$ fails for the specific degrees $L$ identified by the theory. Models using a sufficient set of degrees (HEGNN$_{l\leq L}$, TFN, and MACE for appropriate $L$) successfully distinguish the graphs.
  • N-body: HEGNN consistently outperforms baseline models, including EGNN, TFN, and SE(3)-Transformer, across different dataset sizes ($N = 5, 20, 50, 100$) for predicting particle positions. HEGNN$_{l\leq 2}$ and HEGNN$_{l\leq 3}$ show the best performance depending on the number of particles. The inference time analysis shows that HEGNN is significantly faster than traditional high-degree models like TFN and MACE, although it is slower than the base EGNN due to handling more features.
  • MD17: HEGNN also shows improved performance on molecular dynamics trajectory prediction compared to many baselines, including EGNN, TFN, and SE(3)-Transformer, achieving the best results on six out of eight molecules. This suggests the benefits of high-degree features extend beyond strictly symmetric cases to more general geometric graphs.
  • Perturbation Experiment: A simple experiment on perturbed tetrahedra shows that EGNN remains limited even when small noise breaks the perfect symmetry, while HEGNN stays robust.

In summary, the paper provides theoretical evidence for the necessity of high-degree steerable representations in equivariant GNNs, particularly on symmetric structures where 1st-degree models fail. It proposes HEGNN, an efficient architecture that incorporates these higher degrees using a generalized scalarization approach. Experiments validate the theory and demonstrate HEGNN's improved expressivity and performance on both synthetic symmetric data and real-world physical simulation tasks, showcasing a practical way to leverage high-degree representations effectively.

For practical implementation, one would need to:

  1. Choose the maximum degree $L$ for steerable features, potentially informed by the expected symmetries in the data or through hyperparameter tuning. The experiments suggest $L \leq 6$ is often sufficient in practice.
  2. Implement the initialization step using spherical harmonics, potentially leveraging libraries like SciPy or e3nn (a minimal sketch follows this list).
  3. Implement the message passing using inner products between steerable features of the same degree to create invariant messages, which then scale features of different degrees. This can be structured using tensor product operations available in libraries like e3nn, where the 'weights' of the tensor product are the scalar outputs of the MLP processing the invariant message.
  4. Configure the MLPs ($\varphi_{\mathbf{m}}$, $\varphi_{\vec{\mathbf{x}}}$, $\varphi_{\tilde{\mathbf{v}}}^{(l)}$) for processing invariant scalars and generating scalar scaling factors.
  5. Manage the potentially increased memory and computational cost associated with higher-degree features, although HEGNN's scalarization aims to mitigate this compared to full Clebsch-Gordan products. The parameter and inference time analysis in the appendix provides estimates for different degrees.
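For step 2, a minimal sketch of the high-degree initialization using e3nn's real spherical harmonics, assuming hypothetical `edge_index`/`edge_weight` inputs (in HEGNN the weights come from an MLP-modulated invariant scalar message):

```python
# Sketch of high-degree feature initialization (step 2 above): aggregate
# spherical harmonics of normalized relative coordinates into per-node
# steerable features. `edge_index` / `edge_weight` are hypothetical names.
import torch
from e3nn import o3

def init_steerable_features(x, edge_index, edge_weight, max_degree):
    src, dst = edge_index                    # (E,) source / destination nodes
    rel = x[dst] - x[src]                    # (E, 3) relative coordinates
    v = []
    for l in range(max_degree + 1):
        # normalize=True evaluates Y^(l) on the unit-normalized directions
        sh = o3.spherical_harmonics(l, rel, normalize=True)  # (E, 2l+1)
        out = torch.zeros(x.shape[0], 2 * l + 1)
        out.index_add_(0, dst, edge_weight[:, None] * sh)    # sum over neighbors
        v.append(out)
    return v
```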

Limitations noted by the authors include not verifying the model on large-scale molecules or physical systems. However, the results on MD17 and N-body (up to 100 particles) are promising for medium-scale applications. The work contributes to the field of AI for Science by providing a more expressive and practical geometric deep learning architecture for modeling physical systems.