DLGNet: Hyperedge Classification through Directed Line Graphs for Chemical Reactions
Published 9 Oct 2024 in cs.LG and cs.AI | (2410.06969v1)
Abstract: Graphs and hypergraphs provide powerful abstractions for modeling interactions among a set of entities of interest and have been attracting a growing interest in the literature thanks to many successful applications in several fields. In particular, they are rapidly expanding in domains such as chemistry and biology, especially in the areas of drug discovery and molecule generation. One of the areas witnessing the fastest growth is the chemical reactions field, where chemical reactions can be naturally encoded as directed hyperedges of a hypergraph. In this paper, we address the chemical reaction classification problem by introducing the notion of a Directed Line Graph (DLG) associated with a given directed hypergraph. On top of it, we build the Directed Line Graph Network (DLGNet), the first spectral-based Graph Neural Network (GNN) expressly designed to operate on a hypergraph via its DLG transformation. The foundation of DLGNet is a novel Hermitian matrix, the Directed Line Graph Laplacian, which compactly encodes the directionality of the interactions taking place within the directed hyperedges of the hypergraph thanks to the DLG representation. The Directed Line Graph Laplacian enjoys many desirable properties, including admitting an eigenvalue decomposition and being positive semidefinite, which make it well-suited for its adoption within a spectral-based GNN. Through extensive experiments on chemical reaction datasets, we show that DLGNet significantly outperforms the existing approaches, achieving on a collection of real-world datasets an average relative-percentage-difference improvement of 33.01%, with a maximum improvement of 37.71%.
The paper introduces DLGNet, a spectral Graph Neural Network designed for hyperedge classification using a novel Directed Line Graph representation.
DLGNet utilizes complex-valued edge weights within its Directed Line Graph to capture and leverage the directional information present in directed hypergraphs.
Experimental results demonstrate that DLGNet significantly outperforms existing methods across chemical reaction datasets, highlighting the value of modeling directionality.
The paper introduces Directed Line Graph Network (DLGNet), a spectral-based Graph Neural Network (GNN) designed for hyperedge classification in directed hypergraphs, with a specific application to chemical reaction classification.
The authors define the concept of a Directed Line Graph (DLG) associated with a directed hypergraph H. In this DLG(H), vertices represent the hyperedges of H, and edges connect vertices if their corresponding hyperedges in H share at least one vertex. Complex-valued edge weights in DLG(H) encode the directionality of interactions within H.
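The connectivity rule described above (one DLG vertex per hyperedge, an edge whenever two hyperedges share a vertex) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `line_graph_edges` and the toy hypergraph are hypothetical, and the sketch covers only which pairs are connected, not the complex-valued weights. For a directed hyperedge, the vertex set passed in would be the union of its head and tail sets.

```python
from itertools import combinations


def line_graph_edges(hyperedges):
    """Return index pairs (i, j), i < j, of hyperedges sharing at least one vertex."""
    vertex_sets = [set(e) for e in hyperedges]
    return [
        (i, j)
        for i, j in combinations(range(len(vertex_sets)), 2)
        if vertex_sets[i] & vertex_sets[j]  # non-empty intersection
    ]


# Hypothetical toy hypergraph: three hyperedges over vertices 0..4.
toy_hyperedges = [{0, 1, 2}, {2, 3}, {4}]
toy_edges = line_graph_edges(toy_hyperedges)
```

Here only the first two hyperedges share a vertex (vertex 2), so they are the only connected pair in the resulting line graph.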
Key contributions include:
A formal definition of a directed line graph associated with a directed hypergraph H, denoted as DLG(H).
The Directed Line Graph Laplacian $L_N$, a Hermitian matrix capturing both directed and undirected relationships between hyperedges in a directed hypergraph via its DLG. The paper proves that $L_N$ possesses spectral properties such as being positive semidefinite.
DLGNet, a spectral-based GNN designed to operate on directed line graphs, convolving hyperedge features.
The paper defines an undirected hypergraph as an ordered pair $H = (V, E)$, with $n = |V|$ and $m = |E|$, where $V$ is the set of vertices and $E$ is the set of hyperedges. The hyperedges' weights are stored in the diagonal matrix $W \in \mathbb{R}^{m \times m}$, where $w_e$ is the weight of hyperedge $e$. The vertex degree $d_v$ and hyperedge degree $\delta_e$ are defined as $d_v = \sum_{e \in E : v \in e} w_e$ for $v \in V$, and $\delta_e = |e|$ for $e \in E$, stored in diagonal matrices $D_v$ and $D_e$. For 2-uniform hypergraphs, the adjacency matrix $A$ is defined such that $A_{uv} = w_e$ for each $e = \{u, v\} \in E$ and $A_{uv} = 0$ otherwise. A directed hypergraph is defined as a hypergraph where each hyperedge $e$ is partitioned into a head set $H(e)$ and a tail set $T(e)$.
The relationship between vertices and hyperedges in an undirected hypergraph $H$ is classically represented via an incidence matrix $B$ of size $n \times m$, where $B_{ve} = 1$ if $v \in e$ and $B_{ve} = 0$ otherwise.
The Laplacian for a general undirected hypergraph is defined as:
$\Delta = I - D_v^{-1/2} B W D_e^{-1} B^\top D_v^{-1/2}$.
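To make the undirected definitions concrete, the following sketch builds the incidence matrix, the degree matrices, and a normalized hypergraph Laplacian with NumPy. It assumes the widely used form $\Delta = I - D_v^{-1/2} B W D_e^{-1} B^\top D_v^{-1/2}$; the function name, the default unit weights, and the toy hypergraph are hypothetical, and the code assumes every vertex belongs to at least one hyperedge.

```python
import numpy as np


def hypergraph_laplacian(hyperedges, n, weights=None):
    """Normalized Laplacian of an undirected hypergraph (assumed form, see lead-in)."""
    m = len(hyperedges)
    w = np.ones(m) if weights is None else np.asarray(weights, dtype=float)

    # Incidence matrix B (n x m): B[v, e] = 1 if vertex v belongs to hyperedge e.
    B = np.zeros((n, m))
    for j, e in enumerate(hyperedges):
        B[list(e), j] = 1.0

    d_v = B @ w              # vertex degree: sum of weights of incident hyperedges
    delta_e = B.sum(axis=0)  # hyperedge degree: number of vertices in the hyperedge

    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    De_inv = np.diag(1.0 / delta_e)
    return np.eye(n) - Dv_inv_sqrt @ B @ np.diag(w) @ De_inv @ B.T @ Dv_inv_sqrt


# Hypothetical toy hypergraph: two hyperedges over four vertices.
toy_laplacian = hypergraph_laplacian([{0, 1, 2}, {2, 3}], n=4)
toy_eigenvalues = np.linalg.eigvalsh(toy_laplacian)
```

Under these assumptions the matrix is symmetric and positive semidefinite, with smallest eigenvalue zero, which is what makes it usable as the basis of a spectral convolution.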
Given a Laplacian matrix $\mathcal{L}$ of a hypergraph $H$ that admits an eigenvalue decomposition $\mathcal{L} = U \Lambda U^*$, where $U$ represents the eigenvectors, $U^*$ is its conjugate transpose, and $\Lambda$ is the diagonal matrix containing the eigenvalues, the convolution $y \circledast x$ between $x$ and another graph signal $y$ is defined in the frequency space as $y \circledast x = U \, \mathrm{diag}(U^* y) \, U^* x$.
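The frequency-space definition can be illustrated directly: transform both signals with $U^*$, multiply them entrywise, and transform back with $U$. The sketch below assumes the Laplacian is Hermitian so that `numpy.linalg.eigh` applies; the function name and the small path-graph Laplacian used as input are hypothetical.

```python
import numpy as np


def spectral_convolution(L, x, y):
    """Compute U diag(U* y) U* x, where L = U Lambda U* is Hermitian."""
    _, U = np.linalg.eigh(L)  # columns of U are orthonormal eigenvectors
    U_h = U.conj().T          # conjugate transpose U*
    # diag(U* y) applied to U* x is an entrywise product in the frequency domain.
    return U @ ((U_h @ y) * (U_h @ x))


# Hypothetical example: Laplacian of a 3-vertex path graph and two signals.
L_path = np.array([[1.0, -1.0, 0.0],
                   [-1.0, 2.0, -1.0],
                   [0.0, -1.0, 1.0]])
x_signal = np.array([1.0, 0.0, -1.0])
y_signal = np.array([0.5, 1.0, 0.5])
conv_xy = spectral_convolution(L_path, x_signal, y_signal)
conv_yx = spectral_convolution(L_path, y_signal, x_signal)
```

Because the product in the frequency domain is entrywise, the operation is commutative in its two signal arguments. In practice, spectral GNNs usually avoid the explicit eigendecomposition and approximate the filter with a low-order polynomial of the Laplacian.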
The adjacency matrix of the line graph is defined in terms of the Signless Laplacian of the line graph. The paper also defines normalized versions of both operators: the normalized Signless Laplacian and the normalized Laplacian $L_N$.
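The complex-valued incidence matrix $\vec{B}$ preserves the directionality of the directed hypergraph by assigning different entries to the vertices in the head set and in the tail set of each hyperedge.
Given $X$ as a $c$-dimensional graph signal, the feature matrix for the vertices of DLG(H) is defined as $X' = \vec{B}^* X$, where $X$ is the feature matrix of the nodes of H.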
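As a rough illustration of how a complex-valued incidence matrix can carry direction, the sketch below assigns 1 to head vertices and $-\mathrm{i}$ to tail vertices, then maps node features to hyperedge features through the conjugate transpose of the incidence matrix. Both the specific entries and the mapping $\vec{B}^* X$ are assumptions made for this example and should be checked against the paper; the function name and toy data are hypothetical.

```python
import numpy as np


def directed_incidence(hyperedges, n):
    """Complex incidence matrix for (head, tail) hyperedges over n vertices."""
    B = np.zeros((n, len(hyperedges)), dtype=complex)
    for j, (head, tail) in enumerate(hyperedges):
        B[list(head), j] = 1.0   # assumed entry for head vertices
        B[list(tail), j] = -1j   # assumed entry for tail vertices
    return B


# Hypothetical toy directed hypergraph: two hyperedges over four vertices,
# each given as (head set, tail set).
toy_directed = [({2}, {0, 1}), ({3}, {2})]
B_dir = directed_incidence(toy_directed, n=4)

X_nodes = np.eye(4)                 # hypothetical node features (one-hot)
X_edges = B_dir.conj().T @ X_nodes  # one feature row per hyperedge
```

With one-hot node features, each row of `X_edges` simply lists which vertices participate in the hyperedge, with real entries for head vertices and imaginary entries for tail vertices, so the two roles stay distinguishable after aggregation.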
The convolution applies the Directed Line Graph Laplacian $L_N$ to the hyperedge feature matrix, combines the result with learnable parameters, and passes it through a complex ReLU activation function.
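A single convolutional layer of this kind might look like the sketch below. It is a simplification under stated assumptions: the layer uses one parameter matrix and the form $\phi(L X' \Theta)$, which may differ from the paper's exact expression, and the complex ReLU shown is one common variant that keeps entries with non-negative real part and zeroes the rest. All names and the toy inputs are hypothetical.

```python
import numpy as np


def complex_relu(Z):
    """Keep entries whose real part is non-negative; zero out the others (assumed variant)."""
    return np.where(Z.real >= 0, Z, 0)


def dlg_conv_layer(L, X_edges, Theta):
    """One simplified layer: complex ReLU applied to L @ X_edges @ Theta."""
    return complex_relu(L @ X_edges @ Theta)


# Hypothetical inputs: a 2x2 Hermitian matrix standing in for the Laplacian,
# one feature per hyperedge, and a 1x1 parameter matrix.
L_toy = np.array([[1.0, 0.5j],
                  [-0.5j, 1.0]])
X_toy = np.array([[1.0], [-1.0]], dtype=complex)
Theta_toy = np.array([[1.0]])
layer_out = dlg_conv_layer(L_toy, X_toy, Theta_toy)
```

In a trained model `Theta` would be learned by gradient descent and several such layers would be stacked before a classification head; the complex output is typically split into real and imaginary parts before the final real-valued layer.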
The paper presents experiments conducted on three real-world chemical reaction datasets: Dataset-1 (50K reactions from USPTO granted patents), Dataset-2 (5300 reactions from five different sources), and Dataset-3 (649 competitive reactions extracted from von Rudorff et al., 2020). Node features are based on Morgan Fingerprints (MFs).
The results demonstrate that DLGNet outperforms existing methods, achieving an average relative percentage difference (RPD) improvement of 33.01% over the second-best method across three real-world datasets. Specifically, DLGNet achieves the best improvement on Dataset-3, with an average RPD improvement of approximately 37.71% and an average additive improvement of 31.65 percentage points.
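The two ways of reporting an improvement can be told apart with a small calculation. The sketch below assumes the common symmetric definition of relative percentage difference (absolute difference divided by the mean of the two values); the summary does not spell out the formula, so this is an assumption, and the scores used are hypothetical rather than taken from the paper.

```python
def relative_percentage_difference(a, b):
    """Symmetric RPD in percent: |a - b| divided by the mean of a and b (assumed definition)."""
    return 100.0 * abs(a - b) / ((a + b) / 2.0)


def additive_difference(a, b):
    """Plain difference in percentage points."""
    return a - b


# Hypothetical scores (in percent) for a best and a second-best method.
best_score, second_score = 90.0, 60.0
rpd = relative_percentage_difference(best_score, second_score)
points = additive_difference(best_score, second_score)
```

For these hypothetical scores the additive gap is 30 percentage points while the RPD is 40%, which shows why the two figures reported for the same dataset need not coincide.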
An ablation study demonstrates the importance of directionality, showing that DLGNet consistently outperforms its undirected counterpart.