SimGNN: A Neural Network Approach to Fast Graph Similarity Computation (1808.05689v4)

Published 16 Aug 2018 in cs.LG and stat.ML

Abstract: Graph similarity search is among the most important graph-based applications, e.g. finding the chemical compounds that are most similar to a query compound. Graph similarity computation, such as Graph Edit Distance (GED) and Maximum Common Subgraph (MCS), is the core operation of graph similarity search and many other applications, but very costly to compute in practice. Inspired by the recent success of neural network approaches to several graph applications, such as node or graph classification, we propose a novel neural network based approach to address this classic yet challenging graph problem, aiming to alleviate the computational burden while preserving a good performance. The proposed approach, called SimGNN, combines two strategies. First, we design a learnable embedding function that maps every graph into a vector, which provides a global summary of a graph. A novel attention mechanism is proposed to emphasize the important nodes with respect to a specific similarity metric. Second, we design a pairwise node comparison method to supplement the graph-level embeddings with fine-grained node-level information. Our model achieves better generalization on unseen graphs, and in the worst case runs in quadratic time with respect to the number of nodes in two graphs. Taking GED computation as an example, experimental results on three real graph datasets demonstrate the effectiveness and efficiency of our approach. Specifically, our model achieves smaller error rate and great time reduction compared against a series of baselines, including several approximation algorithms on GED computation, and many existing graph neural network based models. To the best of our knowledge, we are among the first to adopt neural networks to explicitly model the similarity between two graphs, and provide a new direction for future research on graph similarity computation and graph similarity search.

Citations (289)


Summary

The paper introduces a dual-strategy method that integrates graph-level attention with pairwise node comparisons for efficient similarity computation.
It outperforms approximate GED algorithms and several GNN-based baselines, achieving lower mean squared error and competitive or higher rank correlations on three benchmark datasets.
SimGNN substantially reduces computation time, which makes it well suited to applications that need fast similarity queries over many graphs.

SimGNN: A Neural Network Approach to Fast Graph Similarity Computation

The paper presents SimGNN, a neural-network approach to graph similarity computation. It targets a traditionally expensive problem: exact similarity measures such as Graph Edit Distance (GED) and Maximum Common Subgraph (MCS) are NP-hard to compute, so the authors instead learn to approximate them with graph neural networks (GNNs). Graph similarity underpins applications in bioinformatics, chemistry, social network analysis, and other domains. The proposed framework combines graph-level embeddings with node-level pairwise interactions to compute similarity both efficiently and accurately.
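To see why exact computation is costly, consider a naive exact GED for small undirected graphs under unit edit costs. The sketch below is an illustrative brute force, not an algorithm from the paper. The function name exact_ged and its dummy-node padding scheme are our own assumptions. It tries every one of the n! node mappings, so it is only usable for graphs with a handful of nodes.

```python
from itertools import permutations


def exact_ged(n1, edges1, n2, edges2, labels1=None, labels2=None):
    """Brute-force GED with unit costs for small undirected graphs.

    Nodes are 0..n-1 and edges are (u, v) pairs. The smaller graph is padded
    with isolated dummy nodes, so one permutation of range(n) encodes node
    substitutions (real -> real), deletions (real -> dummy) and insertions
    (dummy -> real) at once.
    """
    n = max(n1, n2)
    target_edges = {frozenset(e) for e in edges2}
    best = None
    for perm in permutations(range(n)):  # n! candidate node mappings
        cost = 0
        for u, v in enumerate(perm):
            real_u, real_v = u < n1, v < n2
            if real_u and real_v:
                # Substitution is free unless both graphs are labeled and the labels differ.
                if labels1 is not None and labels1[u] != labels2[v]:
                    cost += 1
            elif real_u or real_v:
                cost += 1  # node deletion or node insertion
        # Edges present in only one graph after mapping must be deleted or inserted.
        mapped_edges = {frozenset((perm[a], perm[b])) for a, b in edges1}
        cost += len(mapped_edges ^ target_edges)
        if best is None or cost < best:
            best = cost
    return best


triangle = [(0, 1), (1, 2), (0, 2)]
path = [(0, 1), (1, 2)]
print(exact_ged(3, triangle, 3, path))      # 1: delete one edge
print(exact_ged(3, triangle, 2, [(0, 1)]))  # 3: delete one node and its two edges
```

Each extra node multiplies the number of mappings to try, which is why exact GED becomes impractical so quickly.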

Methodological Approach

SimGNN employs two primary strategies:

Graph-Level Embedding Interaction: Node embeddings produced by graph convolutional layers are aggregated into a single graph-level embedding by an attention mechanism that emphasizes the nodes most relevant to the target similarity metric. Each node's attention weight is computed against a global context vector derived from the average of all node embeddings, so node importance is learned relative to the whole graph. The two graph embeddings are then compared by a Neural Tensor Network (NTN), which outputs a vector of interaction scores.
Pairwise Node Comparison: The node embeddings of the two graphs are compared pair by pair to form a node-to-node similarity matrix, capturing fine-grained differences that a single graph-level vector can miss. Because this matrix depends on the order in which nodes are listed, SimGNN summarizes it as a histogram of its entries, a feature that is invariant to node permutations. This pairwise step is also the source of the model's quadratic worst-case running time.
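The two strategies can be sketched in plain Python. This is a minimal illustration that assumes node embeddings already exist; random vectors stand in for GCN outputs, and a random matrix W stands in for the learned attention weights. The names attention_pool and similarity_histogram, and the bin count, are hypothetical. In the paper's formulation as we understand it, each node's attention weight is a sigmoid of its dot product with a tanh-squashed global context built from the mean node embedding. Pairwise scores are sigmoids of node-embedding dot products. The last lines check that shuffling one graph's node order changes neither output.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def attention_pool(U, W):
    """Strategy 1: turn node embeddings U (N vectors of size D) into one graph embedding."""
    d = len(U[0])
    mean = [sum(u[k] for u in U) / len(U) for k in range(d)]
    context = [math.tanh(dot(row, mean)) for row in W]  # c = tanh(W * mean(U))
    weights = [sigmoid(dot(u, context)) for u in U]     # a_n = sigmoid(u_n . c)
    return [sum(a * u[k] for a, u in zip(weights, U)) for k in range(d)]  # h = sum of a_n * u_n


def similarity_histogram(U1, U2, bins=4):
    """Strategy 2: score every node pair, then keep only the distribution of scores."""
    scores = [sigmoid(dot(u, v)) for u in U1 for v in U2]  # entries of the similarity matrix
    counts = [0] * bins
    for s in scores:
        counts[min(int(s * bins), bins - 1)] += 1
    return [c / len(scores) for c in counts]  # normalized histogram


random.seed(0)
D = 3
W = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(D)]   # placeholder for learned weights
U1 = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(5)]  # hypothetical node embeddings, G1
U2 = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(4)]  # hypothetical node embeddings, G2

U1_shuffled = U1[:]
random.shuffle(U1_shuffled)  # same graph, different node order

h, h_shuffled = attention_pool(U1, W), attention_pool(U1_shuffled, W)
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(h, h_shuffled)))  # True
print(similarity_histogram(U1, U2) == similarity_histogram(U1_shuffled, U2))   # True
```

Both outputs are unchanged under reordering. The similarity matrix itself is not, and that is why the histogram summary is used.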

SimGNN concatenates the graph-level interaction scores with the node-level histogram features and passes them through fully connected layers to predict a single similarity score. This combination balances computational cost against the granularity of the comparison.
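A minimal sketch of how the two feature sets might be fused into one score follows. SimGNN compares graph embeddings with a Neural Tensor Network (NTN). Here that is modeled as a bilinear layer with K slices and a ReLU, and a single sigmoid output layer stands in for the paper's stack of fully connected layers. All weights are random placeholders, and the names ntn, score and rand are hypothetical.

```python
import math
import random


def ntn(h1, h2, W_slices, V, b):
    """NTN-style interaction: for each slice k,
    relu(h1^T W_k h2 + V_k . [h1; h2] + b_k)."""
    concat = h1 + h2  # list concatenation [h1; h2]
    out = []
    for W_k, V_k, b_k in zip(W_slices, V, b):
        bilinear = sum(h1[i] * W_k[i][j] * h2[j]
                       for i in range(len(h1)) for j in range(len(h2)))
        linear = sum(v * x for v, x in zip(V_k, concat))
        out.append(max(0.0, bilinear + linear + b_k))  # ReLU
    return out


def score(features, w_out, b_out):
    """One linear layer + sigmoid, standing in for a stack of fully connected layers."""
    z = sum(w * f for w, f in zip(w_out, features)) + b_out
    return 1.0 / (1.0 + math.exp(-z))


def rand():
    return random.uniform(-1, 1)


random.seed(1)
D, K, BINS = 3, 2, 4
W_slices = [[[rand() for _ in range(D)] for _ in range(D)] for _ in range(K)]
V = [[rand() for _ in range(2 * D)] for _ in range(K)]
b = [rand() for _ in range(K)]
w_out = [rand() for _ in range(K + BINS)]

h1 = [rand() for _ in range(D)]  # hypothetical graph embedding of G1
h2 = [rand() for _ in range(D)]  # hypothetical graph embedding of G2
hist = [0.1, 0.2, 0.3, 0.4]      # hypothetical similarity histogram for the pair

features = ntn(h1, h2, W_slices, V, b) + hist  # graph-level and node-level features side by side
print(len(features), score(features, w_out, 0.0))  # 6 features; score strictly between 0 and 1
```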

Experimental Evaluation

SimGNN was benchmarked against classical approximate GED algorithms and other neural network models on three datasets: AIDS, LINUX, and IMDB. It consistently achieved lower mean squared error (MSE) than the baseline methods in predicting similarity scores. It also maintained high Spearman's and Kendall's rank correlation coefficients, particularly on the datasets of relatively small graphs.
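The metrics above can be made concrete. In the paper, GED is normalized by average graph size, nGED = GED / ((|G1| + |G2|) / 2), and mapped to a similarity in (0, 1] with exp(-nGED). Predictions of that score are then evaluated with MSE and rank correlations. The sketch below implements these in plain Python. The function names are hypothetical and the data are made up. kendall_tau_a is the simple tau-a variant, which does not correct for ties; SciPy's kendalltau, for example, defaults to tau-b.

```python
import math


def ged_to_similarity(ged, n1, n2):
    """Normalize GED by the average graph size, then map it into (0, 1]."""
    nged = ged / ((n1 + n2) / 2)
    return math.exp(-nged)


def mse(pred, true):
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(a, b):
    """Spearman's rho as the Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)


def kendall_tau_a(a, b):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    n = len(a)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (a[i] - a[j]) * (b[i] - b[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


true_geds = [(0, 5, 5), (2, 5, 5), (3, 5, 6), (6, 5, 4)]  # (GED, |G1|, |G2|), made-up values
true_sims = [ged_to_similarity(g, n1, n2) for g, n1, n2 in true_geds]
pred_sims = [0.95, 0.55, 0.60, 0.30]  # made-up model outputs

print(mse(pred_sims, true_sims))
print(spearman(pred_sims, true_sims))       # 0.8
print(kendall_tau_a(pred_sims, true_sims))  # 4/6, about 0.667
```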

Notably, SimGNN runs substantially faster than classical GED algorithms, which makes it practical for applications that issue many similarity queries. Part of the speed-up comes from efficient neural network inference. In addition, the graph embeddings of database graphs can be precomputed once and reused across queries, so only the comparison steps run at query time.
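The precomputation argument can be sketched as an index that embeds each database graph once and embeds only the query at search time. The encoder embed and the function predicted_similarity are hypothetical stand-ins, not SimGNN's trained components: embed takes the mean of node features, and predicted_similarity takes an exponential of Euclidean distance. In SimGNN itself, only per-graph work can be cached this way; the pairwise node comparison still runs once per query and database pair. math.dist requires Python 3.8 or later.

```python
import math


def embed(node_features):
    """Hypothetical encoder stand-in: the mean of a graph's node feature vectors."""
    d = len(node_features[0])
    return [sum(f[k] for f in node_features) / len(node_features) for k in range(d)]


def predicted_similarity(h_query, h_db):
    """Hypothetical cheap comparison of two cached embeddings, mapped into (0, 1]."""
    return math.exp(-math.dist(h_query, h_db))


class SimilarityIndex:
    def __init__(self, database):
        # Offline: embed every database graph exactly once.
        self.embeddings = {gid: embed(feats) for gid, feats in database.items()}

    def top_k(self, query_features, k):
        # Online: embed the query once, then compare it with each cached embedding.
        h_q = embed(query_features)
        scored = sorted(
            ((predicted_similarity(h_q, h), gid) for gid, h in self.embeddings.items()),
            reverse=True,
        )
        return [gid for _, gid in scored[:k]]


database = {
    "g1": [[1.0, 0.0], [0.0, 1.0]],
    "g2": [[1.0, 1.0], [1.0, 1.0]],
    "g3": [[0.0, 0.0], [0.0, 0.0]],
}
index = SimilarityIndex(database)
print(index.top_k([[1.0, 0.8], [0.6, 1.0]], k=2))  # ['g2', 'g1']
```

Query cost here grows linearly with the database size. An exact GED search would instead pay the full combinatorial cost for every pair.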

Implications and Future Work

The work matters both for graph neural network research and for practical applications that need fast, accurate graph similarity measures. The model, including its attention mechanism, is trained end to end on a target similarity metric. In principle it can therefore be retrained for other metrics such as MCS, which opens a new direction for future research.

Several future research directions are proposed:

Incorporation of Edge Features: Extending the model's capability to handle labeled edges can be particularly beneficial in domains like chemistry where bond characteristics are critical.
Improving Top-k Precision: SimGNN's precision in retrieving the top-k most similar graphs leaves room for improvement, for example by addressing skewed similarity distributions in the training data.
Generalization to Larger Graphs: Exploring SimGNN's scalability and performance when applied to larger graphs, particularly in domains where precise GED computation is infeasible, remains an open challenge.
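The paper reports top-k retrieval quality as precision at k (e.g. p@10). This is the overlap between the k graphs the model ranks highest and the true k most similar graphs, divided by k. A minimal sketch follows, assuming scores are given as parallel lists. precision_at_k is a hypothetical name, and the data are made up. Ties are broken by list position here, which matters when many graphs share the same true similarity, as happens with a skewed distribution.

```python
def precision_at_k(pred_scores, true_scores, k):
    """Share of the true top-k items that also appear in the predicted top-k.

    Scores are parallel lists (higher = more similar). Ties are broken by list
    position, a simplification; with many tied true scores the result depends on it.
    """
    def top(scores):
        ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
        return set(ranked[:k])

    return len(top(pred_scores) & top(true_scores)) / k


true_sims = [1.0, 0.67, 0.58, 0.26, 0.10]  # made-up ground truth
pred_sims = [0.95, 0.55, 0.60, 0.30, 0.65]  # made-up predictions
print(precision_at_k(pred_sims, true_sims, 2))  # 0.5
print(precision_at_k(pred_sims, true_sims, 3))  # 2 of the 3 true top items recovered
```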

Conclusion

SimGNN is a step toward bringing graph neural networks to graph similarity search. It delivers accurate similarity estimates at a fraction of the cost of classical computation, which opens the way to practical deployment on large and complex graph datasets. Its dual strategy captures both coarse, graph-level similarity and fine-grained, node-level similarity, making SimGNN a significant contribution to graph-based machine learning and data mining.
