DELTACON: A Principled Massive-Graph Similarity Function (1304.4657v1)

Published 17 Apr 2013 in cs.SI and physics.soc-ph

Abstract: How much did a network change since yesterday? How different is the wiring between Bob's brain (a left-handed male) and Alice's brain (a right-handed female)? Graph similarity with known node correspondence, i.e. the detection of changes in the connectivity of graphs, arises in numerous settings. In this work, we formally state the axioms and desired properties of the graph similarity functions, and evaluate when state-of-the-art methods fail to detect crucial connectivity changes in graphs. We propose DeltaCon, a principled, intuitive, and scalable algorithm that assesses the similarity between two graphs on the same nodes (e.g. employees of a company, customers of a mobile carrier). Experiments on various synthetic and real graphs showcase the advantages of our method over existing similarity measures. Finally, we employ DeltaCon to real applications: (a) we classify people to groups of high and low creativity based on their brain connectivity graphs, and (b) do temporal anomaly detection in the who-emails-whom Enron graph.



Summary

The paper introduces DELTACON, a graph similarity algorithm defined by rigorous axioms and properties, which computes node affinity using Fast Belief Propagation and distances with Rooted Euclidean Distance.
Experiments show DELTACON consistently outperforms six state-of-the-art methods on synthetic and real-world datasets, effectively detecting temporal anomalies and enabling meaningful clustering.
DELTACON's principled approach and scalability have significant implications for analyzing large real-world graphs in diverse fields like cybersecurity, social networks, and bioinformatics.

DELTACON: A Principled Massive-Graph Similarity Function

The DELTACON algorithm proposed in this paper measures graph similarity with known node correspondence: given two graphs on the same set of nodes, it quantifies how much their connectivity differs. The problem has broad applications in network analysis and anomaly detection, and the core challenge is to compare two graphs efficiently while producing a score that is meaningful and interpretable. The authors state three axioms that a graph similarity measure should satisfy (identity, symmetry, and the zero property) and four further desired properties (edge importance, weight awareness, edge-submodularity, and focus awareness). These form the basis for DELTACON.

DELTACON Algorithm Overview

The DELTACON algorithm proceeds in three steps. First, it computes pairwise node affinity scores using Fast Belief Propagation (FABP), which accounts for the influence of multi-step neighbors with weights that decrease with distance. Unlike simple edge-overlap comparisons, this evaluates the connectivity profile across varying distances within the graph, which captures more nuanced structural changes.
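As a minimal sketch of this first step, assume the closed-form FABP system S = (I + eps^2 D - eps A)^-1, where A is the adjacency matrix, D the diagonal degree matrix, and eps = 1 / (1 + maximum degree). The function name fabp_affinity and the dense NumPy inverse are illustrative choices, not the authors' implementation, and a dense inverse would not scale to large graphs.

```python
import numpy as np

def fabp_affinity(adj) -> np.ndarray:
    """Return the n x n node affinity matrix S = (I + eps^2 D - eps A)^-1."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    degrees = adj.sum(axis=1)
    # Assumed choice of eps: small enough that the system matrix is
    # strictly diagonally dominant and therefore invertible.
    eps = 1.0 / (1.0 + degrees.max())
    system = np.eye(n) + eps**2 * np.diag(degrees) - eps * adj
    return np.linalg.inv(system)

# Toy path graph 0 - 1 - 2 (hypothetical example data).
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
S = fabp_affinity(path)
print(S.round(3))
```

In this toy graph the affinity between the directly connected nodes 0 and 1 comes out larger than the affinity between nodes 0 and 2, which are two hops apart, illustrating the decreasing weight given to more distant neighbors.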

Second, it measures the difference between the two graphs' node affinity scores with the Rooted Euclidean Distance, which the authors prefer over the plain Euclidean distance because it differentiates the resulting similarity scores more clearly.
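A small sketch of this distance follows, assuming it is the Euclidean distance taken between the element-wise square roots of two non-negative affinity matrices. The function name root_ed and the two toy matrices are hypothetical, chosen only to show how the square root magnifies differences between small affinity values.

```python
import numpy as np

def root_ed(s1, s2) -> float:
    """Rooted Euclidean Distance between two affinity matrices."""
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    # Assumes non-negative entries so that the square roots are real.
    return float(np.sqrt(np.sum((np.sqrt(s1) - np.sqrt(s2)) ** 2)))

# Two toy affinity matrices that differ only in their small off-diagonal values.
a = np.array([[1.0, 0.04], [0.04, 1.0]])
b = np.array([[1.0, 0.01], [0.01, 1.0]])

rooted = root_ed(a, b)
plain = float(np.sqrt(np.sum((a - b) ** 2)))  # ordinary Euclidean distance
print(rooted, plain)
```

For these inputs the rooted distance is larger than the plain Euclidean distance, because taking square roots stretches differences among values close to zero.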

Lastly, the computed distance is converted into a similarity score in the interval [0,1], where 1 means the two graphs are identical, so that results remain intuitive and comparable as graphs grow large.
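The final conversion can be sketched as below, assuming the mapping sim = 1 / (1 + d) for a non-negative distance d. The helper name similarity_from_distance is hypothetical.

```python
def similarity_from_distance(d: float) -> float:
    """Map a non-negative distance to a similarity score in (0, 1]."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    # Assumed mapping: distance 0 gives similarity 1, and the score
    # decays toward 0 as the distance grows.
    return 1.0 / (1.0 + d)

for d in (0.0, 1.0, 9.0):
    print(d, similarity_from_distance(d))
```

Under this mapping identical graphs (distance 0) score exactly 1, and the score decreases monotonically toward 0 without ever reaching it.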

Experimental Results and Applications

Experiments on synthetic and real-world datasets, such as brain connectivity graphs and the Enron email network, demonstrate that DELTACON satisfies the outlined properties more consistently than six state-of-the-art methods, including VEO, GED, and eigenvalue-based λ-distance. Notably, DELTACON captured significant changes such as the temporal anomalies in the Enron data, where low similarity scores between consecutive snapshots coincided with known historical events.
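To illustrate how such a temporal analysis might be set up, the sketch below flags snapshots whose similarity to the previous snapshot is unusually low. The thresholding rule (median minus k standard deviations), the function name flag_low_similarity, and the score values are all assumptions made for illustration; the numbers are made up and are not results from the Enron experiments.

```python
import statistics

def flag_low_similarity(sims, k=2.0):
    """Return indices whose score lies more than k std devs below the median."""
    center = statistics.median(sims)
    spread = statistics.pstdev(sims)
    return [i for i, s in enumerate(sims) if s < center - k * spread]

# Hypothetical similarity scores between consecutive daily graphs.
daily_sims = [0.92, 0.94, 0.93, 0.91, 0.55, 0.93, 0.92]
flagged = flag_low_similarity(daily_sims)
print(flagged)
```

With these made-up scores only the fifth transition (index 4) falls below the threshold. In practice the choice of k and of the spread estimate would need to be tuned to the data.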

In the brain connectivity application, DELTACON enabled a clustering that separated individuals by creativity level, demonstrating its robustness and its potential use in neuroscience and cognitive studies.

Implications and Future Directions

The theoretical contribution of this paper lies in formalizing the axioms and properties that a graph similarity function should satisfy; the practical contribution is an algorithm that is both principled and scalable. The ability of DELTACON to analyze real-world graphs efficiently and identify significant changes has implications for numerous fields, ranging from cybersecurity to social network analysis and bioinformatics.

Further developments could include parallelizing the algorithm to handle even larger datasets and exploring graph partitioning strategies to reduce computation further. Both would extend DELTACON's applicability and performance on very large dynamic networks.

Conclusion

DELTACON combines an axiomatic foundation with a scalable algorithm for graph similarity assessment. Its ability to detect meaningful differences between large graphs, shown in the brain connectivity and Enron experiments, makes it a useful tool for graph-based analytics in complex networked environments.
