Wasserstein Weisfeiler-Lehman Graph Kernels (1906.01277v2)

Published 4 Jun 2019 in cs.LG, q-bio.MN, and stat.ML

Abstract: Most graph kernels are an instance of the class of $\mathcal{R}$-Convolution kernels, which measure the similarity of objects by comparing their substructures. Despite their empirical success, most graph kernels use a naive aggregation of the final set of substructures, usually a sum or average, thereby potentially discarding valuable information about the distribution of individual components. Furthermore, only a limited instance of these approaches can be extended to continuously attributed graphs. We propose a novel method that relies on the Wasserstein distance between the node feature vector distributions of two graphs, which allows to find subtler differences in data sets by considering graphs as high-dimensional objects, rather than simple means. We further propose a Weisfeiler-Lehman inspired embedding scheme for graphs with continuous node attributes and weighted edges, enhance it with the computed Wasserstein distance, and thus improve the state-of-the-art prediction performance on several graph classification tasks.

Citations (175)

Summary

  • The paper introduces a graph kernel that leverages Wasserstein distance to capture subtle substructure differences in graphs.
  • It extends the Weisfeiler-Lehman framework to handle continuous node attributes and weighted edges for enhanced classification.
  • Empirical validations demonstrate state-of-the-art performance on several benchmark datasets, outperforming traditional graph kernel methods.

An Expert Analysis of "Wasserstein Weisfeiler-Lehman Graph Kernels"

The paper "Wasserstein Weisfeiler-Lehman Graph Kernels" introduces a novel method for graph classification that addresses the nuanced differences in graph structures through the employment of the Wasserstein distance. This approach advances the graph kernel domain by incorporating continuous node attributes and weighted edges into the Weisfeiler-Lehman (WL) framework, thus providing a more comprehensive solution compared to traditional graph kernels like the WL subtree kernel. The authors present a thorough analysis including the theoretical underpinnings, formulation, and extensive empirical validation of their proposed method.

Key Contributions

  1. Graph Wasserstein Distance: At the core of this research is the proposal of a graph Wasserstein distance (GWD), which compares two graphs by treating their node feature vectors as empirical distributions in a high-dimensional space and computing the Wasserstein distance between them. This captures subtle substructure differences that are typically lost with the naïve aggregation strategies employed by traditional graph kernels (a minimal sketch of the computation follows this list).
  2. Extension to Continuously Attributed Graphs: The method extends the Weisfeiler-Lehman embedding concept to continuous attributes through a WL-inspired iterative procedure that propagates and refines node features over (possibly edge-weighted) neighborhoods, making the kernel applicable to graphs with continuous node attributes and weighted edges.
  3. Theoretical and Empirical Validation: The paper not only lays down the theoretical foundation of the proposed graph kernels but also delivers empirical assessments demonstrating state-of-the-art performance, particularly on datasets with continuous attributes. The authors compare the Wasserstein Weisfeiler-Lehman (WWL) kernel against several existing methods, exhibiting competitive performance relative to traditional WL kernels and superior handling of continuous attributes compared to other approaches.

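To make these ideas concrete, the following is a minimal sketch, not the authors' reference implementation: node features are refined by a WL-style averaging over (optionally weighted) neighborhoods, the refined embeddings of two graphs are compared with an exact Wasserstein distance under uniform node weights, and the pairwise distances are turned into a kernel via exp(-λD). The exact propagation rule, the choice of the POT library (`ot.dist`, `ot.emd2`), and the value of λ are illustrative assumptions; the paper's precise update and hyperparameters may differ.

```python
import numpy as np
import ot  # Python Optimal Transport (POT) library


def continuous_wl_embedding(X, A, num_iterations=2):
    """WL-style refinement for continuous attributes: at each step a node's
    feature is averaged with the (edge-weighted) mean of its neighbors'
    features; embeddings from all iterations are concatenated."""
    embeddings = [X]
    H = X
    deg = A.sum(axis=1, keepdims=True)   # row-normalize the (weighted) adjacency
    deg[deg == 0] = 1.0
    A_norm = A / deg
    for _ in range(num_iterations):
        H = 0.5 * (H + A_norm @ H)
        embeddings.append(H)
    return np.hstack(embeddings)


def graph_wasserstein_distance(X1, A1, X2, A2, num_iterations=2):
    """Wasserstein distance between the node-embedding distributions of two
    graphs, placing uniform mass on each node."""
    Z1 = continuous_wl_embedding(X1, A1, num_iterations)
    Z2 = continuous_wl_embedding(X2, A2, num_iterations)
    M = ot.dist(Z1, Z2, metric='euclidean')   # pairwise ground costs
    a = np.full(len(Z1), 1.0 / len(Z1))       # uniform node weights
    b = np.full(len(Z2), 1.0 / len(Z2))
    return ot.emd2(a, b, M)                   # exact optimal-transport cost


def wwl_kernel(graphs, lam=0.1, num_iterations=2):
    """Kernel matrix K = exp(-lam * D) over pairwise graph Wasserstein
    distances; `graphs` is a list of (node_features, adjacency) pairs."""
    n = len(graphs)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            X1, A1 = graphs[i]
            X2, A2 = graphs[j]
            D[i, j] = D[j, i] = graph_wasserstein_distance(
                X1, A1, X2, A2, num_iterations)
    return np.exp(-lam * D)
```

The exact optimal-transport solve dominates the cost of each pairwise comparison, which is what motivates the Sinkhorn-based speedups discussed under future directions below.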
Numerical Results and Analysis

The authors provide a detailed comparative analysis across multiple datasets, showing that the WWL kernel frequently sets new benchmarks in accuracy for tasks involving continuously attributed graphs. Noteworthy results include:

  • WWL outperforming all methods in datasets that have node attributes modeled as real-valued vectors or weighted edges.
  • The competitive standing of the categorical WWL kernel against the Weisfeiler-Lehman optimal assignment (WL-OA) kernel on traditional labeled graphs.

These results underscore the WWL kernel's robustness and effectiveness in capturing graph similarity for both categorical and continuous datasets, supporting its general versatility and utility in various graph classification tasks.

Implications and Future Directions

The integration of optimal transport theory with graph kernels marks an innovative step poised to impact both theoretical and practical applications in machine learning on graphs. The ability of the WWL kernel to account for continuous data may have far-reaching implications for applications in cheminformatics, bioinformatics, and social network analysis where graphs naturally include weighted edges and node attributes.

Furthermore, future research could explore runtime optimizations that leverage Sinkhorn regularization to reduce computational demand, thereby broadening the application scope to larger graph domains (a brief sketch follows below). Establishing the positive definiteness of the kernel in all settings and extending the described methods to handle high-dimensional edge attributes are further plausible directions.
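As an illustration of that direction, the exact transport solve in the earlier sketch could be swapped for an entropically regularized one; `ot.sinkhorn2` from the POT library computes such an approximation. The regularization strength `reg` below is an arbitrary illustrative value, not a recommendation from the paper.

```python
import numpy as np
import ot


def sinkhorn_wasserstein(Z1, Z2, reg=0.05):
    """Entropically regularized approximation of the Wasserstein distance
    between two sets of node embeddings (uniform node weights)."""
    M = ot.dist(Z1, Z2, metric='euclidean')
    a = np.full(len(Z1), 1.0 / len(Z1))
    b = np.full(len(Z2), 1.0 / len(Z2))
    # sinkhorn2 returns the regularized transport cost; it trades exactness
    # for a much faster iterative solve.
    return ot.sinkhorn2(a, b, M, reg)
```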

By bridging the traditional WL framework with optimal transport theory, this work enriches the landscape of structured data analysis methods, encouraging further exploration and development in sample-efficient, nuanced interpretation of complex graph-structured data.
