Optimal Transport for structured data with application on graphs (1805.09114v3)

Published 23 May 2018 in stat.ML and cs.LG

Abstract: This work considers the problem of computing distances between structured objects such as undirected graphs, seen as probability distributions in a specific metric space. We consider a new transportation distance (i.e. that minimizes a total cost of transporting probability masses) that unveils the geometric nature of the structured objects space. Unlike Wasserstein or Gromov-Wasserstein metrics that focus solely and respectively on features (by considering a metric in the feature space) or structure (by seeing structure as a metric space), our new distance exploits jointly both information, and is consequently called Fused Gromov-Wasserstein (FGW). After discussing its properties and computational aspects, we show results on a graph classification task, where our method outperforms both graph kernels and deep graph convolutional networks. Exploiting further on the metric properties of FGW, interesting geometric objects such as Fréchet means or barycenters of graphs are illustrated and discussed in a clustering context.

Summary

The paper introduces the FGW distance, which combines node features and graph structure to measure the similarity between graphs.
It computes FGW with a conditional gradient algorithm that treats the problem as a quadratic program, and applies it to real-world datasets.
Experimental results show that FGW outperforms traditional graph kernels and deep learning models on graph classification tasks.

Optimal Transport for Structured Data with Application on Graphs

The paper "Optimal Transport for Structured Data with Application on Graphs" introduces the Fused Gromov-Wasserstein (FGW) distance, which measures the similarity between structured objects, particularly graphs. Optimal transport (OT) has long been used to compare probability distributions, but applying it to structured data such as graphs is harder: the Wasserstein distance compares only node features, while the Gromov-Wasserstein distance compares only structure. FGW addresses this limitation by accounting for node features and graph topology jointly in a single transport problem.
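To make the combination of features and structure concrete, here is a sketch of the FGW objective in the discrete case. The notation is assumed for illustration and does not come from this summary: the two graphs have node features a_i and b_j, structure matrices C_1 and C_2 (for example shortest-path distances), and node weight vectors h and g; d is a metric on the feature space, q >= 1 is an exponent, and alpha in [0, 1] is the trade-off parameter.

```latex
\mathrm{FGW}_{q,\alpha}(\mu,\nu)
  = \min_{\pi \in \Pi(h,g)} \sum_{i,j,k,l}
    \Big[ (1-\alpha)\, d(a_i, b_j)^q
        + \alpha\, \big| C_1(i,k) - C_2(j,l) \big|^q \Big]\,
    \pi_{i,j}\, \pi_{k,l},
\qquad
\Pi(h,g) = \big\{ \pi \ge 0 \;:\; \pi \mathbf{1} = h,\ \pi^{\top} \mathbf{1} = g \big\}.
```

With alpha = 0 only the feature term remains, which gives a Wasserstein-type cost on node features; with alpha = 1 only the structure term remains, which gives a Gromov-Wasserstein-type cost.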

Key Contributions

Framework for Structured Data: The authors propose a new method capable of handling both feature and structure information of the data. They introduce the concept of viewing graphs as probability measures over a joint space of features and structures. This provides a comprehensive comparison method that accounts for attributes of nodes and connectivity patterns between them.
Fused Gromov-Wasserstein (FGW) Distance: The FGW distance integrates features and structure into a single optimal transport problem through a balancing parameter. It defines a transportation cost that combines feature dissimilarity with structural distortion.
Metric and Semi-Metric Properties: The authors prove that FGW is a metric under certain conditions (in the earth mover's distance formulation) and a semi-metric under others. These properties matter for machine learning methods that rely on a well-behaved distance.
Algorithms for Computation: The paper provides algorithms for computing the FGW distance efficiently, leveraging conditional gradient methods to solve it as a quadratic program. They apply these algorithms to real-world datasets, showcasing strong performance in classification tasks on graphs.
Barycenter in Graph Clustering: The paper extends the concept of a Fréchet mean to graphs, using FGW to compute barycenters in clustering settings. This makes it possible to average several graphs into a single representative graph, which in turn supports clustering.
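The conditional gradient computation mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes q = 2, builds the full four-index structure tensor (so it only suits very small graphs), and uses SciPy's linprog as the linear minimization step. All function names are hypothetical. Because the problem is non-convex, the procedure may return a local minimum.

```python
import numpy as np
from scipy.optimize import linprog


def structure_tensor(C1, C2):
    """L[i, j, k, l] = (C1[i, k] - C2[j, l])**2 (naive, O(n^2 m^2) memory)."""
    return (C1[:, None, :, None] - C2[None, :, None, :]) ** 2


def fgw_cost(M, L, pi, alpha):
    """FGW objective (q = 2) at coupling pi; M holds squared feature distances."""
    features = np.sum(M * pi)
    structure = np.einsum("ijkl,ij,kl->", L, pi, pi)
    return (1 - alpha) * features + alpha * structure


def transport_lp(G, h, g):
    """Linear step: argmin of <G, pi> over couplings with marginals h and g."""
    n, m = G.shape
    row_sums = np.kron(np.eye(n), np.ones((1, m)))  # sum_j pi[i, j] = h[i]
    col_sums = np.kron(np.ones((1, n)), np.eye(m))  # sum_i pi[i, j] = g[j]
    res = linprog(G.ravel(), A_eq=np.vstack([row_sums, col_sums]),
                  b_eq=np.concatenate([h, g]), bounds=(0, None), method="highs")
    return res.x.reshape(n, m)


def fgw_conditional_gradient(M, C1, C2, h, g, alpha=0.5, n_iter=50, tol=1e-9):
    """Frank-Wolfe iterations with exact line search on the quadratic objective."""
    L = structure_tensor(C1, C2)
    pi = np.outer(h, g)  # feasible starting coupling
    for _ in range(n_iter):
        grad = (1 - alpha) * M + alpha * (
            np.einsum("ijkl,kl->ij", L, pi) + np.einsum("klij,kl->ij", L, pi))
        direction = transport_lp(grad, h, g) - pi
        b = np.sum(grad * direction)  # slope of the objective along direction
        if b > -tol:                  # no descent direction left: stop
            break
        # Along pi + tau * direction the objective is f0 + b*tau + a*tau**2.
        a = alpha * np.einsum("ijkl,ij,kl->", L, direction, direction)
        tau = min(1.0, -b / (2 * a)) if a > 0 else 1.0
        pi = pi + tau * direction
    return pi, fgw_cost(M, L, pi, alpha)


# Toy example: two copies of a 3-node path graph with scalar node features.
C = np.array([[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]])
feats = np.array([0.0, 1.0, 2.0])
M = (feats[:, None] - feats[None, :]) ** 2
weights = np.full(3, 1 / 3)
coupling, value = fgw_conditional_gradient(M, C, C, weights, weights, alpha=0.5)
print(coupling, value)
```

The exact line search works because, for q = 2, the objective restricted to a segment between two couplings is a quadratic in the step size. For graphs of realistic size, a maintained library such as POT (Python Optimal Transport) is a more practical choice than this naive tensor-based sketch.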

Results and Impact

Empirical results show that FGW outperforms conventional graph kernels, such as the Weisfeiler-Lehman kernel, and deep learning approaches such as graph convolutional networks on several graph classification benchmarks. By capturing both topological and feature similarity, FGW offers a strong alternative for graph analysis and shows how OT principles can be extended to structured data.

Theoretical and Practical Implications

Theoretically, this work expands the scope of optimal transport theory by addressing the integration of heterogeneous information types into a single comparison framework. Practically, FGW provides a robust approach for tasks in fields such as cheminformatics, bioinformatics, and complex network analysis where understanding both node-level data and global network structure is critical.

Future Directions

For future research, several promising directions can be considered:

End-to-End Learning: Integrating FGW into neural network frameworks, potentially leading to new architectures in which the transport distance is learned jointly with the model parameters.
Scalability Improvements: Reducing the computational cost of FGW on very large graphs, since scalability will be crucial for applications in domains with large datasets.
Enhanced Feature Representation: Further exploration into the choice of feature spaces and structure matrices to optimize FGW performance on particular datasets.

Ultimately, the FGW approach provides a versatile tool for the structured data analysis community, offering both empirical and theoretical advancements in understanding complex multi-modal similarity measures.