Insights into Graph Representation Learning: A Survey
The paper "Graph Representation Learning: A Survey," authored by Fenxiao Chen and colleagues, addresses the increasing significance of graph representation learning. The research acknowledges the prevalence of graph-structured data in diverse real-world applications, such as social networks, biological networks, and linguistic networks. Unlike structured data such as images or audio, graph data is high-dimensional and irregular, posing unique analysis challenges. This survey provides a thorough exploration of techniques that transform high-dimensional graph data into lower-dimensional vector representations while retaining essential graph properties.
Overview of Graph Embedding Challenges and Techniques
Graph representation learning, or graph embedding, aims to capture a graph's structural essence in a condensed form that is computationally feasible for modern machine learning algorithms. The paper identifies three primary challenges in graph embedding: selecting an optimal embedding dimension, choosing which graph properties to preserve, and determining which embedding method suits a specific task, a choice for which little guidance currently exists.
- Dimensionality and Property Preservation: The trade-off between high-dimensional embeddings that preserve graph information and low-dimensional ones that favor storage efficiency and reduced noise is emphasized. This trade-off is context-sensitive, dependent on the graph and application domain.
- Methodological Diversity: Numerous techniques have emerged to address these challenges through various approaches:
  - Traditional Methods: Dimensionality reduction techniques that preserve essential graph features.
  - Emerging Neural Methods: Deep neural networks, such as Convolutional Neural Networks (CNNs) and Graph Convolutional Networks (GCNs), adapted to graph data structures.
  - Scalability Solutions: Methods such as random walks, matrix factorization, and neural networks tailored to large-scale graphs, improving computational and memory efficiency.
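To make the random-walk family of methods concrete, the sketch below generates truncated uniform random walks in the style of DeepWalk; the resulting node sequences are what a skip-gram model (e.g. word2vec) would consume to learn embeddings. This is an illustrative toy, not the survey's reference implementation: the graph, parameter names, and defaults are assumptions for the example.

```python
import random

def random_walks(adj, walks_per_node=10, walk_length=5, seed=42):
    """Generate truncated uniform random walks (DeepWalk-style).

    adj: dict mapping each node to a list of its neighbors.
    Returns a list of walks; each walk is a sequence of node ids
    that can be fed to a skip-gram model to learn node embeddings.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adj)
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:  # dead end: truncate this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Toy graph (adjacency lists): a triangle plus one pendant node.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = random_walks(graph)
```

Because walks are short and generated independently, this step parallelizes trivially, which is one reason random-walk methods scale to large graphs.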
Performance Evaluation and Applications
The paper undertakes an empirical assessment of state-of-the-art graph embedding techniques on diverse datasets, both small (e.g., Cora, Citeseer) and large (e.g., YouTube, Flickr), emphasizing vertex classification and clustering tasks. The results favor random walk-based methods, such as DeepWalk and node2vec, for their balance between performance and computational resource demands. These techniques excel at preserving higher-order proximities and context drawn from graph topology and node attributes.
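What distinguishes node2vec from DeepWalk is its second-order biased walk: return parameter p and in-out parameter q interpolate between breadth-first (local) and depth-first (global) exploration. The sketch below shows one biased step under the standard node2vec weighting; the graph and parameter defaults are illustrative assumptions, not values from the paper.

```python
import random

def node2vec_step(adj, prev, cur, p, q, rng):
    """One biased step of a node2vec walk, conditioned on the previous node.

    Unnormalized weight for each candidate x in adj[cur]:
      1/p if x == prev          (return to where we came from)
      1   if x neighbors prev   (stay local, BFS-like)
      1/q otherwise             (move outward, DFS-like)
    """
    candidates = adj[cur]
    weights = []
    for x in candidates:
        if x == prev:
            weights.append(1.0 / p)
        elif prev in adj[x]:
            weights.append(1.0)
        else:
            weights.append(1.0 / q)
    return rng.choices(candidates, weights=weights, k=1)[0]

def node2vec_walk(adj, start, length, p=1.0, q=0.5, seed=0):
    """Generate one node2vec walk; the first step is uniform."""
    rng = random.Random(seed)
    walk = [start, rng.choice(adj[start])]
    while len(walk) < length:
        walk.append(node2vec_step(adj, walk[-2], walk[-1], p, q, rng))
    return walk

# Same toy graph as before: a triangle plus one pendant node.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walk = node2vec_walk(graph, start=0, length=10)
```

Setting q < 1 biases walks outward (more DFS-like, capturing community structure), while q > 1 keeps them local (more BFS-like, capturing structural roles).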
Future Directions
The paper outlines several promising avenues for future work in graph representation learning:
- Deep Graph Embedding: Extending architectures to greater depth without succumbing to the over-smoothing problems encountered in GCNs.
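The over-smoothing problem mentioned above can be demonstrated numerically: each GCN layer propagates features through the normalized adjacency matrix Â = D^(-1/2)(A + I)D^(-1/2), and stacking many layers drives all node features toward a vector determined only by node degrees, erasing the input signal. The sketch below (pure Python, weights and nonlinearities omitted, toy path graph chosen for illustration) isolates that propagation step.

```python
import math

def normalized_adjacency(adj):
    """Dense A_hat = D^(-1/2) (A + I) D^(-1/2), the propagation
    matrix used by GCN layers (self-loops included)."""
    n = len(adj)
    A = [[0.0] * n for _ in range(n)]
    for u, nbrs in adj.items():
        A[u][u] = 1.0  # self-loop
        for v in nbrs:
            A[u][v] = 1.0
    deg = [sum(row) for row in A]
    return [[A[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def propagate(A_hat, H):
    """One linear GCN propagation step: H <- A_hat @ H
    (learned weights and nonlinearity omitted to isolate smoothing)."""
    n, d = len(H), len(H[0])
    return [[sum(A_hat[i][k] * H[k][j] for k in range(n))
             for j in range(d)] for i in range(n)]

# Toy path graph 0-1-2-3 with a one-hot feature on node 0.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
A_hat = normalized_adjacency(graph)
H = [[1.0], [0.0], [0.0], [0.0]]
for _ in range(200):  # simulate stacking many GCN layers
    H = propagate(A_hat, H)
# After many layers, each node's feature is proportional to the
# square root of its degree; the initial one-hot signal is washed out.
```

On this graph the features converge to values proportional to (√2, √3, √3, √2), so nodes 1 and 2 become indistinguishable regardless of the input, which is exactly why naïvely deep GCNs lose discriminative power.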
- Dynamic and Semi-supervised Models: Adjusting to evolving graph structures in real-time applications and exploiting partially labeled data, respectively.
- Interpretable AI: Striving for understandability in embeddings to bridge the gap between performance and transparency, making AI more accountable and reliable.
In conclusion, this paper serves as a comprehensive guide and reference point on graph representation learning methodologies. It provides a strong foundation for researchers aiming to tackle complex graph-structured data challenges in various application areas. The inclusion of an open-source Python library, GRLL, further positions this survey as a practical resource for developing and testing graph embedding algorithms.