- The paper provides a comprehensive review of graph embedding by categorizing input variations, output representations, and key methodological techniques.
- It examines multiple methods—including matrix factorization, deep learning, and generative models—highlighting their trade-offs in capturing global and local graph structures.
- The analysis illustrates practical applications in node classification, link prediction, and whole-graph analysis while suggesting future research directions for scalability and dynamic graphs.
A Comprehensive Survey of Graph Embedding: Problems, Techniques and Applications
Graph embedding offers a potent solution to the computational challenges inherent in traditional graph analytics by converting graph data into a low-dimensional space. In this survey by Hongyun Cai, Vincent W. Zheng, and Kevin Chen-Chuan Chang, the authors examine the field's current landscape, focusing on three primary facets: problem settings, techniques, and applications.
Problem Settings: Input and Output Diversity
Graph Embedding Input
The first categorization presented by the authors deals with the input variations in graph embedding. The input may include:
- Homogeneous Graphs: These consist of nodes and edges of a single type; the edges may be directed or undirected, and weighted or unweighted. The key challenge here is capturing the diversity of connectivity patterns within the graph's structure.
- Heterogeneous Graphs: These encompass multiple node and edge types, prevalent in community-based Question Answering (cQA) sites, multimedia networks, and knowledge graphs. Handling the global consistency among different entity types and addressing data imbalances are vital challenges in this context.
- Graphs with Auxiliary Information: These graphs include additional node or edge attributes (e.g., labels, features, or knowledge base content). Incorporating such rich and unstructured information into the embedding process, while maintaining structural integrity, defines the primary challenge.
- Constructed Graphs from Non-relational Data: Here, graphs are constructed using non-relational data inputs, where nodes and edges are generated based on various similarity measures or co-occurrence patterns. This involves ensuring that constructed relations preserve the underlying data's proximity measures effectively.
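A common similarity-based construction is the k-nearest-neighbour rule. The following minimal sketch (the feature vectors and choice of k are illustrative assumptions, not taken from the survey) builds a symmetric adjacency matrix from non-relational feature data:

```python
import numpy as np

def knn_graph(features, k=2):
    """Build a symmetric k-nearest-neighbour adjacency matrix
    from row-wise feature vectors, using Euclidean distance."""
    n = len(features)
    # Pairwise Euclidean distances via broadcasting.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    adj = np.zeros((n, n))
    for i in range(n):
        # Indices of the k closest other points (position 0 is the point itself).
        nbrs = np.argsort(dist[i])[1:k + 1]
        adj[i, nbrs] = 1
    # Symmetrise: keep an edge if either endpoint selected the other.
    return np.maximum(adj, adj.T)

# Two well-separated clusters of 2-D points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
A = knn_graph(X, k=1)
```

With k=1 each point links only to its nearest neighbour, so edges appear within clusters but not across them; the resulting graph's proximity structure mirrors the proximity of the underlying feature data.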
Graph Embedding Output
The output in graph embedding scenarios is another primary axis of categorization, including:
- Node Embedding: Typically, each node is represented as a vector, preserving the node's similarity to its neighbors.
- Edge Embedding: In contrast to node embedding, edge embedding represents each node pair as a vector, capturing the relationship between the two nodes. Knowledge graph applications often rely on this type, where the embedding must preserve the asymmetry of directed edges.
- Hybrid Embedding: This encompasses embedding substructures, such as node pairs or communities, and demands resolving the heterogeneity of embedded components.
- Whole-Graph Embedding: The entire graph is represented as a single vector, facilitating graph-level similarity search and classification.
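The asymmetry requirement for knowledge-graph edges can be illustrated with a translational scoring function in the style of TransE (one representative model, not the survey's only option); the entities, relation, and 2-dimensional vectors below are hypothetical toy values:

```python
import numpy as np

# Toy 2-D entity and relation embeddings (hypothetical values).
entity = {"paris": np.array([1.0, 0.0]),
          "france": np.array([2.0, 1.0])}
relation = {"capital_of": np.array([1.0, 1.0])}

def score(head, rel, tail):
    """Translational plausibility in the style of TransE:
    a small ||h + r - t|| means the directed triple is likely true."""
    return -np.linalg.norm(entity[head] + relation[rel] - entity[tail])

# The translation h + r is direction-sensitive, so reversing the edge
# changes the score -- preserving the asymmetry of directed edges.
forward = score("paris", "capital_of", "france")
backward = score("france", "capital_of", "paris")
```

Because the score depends on the direction of translation, (paris, capital_of, france) ranks higher than its reversal, which a symmetric similarity measure could not distinguish.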
Embedding Techniques
The authors categorize the techniques into five principal methods:
- Matrix Factorization: This method factorizes a matrix that encodes node proximity (e.g., the graph Laplacian in Laplacian eigenmaps, or higher-order proximity matrices) to obtain low-dimensional representations. While effective at capturing global proximities, it often suffers from the high cost of constructing and factorizing large matrices.
- Deep Learning: Various deep learning models, with or without random walks, are employed to capture graph characteristics. Despite their robust performance and ability to automate feature identification, deep learning approaches can become computationally intensive.
- Edge Reconstruction Based Optimization: This approach directly optimizes an edge-based objective, either maximizing edge reconstruction probability or minimizing a distance-based or margin-based ranking loss. It is efficient, but tends to preserve local rather than global graph structure.
- Graph Kernel: The graph is decomposed into atomic substructures such as graphlets, subtree patterns, or random walks, and represented as a vector of substructure counts. While efficient, the representation can become high-dimensional and sparse as the number of distinct substructures grows.
- Generative Models: Embedding is viewed through the lens of semantic spaces, allowing the integration of node features through latent variable models. Challenges include choosing appropriate distributions and requiring substantial training data.
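As one concrete instance of the matrix factorization family, here is a minimal Laplacian-eigenmaps sketch: nodes are embedded using the eigenvectors of the graph Laplacian with the smallest non-zero eigenvalues, so that connected nodes land close together. The toy graph (two triangles joined by a bridge) and the 1-dimensional output are illustrative assumptions:

```python
import numpy as np

def laplacian_eigenmaps(adj, dim=2):
    """Embed nodes via the eigenvectors of the unnormalized graph
    Laplacian L = D - W with the smallest non-zero eigenvalues."""
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj
    vals, vecs = np.linalg.eigh(lap)   # eigenvalues in ascending order
    # Skip the trivial constant eigenvector (eigenvalue ~ 0).
    return vecs[:, 1:dim + 1]

# Two triangles {0,1,2} and {3,4,5} joined by the bridge edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

emb = laplacian_eigenmaps(A, dim=1)
```

The first non-trivial eigenvector (the Fiedler vector) assigns the two triangles values of opposite sign, so nodes in the same community sit much closer in the embedding than nodes across the bridge, illustrating how the factorization captures global proximity.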
Applications
Graph embedding facilitates numerous applications:
- Node-Related: Embeddings are useful for node classification (e.g., using SVM or logistic regression), clustering, recommendation, retrieval, and ranking.
- Edge-Related: Common applications include link prediction and triplet classification within knowledge graphs.
- Whole-Graph: Tasks such as graph classification leverage entire graph embeddings for better performance and efficiency.
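In practice, link prediction from node embeddings often reduces to scoring and ranking candidate node pairs. A minimal sketch, assuming hypothetical pretrained embeddings and a simple inner-product score (one common choice among several):

```python
import numpy as np

# Hypothetical learned node embeddings (4 nodes, 2 dimensions).
emb = np.array([[1.0, 0.1],
                [0.9, 0.2],
                [0.1, 1.0],
                [0.0, 0.9]])

def link_score(u, v):
    """Inner-product score: a higher value suggests that an
    unobserved edge between u and v is more plausible."""
    return float(emb[u] @ emb[v])

# Nodes 0 and 1 have similar embeddings, so the candidate edge (0, 1)
# outranks (0, 3) when candidates are sorted by score.
ranked = sorted([(0, 1), (0, 3), (1, 2)], key=lambda e: -link_score(*e))
```

The same embeddings can feed node-level tasks: the vectors serve directly as feature inputs to an off-the-shelf classifier such as SVM or logistic regression, as the survey notes.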
Future Directions
The survey suggests several future directions:
- Computation Efficiency: Addressing the computational inefficiency inherent to deep learning models, particularly for large-scale graphs.
- Dynamic Graph Embedding: Developing methods for graphs whose structure or informational content evolves over time.
- Techniques: Incorporating more comprehensive structural awareness and finding more scalable optimization methods.
- Expansion of Applications: Exploring a wider array of scenarios where graph embedding can provide novel insights and efficiency improvements.
Conclusion
This comprehensive survey synthesizes over a decade of research and development in graph embedding. By categorizing existing methods and identifying challenges and directions for future research, it provides a valuable reference point for academics and practitioners working within this domain.