A Survey on Network Embedding (1711.08752v1)

Published 23 Nov 2017 in cs.SI

Abstract: Network embedding assigns nodes in a network to low-dimensional representations and effectively preserves the network structure. Recently, significant progress has been made toward this emerging network analysis paradigm. In this survey, we focus on categorizing and then reviewing the current development of network embedding methods, and point out future research directions. We first summarize the motivation of network embedding. We discuss the classical graph embedding algorithms and their relationship with network embedding. Afterwards and primarily, we provide a comprehensive overview of a large number of network embedding methods in a systematic manner, covering the structure- and property-preserving network embedding methods, the network embedding methods with side information, and the advanced information preserving network embedding methods. Moreover, several evaluation approaches for network embedding and some useful online resources, including network data sets and software, are reviewed, too. Finally, we discuss the framework of exploiting these network embedding methods to build an effective system and point out some potential future directions.

Summary

The paper surveys techniques that map the nodes of a network to low-dimensional embeddings while preserving key proximities and community structures.
It categorizes methods into structure- and property-preserving approaches, approaches that use side information, and approaches that preserve advanced, task-specific information through supervised or semi-supervised signals.
The paper reviews evaluation tasks such as node classification and link prediction, and outlines future directions for dynamic and complex networks.

A Survey on Network Embedding

The paper surveys network embedding, which encodes the nodes of a network as low-dimensional vectors while preserving essential network properties. Networks from domains such as social, biological, and information systems are diverse and complex, so representing them effectively is a prerequisite for analytic tasks such as node classification, clustering, and link prediction.

Introduction and Motivation

The traditional representation of networks as graphs, consisting of nodes and edges, presents challenges in terms of computational complexity, parallelizability, and the applicability of machine learning methods. Network embedding methodologies have emerged to address these issues by embedding nodes into low-dimensional vectors, thereby transforming the relationships between nodes from explicit link representations to distances in a vector space.
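
As a minimal sketch of this shift, the snippet below stores the same toy network twice: once as explicit links and once as two-dimensional vectors. The graph, the node names, and the coordinates are hand-picked for illustration (they are not learned and not taken from the paper); an actual embedding method would learn the vectors from the links.

```python
import math

# Explicit link representation: a toy undirected graph (hypothetical data).
edges = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")}

# Hand-picked 2-D vectors standing in for a learned embedding (assumption).
embedding = {
    "a": (0.0, 0.0),
    "b": (0.0, 1.0),
    "c": (1.0, 0.5),
    "d": (2.0, 0.5),
    "e": (3.0, 0.5),
}


def linked(u, v):
    """True if the graph contains an edge between u and v."""
    return (u, v) in edges or (v, u) in edges


def distance(u, v):
    """Euclidean distance between the vectors of nodes u and v."""
    return math.dist(embedding[u], embedding[v])


def nearest(u, k=2):
    """The k nodes closest to u in the vector space."""
    others = [v for v in embedding if v != u]
    return sorted(others, key=lambda v: distance(u, v))[:k]


# Relationship queries become distance queries on vectors.
print(nearest("a"))
print(distance("a", "b") < distance("a", "e"))
```

With these coordinates every linked pair is closer than every unlinked pair, so a nearest-neighbor lookup in the vector space recovers the links without touching the edge list.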

Categorization of Network Embedding Methods

The paper categorizes network embedding methods into three primary categories based on the types of information preserved:

Structure and Property Preserving Network Embedding: This category focuses on retaining the essential structural elements of the network, such as first-order and second-order proximities, and community structures. Representative methods include DeepWalk, LINE, node2vec, GraRep, SDNE, and M-NMF.
Network Embedding with Side Information: Here, additional information such as node attributes, labels, and content is incorporated into the embedding process. Key methods such as TADW, MMDW, TriDNR, and LANE are discussed for their ability to leverage rich side information.
Advanced Information Preserving Network Embedding: This category covers embedding techniques that use supervised or semi-supervised information tied to a specific task. Examples include information diffusion, anomaly detection, and network alignment.
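
The first-order and second-order proximities named in the first category can be made concrete on a toy graph. The sketch below assumes the usual definitions: first-order proximity is the weight of the edge between two nodes, and second-order proximity is the similarity of their neighborhoods. Using cosine similarity for the latter is an assumption made for illustration, and the graph itself is hypothetical.

```python
import math

# Toy weighted, undirected graph as an adjacency map (hypothetical data).
graph = {
    "a": {"c": 1.0, "d": 1.0},
    "b": {"c": 1.0, "d": 1.0},
    "c": {"a": 1.0, "b": 1.0},
    "d": {"a": 1.0, "b": 1.0, "e": 1.0},
    "e": {"d": 1.0},
}
nodes = sorted(graph)


def first_order(u, v):
    """Edge weight between u and v, or 0.0 if they are not linked."""
    return graph[u].get(v, 0.0)


def neighborhood(u):
    """Vector of u's first-order proximities to every node."""
    return [first_order(u, w) for w in nodes]


def second_order(u, v):
    """Cosine similarity between the neighborhood vectors of u and v."""
    pu, pv = neighborhood(u), neighborhood(v)
    dot = sum(x * y for x, y in zip(pu, pv))
    norm = math.sqrt(sum(x * x for x in pu)) * math.sqrt(sum(y * y for y in pv))
    return dot / norm if norm else 0.0


for u, v in [("a", "b"), ("a", "c")]:
    print(u, v, first_order(u, v), round(second_order(u, v), 3))
```

Nodes a and b share no edge, so their first-order proximity is zero, yet they have identical neighbors and therefore maximal second-order proximity. Nodes a and c show the opposite pattern: they are directly linked but have no neighbors in common.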

Evaluation Methods

The paper also reviews how network embeddings are evaluated. It lists commonly used datasets, including BLOGCATALOG, FLICKR, YOUTUBE, DBLP, Cora, Citeseer, ArXiv, and Wikipedia, and identifies node classification, link prediction, node clustering, and network visualization as the standard tasks for measuring embedding performance.
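
As an illustration of how one of these tasks can be scored, the sketch below evaluates link prediction by checking whether held-out edges receive higher scores than node pairs that were never linked. The dot-product score and the AUC metric are common choices but are assumptions here, and the vectors and node pairs are made up for the example.

```python
def dot_score(emb, u, v):
    """Score a candidate link by the dot product of the two node vectors."""
    return sum(x * y for x, y in zip(emb[u], emb[v]))


def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical embedding and evaluation pairs (not from the paper).
emb = {
    "a": (1.0, 0.0),
    "b": (0.9, 0.1),
    "c": (0.0, 1.0),
    "d": (0.1, 0.9),
}
held_out_edges = [("a", "b"), ("c", "d"), ("a", "d")]  # true links hidden during training
non_edges = [("a", "c"), ("b", "d")]                   # pairs that were never linked

pos = [dot_score(emb, u, v) for u, v in held_out_edges]
neg = [dot_score(emb, u, v) for u, v in non_edges]
result = auc(pos, neg)
print(result)
```

An AUC of 1.0 would mean every held-out edge outranks every non-edge, and 0.5 corresponds to random scoring. In this toy setup the held-out edge between a and d scores below one of the non-edges, so the result falls short of 1.0.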

Practical Implications and Future Directions

Network embedding has direct practical uses: tasks in social media analysis, biological network analysis, and recommendation systems can benefit from embeddings that capture the underlying structural properties of nodes. The paper points to several directions for future work:

More Complex Structures and Properties: Exploring more complex structures and properties, such as network motifs, hypernetworks, and node centrality, to enhance embedding fidelity.
Dynamic Network Embedding: Developing methods that efficiently update embeddings as networks evolve over time.
Task-Specific Embeddings: Creating embeddings tailored for specific domains and applications to maximize their utility.
Alternative Embedding Spaces: Investigating other target embedding spaces, such as hyperbolic space, to capture more intricate network properties.
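
For the alternative embedding spaces item, one concrete candidate is the Poincaré ball model of hyperbolic space. The sketch below implements the standard Poincaré distance, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). The example points are arbitrary, and nothing here is specific to any method covered in the paper.

```python
import math


def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((x - y) ** 2 for x, y in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


# The same Euclidean gap (0.4) measured near the center and near the boundary.
near_center = poincare_distance((0.0, 0.0), (0.4, 0.0))
near_boundary = poincare_distance((0.5, 0.0), (0.9, 0.0))
print(near_center, near_boundary)
```

The same Euclidean step costs more hyperbolic distance near the boundary of the ball than near the center, so the space effectively has more room toward the edge. That property is one reason hyperbolic space is considered for networks with hierarchical, tree-like structure.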

Conclusion

In conclusion, the paper serves as a detailed roadmap for researchers and practitioners interested in network embedding techniques. It systematically categorizes existing methods, highlights their applications and limitations, and identifies open problems in representing more complex networks, giving future research a concrete starting point.
