Network Representation Learning: A Survey
This paper provides a comprehensive survey of network representation learning (NRL), a burgeoning area of research in the fields of data mining and machine learning. As information networks such as social, citation, and biological networks continue to proliferate, there is a growing need for efficient techniques to analyze their complex structures. Traditional network analysis approaches encounter significant computational challenges due to the sheer scale of modern networks. NRL emerges as a solution, embedding network vertices into a lower-dimensional vector space while preserving essential topological and contextual information.
Overview of Network Representation Learning
The authors categorize NRL methodologies into two distinct approaches: unsupervised and semi-supervised learning.
- Unsupervised NRL: This paradigm operates without labeled data, embedding network vertices based on structural similarities alone. Techniques within this category focus on preserving various aspects of network structure, including microscopic properties like local proximity, mesoscopic attributes such as community structures, and macroscopic features like global connectivity patterns.
- Semi-supervised NRL: Here, a limited set of labeled vertices guides the learning process. This approach integrates network topology with vertex labels, often resulting in embeddings that are both informative and discriminative.
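The semi-supervised idea above can be made concrete as a joint objective: one term rewards embeddings that preserve structure, another rewards correct predictions on the few labeled vertices. The sketch below is purely illustrative, not the loss of any particular surveyed method; the quadratic reconstruction term, the `-1` convention for unlabeled vertices, and the weight `alpha` are all assumptions chosen for simplicity.

```python
import numpy as np

def softmax(x):
    # Numerically stable row-wise softmax.
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def joint_loss(Z, A, y, W, alpha=1.0):
    """Illustrative semi-supervised NRL objective.

    Z : (n, d) vertex embeddings       A : (n, n) adjacency matrix
    y : (n,) labels, -1 = unlabeled    W : (d, k) softmax classifier weights
    """
    # Unsupervised term: inner products of embeddings should reconstruct
    # the adjacency structure.
    structural = np.sum((Z @ Z.T - A) ** 2)
    # Supervised term: cross-entropy, computed only on labeled vertices.
    labeled = y >= 0
    probs = softmax(Z[labeled] @ W)
    supervised = -np.mean(np.log(probs[np.arange(labeled.sum()), y[labeled]]))
    return structural + alpha * supervised
```

Minimizing such a combined loss (e.g., by gradient descent over `Z` and `W`) is what makes the resulting embeddings both structure-preserving and label-discriminative.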
Methodological Approaches
The paper identifies five core methodologies within NRL techniques:
- Matrix Factorization: These methods leverage dimensionality reduction by factorizing matrices that capture vertex relationships. They effectively capture global structures but face scalability issues, since factorizing an n-by-n relation matrix is computationally expensive for large n.
- Random Walks: This class of methods simulates vertex sequences to model vertex neighborhoods, allowing scalable and efficient representation learning, though it primarily captures local structures.
- Edge Modeling: Direct modeling of vertex connections offers efficient representation, though it generally focuses on local connectivity.
- Deep Learning: These methods extract complex structural features by modeling non-linear interactions across network vertices, often at the expense of increased computational complexity.
- Hybrid Techniques: Combining multiple methodologies, these approaches aim to capture a broader spectrum of network properties.
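The first two families above can be sketched in a few lines. The toy example below is an assumption-laden illustration, not any surveyed algorithm: the matrix-factorization view is shown as a truncated SVD of the adjacency matrix, and the random-walk view as simulated vertex sequences of the kind a skip-gram model (as in DeepWalk-style methods) would then consume. The graph, walk length, and dimension `d` are arbitrary.

```python
import numpy as np

# Toy undirected graph: 6 vertices in two loosely linked clusters.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

# Matrix-factorization view: a truncated SVD of A gives each vertex a
# d-dimensional embedding that reflects global connectivity patterns.
d = 2
U, S, _ = np.linalg.svd(A)
embedding = U[:, :d] * np.sqrt(S[:d])  # shape (6, d), one row per vertex

# Random-walk view: simulate vertex sequences; vertices co-occurring in a
# walk are treated as context pairs for a downstream skip-gram model.
rng = np.random.default_rng(0)

def random_walk(start, length):
    walk = [start]
    for _ in range(length - 1):
        neighbors = np.flatnonzero(A[walk[-1]])  # adjacent vertices
        walk.append(int(rng.choice(neighbors)))
    return walk

walks = [random_walk(v, 5) for v in range(6) for _ in range(10)]
```

The contrast in costs is visible even here: the SVD touches the full matrix at once (hence the scalability concern), while each walk only ever inspects one vertex's neighbor list at a time.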
Empirical Evaluation and Challenges
Empirical studies on benchmark datasets show that most unsupervised techniques perform well on vertex classification and clustering tasks. Meanwhile, semi-supervised strategies demonstrate strong discriminative capabilities, particularly in sparsely labeled scenarios.
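Vertex-classification evaluations of this kind typically train a simple classifier on the learned embeddings of labeled vertices and score it on held-out vertices. The nearest-neighbor sketch below is one minimal way to do this, assuming embeddings are already computed; the function name and the choice of k-NN with Euclidean distance are illustrative, not prescribed by the survey.

```python
import numpy as np

def knn_classify(emb, train_idx, train_y, test_idx, k=3):
    """Predict test-vertex labels by majority vote among the k nearest
    labeled vertices in embedding space (Euclidean distance)."""
    preds = []
    for t in test_idx:
        dists = np.linalg.norm(emb[train_idx] - emb[t], axis=1)
        nearest = np.argsort(dists)[:k]
        preds.append(np.bincount(train_y[nearest]).argmax())
    return np.array(preds)
```

Accuracy under varying labeled-set sizes is a common way such benchmarks probe how informative the embeddings are, which is exactly where the semi-supervised methods' advantage in sparsely labeled settings shows up.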
Despite its successes, NRL faces several challenges:
- Task-dependence: Most current NRL algorithms learn embeddings independently of any downstream task; future research could benefit from task-specific designs that tailor representations to their intended use.
- Dynamic Networks: Networks that evolve over time present additional challenges, requiring efficient updates to embeddings.
- Scalability: While some methods offer linear scalability, many existing algorithms struggle with very large networks, necessitating further advancements.
- Heterogeneity: Handling diverse types of data and relationships in heterogeneous networks poses another layer of complexity.
- Robustness: Ensuring robust embeddings in the face of network noise and uncertainty remains an ongoing concern.
Conclusion and Implications
This survey provides a foundational overview of NRL, connecting theoretical underpinnings with practical applications. The findings underscore the potential of NRL to transform network analysis across numerous domains, offering paths forward in AI research and application. Future work could explore more efficient, robust, and task-specific algorithms, focusing on scaling up methodologies to accommodate the growing complexity and size of modern information networks.