NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization (1906.11156v1)

Published 26 Jun 2019 in cs.SI, cs.LG, and stat.ML

Abstract: We study the problem of large-scale network embedding, which aims to learn latent representations for network mining applications. Previous research shows that 1) popular network embedding benchmarks, such as DeepWalk, are in essence implicitly factorizing a matrix with a closed form, and 2) the explicit factorization of such matrix generates more powerful embeddings than existing methods. However, directly constructing and factorizing this matrix, which is dense, is prohibitively expensive in terms of both time and space, making it not scalable for large networks. In this work, we present the algorithm of large-scale network embedding as sparse matrix factorization (NetSMF). NetSMF leverages theories from spectral sparsification to efficiently sparsify the aforementioned dense matrix, enabling significantly improved efficiency in embedding learning. The sparsified matrix is spectrally close to the original dense one with a theoretically bounded approximation error, which helps maintain the representation power of the learned embeddings. We conduct experiments on networks of various scales and types. Results show that among both popular benchmarks and factorization based methods, NetSMF is the only method that achieves both high efficiency and effectiveness. We show that NetSMF requires only 24 hours to generate effective embeddings for a large-scale academic collaboration network with tens of millions of nodes, while it would cost DeepWalk months and is computationally infeasible for the dense matrix factorization solution. The source code of NetSMF is publicly available (https://github.com/xptree/NetSMF).

Authors (7)

Jiezhong Qiu (29 papers)
Yuxiao Dong (119 papers)
Hao Ma (116 papers)
Jian Li (667 papers)
Chi Wang (93 papers)
Kuansan Wang (18 papers)
Jie Tang (302 papers)

Citations (165)

Summary

The paper introduces NetSMF, a novel algorithm that uses spectral sparsification to approximate dense matrix factorization for network embedding.
It substantially reduces computation time and memory usage compared with dense matrix factorization (NetMF), and runs far faster than DeepWalk, enabling analysis of large-scale networks.
Empirical results demonstrate high node classification accuracy on networks with tens of millions of nodes, highlighting its practical efficiency.

Analysis of NetSMF: Advancements in Large-Scale Network Embedding via Sparse Matrix Factorization

The paper "NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization" addresses the efficient learning of latent representations for large-scale networks. It proposes NetSMF, which leverages spectral graph sparsification to turn a dense matrix factorization problem into a sparse one. This addresses the scalability and efficiency challenges of previous network embedding methods, specifically those based on dense matrix factorization such as NetMF.

Overview

The central contribution of this paper is the NetSMF algorithm (network embedding as sparse matrix factorization). It builds a sparse approximation of the dense NetMF matrix, the closed-form matrix that methods such as DeepWalk implicitly factorize. Factorizing that dense matrix explicitly is effective, but constructing and factorizing it is prohibitively expensive in both time and space for large networks. NetSMF applies spectral sparsification to the matrix so that embeddings can be computed efficiently.
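To make the cost of the dense approach concrete, here is a minimal sketch of the dense matrix in question. The closed form used, trunc_log(vol(G)/(bT) · Σ_{r=1..T} (D⁻¹A)^r D⁻¹), is recalled from the NetMF line of work and is not stated in this summary, so treat it as an assumption. The function name `netmf_dense_matrix` and the parameters `T` (window size) and `b` (number of negative samples) are illustrative choices.

```python
import numpy as np

def netmf_dense_matrix(A, T=10, b=1.0):
    """Dense NetMF-style matrix for an undirected graph with no isolated nodes.

    Assumed closed form: log(max(1, vol/(b*T) * sum_{r=1..T} (D^-1 A)^r D^-1)).
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)                 # node degrees (diagonal of D)
    vol = deg.sum()                     # graph volume vol(G)
    P = A / deg[:, None]                # random-walk transition matrix D^-1 A
    S = np.zeros_like(A)
    P_r = np.eye(A.shape[0])
    for _ in range(T):                  # accumulate P + P^2 + ... + P^T
        P_r = P_r @ P
        S += P_r
    M = (vol / (b * T)) * S / deg[None, :]   # right-multiply by D^-1
    return np.log(np.maximum(M, 1.0))        # truncated logarithm

# Usage on a triangle graph: every off-diagonal entry equals log(1.5) when T=1.
triangle = np.ones((3, 3)) - np.eye(3)
M = netmf_dense_matrix(triangle, T=1)
```

Each matrix power fills in more entries, so for a graph with n nodes this stores and multiplies n × n dense matrices. That quadratic memory footprint is what becomes infeasible at tens of millions of nodes.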

Technical Contributions

Spectral Graph Sparsification: NetSMF employs spectral sparsification algorithms to create a sparse matrix that is spectrally similar to the dense matrix initially required for effective embedding generation. This is achieved while maintaining a bounded approximation error, which is crucial for preserving the quality of the learned embeddings.
Efficiency and Scalability: The algorithm significantly reduces the time and space cost of matrix construction and factorization. The experimental results presented in the paper indicate that NetSMF can process large-scale networks like the OAG, an academic collaboration network with tens of millions of nodes, in about 24 hours on a single server. The same task would take DeepWalk months and is computationally infeasible for dense matrix factorization.
Numerical Performance: Empirical tests on networks of varying scales and types show that, among popular benchmarks such as DeepWalk and LINE and factorization-based methods, NetSMF is the only method that achieves both high efficiency and high classification accuracy.
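The contributions above can be illustrated with a rough sketch of the sparsify-then-factorize idea. It is not the authors' implementation: the sampling scheme (pick a random edge and a random walk length, then extend both endpoints by short random walks) is a simplified reading of path-sampling sparsifiers, it assumes an unweighted, undirected graph with no isolated nodes, and `scipy.sparse.linalg.svds` stands in for whatever large-scale factorization the paper uses. All function and parameter names are hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

def _walk(neighbors, node, steps, rng):
    """Take `steps` uniform random-walk steps starting from `node`."""
    for _ in range(steps):
        nbrs = neighbors[node]
        node = nbrs[rng.integers(len(nbrs))]
    return node

def sparsified_netmf(A, T=10, b=1.0, num_samples=100_000, seed=0):
    """Monte Carlo sparse estimate of the truncated-log NetMF-style matrix."""
    rng = np.random.default_rng(seed)
    A = sp.csr_matrix(A)
    n = A.shape[0]
    deg = np.asarray(A.sum(axis=1)).ravel()
    vol = deg.sum()
    neighbors = [A.indices[A.indptr[i]:A.indptr[i + 1]] for i in range(n)]
    src, dst = A.nonzero()                      # directed edges, both orientations
    rows = np.empty(num_samples, dtype=np.int64)
    cols = np.empty(num_samples, dtype=np.int64)
    for s in range(num_samples):
        e = rng.integers(len(src))              # uniform random edge (u, v)
        r = rng.integers(1, T + 1)              # walk length r in 1..T
        k = rng.integers(1, r + 1)              # position of the edge on the path
        rows[s] = _walk(neighbors, src[e], k - 1, rng)
        cols[s] = _walk(neighbors, dst[e], r - k, rng)
    # Each sample adds vol/num_samples in total, split symmetrically; in
    # expectation W equals (1/T) * sum_r D (D^-1 A)^r. Duplicates are summed.
    w = np.full(num_samples, vol / (2.0 * num_samples))
    W = sp.coo_matrix(
        (np.concatenate([w, w]),
         (np.concatenate([rows, cols]), np.concatenate([cols, rows]))),
        shape=(n, n),
    ).tocsr()
    d_inv = sp.diags(1.0 / deg)
    M = sp.csr_matrix((vol / b) * (d_inv @ W @ d_inv))
    M.data = np.log(np.maximum(M.data, 1.0))    # truncated log on stored entries only
    M.eliminate_zeros()
    return M

def embed(M, dim):
    """Embed nodes via a truncated SVD of the sparse matrix: U * sqrt(Sigma)."""
    U, S, _ = svds(M, k=dim)
    return U * np.sqrt(S)
```

The number of non-zeros is bounded by twice `num_samples` regardless of how dense the exact matrix would be, which is the point of the approach: memory and factorization cost scale with the sample budget rather than with n². Raising `num_samples` trades sparsity for a closer approximation.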

Results and Implications

The paper provides compelling empirical evidence demonstrating the benefits of network embedding through sparse matrix factorization. As an alternative to traditional dense matrix approaches, NetSMF is shown to maintain a high level of effectiveness in node classification tasks across various networks.

The theoretical bounds provided for the approximation error due to sparsification assure that this approach retains the essential spectral qualities of the original matrix. This property is critical in ensuring that embeddings do not lose their representational power. Consequently, these embeddings are well-suited for large-scale applications such as social network analysis, recommendation systems, and biological networks.
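For reference, the standard notion of spectral similarity from the sparsification literature can be written as below. This is the textbook definition rather than a formula quoted from this summary, and the paper's specific bound on the embedding matrix is not reproduced here. The symbols are illustrative: $L$ is the Laplacian-like matrix being approximated, $\widetilde{L}$ is its sparsifier, and $\epsilon \in (0, 1)$ is the error parameter.

```latex
(1-\epsilon)\, x^\top L\, x \;\le\; x^\top \widetilde{L}\, x \;\le\; (1+\epsilon)\, x^\top L\, x
\qquad \text{for all } x \in \mathbb{R}^n
```

A smaller $\epsilon$ generally requires keeping more non-zeros, so the parameter trades approximation quality against sparsity.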

Future Prospects

The introduction of NetSMF opens several avenues for future research and development:

Extending the NetSMF approach to other types of networks, such as dynamic or directed networks, could offer even broader applicability.
Further optimization of the algorithm to improve computational performance on even larger datasets could be pursued.
Exploration of alternative matrix factorization techniques that can enhance the robustness and applicability of NetSMF in diverse network structures could provide more insights.

In conclusion, NetSMF combines theoretical guarantees with practical efficiency. It removes the time and space bottleneck of dense matrix factorization while keeping the embeddings effective, which makes factorization-based network representation learning feasible on networks with tens of millions of nodes.

GitHub

GitHub - xptree/NetSMF: NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization (130 stars)