Representation Learning for Attributed Multiplex Heterogeneous Network (1905.01669v2)

Published 5 May 2019 in cs.SI and cs.LG

Abstract: Network embedding (or graph embedding) has been widely used in many real-world applications. However, existing methods mainly focus on networks with single-typed nodes/edges and cannot scale well to handle large networks. Many real-world networks consist of billions of nodes and edges of multiple types, and each node is associated with different attributes. In this paper, we formalize the problem of embedding learning for the Attributed Multiplex Heterogeneous Network and propose a unified framework to address this problem. The framework supports both transductive and inductive learning. We also give the theoretical analysis of the proposed framework, showing its connection with previous works and proving its better expressiveness. We conduct systematical evaluations for the proposed framework on four different genres of challenging datasets: Amazon, YouTube, Twitter, and Alibaba. Experimental results demonstrate that with the learned embeddings from the proposed framework, we can achieve statistically significant improvements (e.g., 5.99-28.23% lift by F1 scores; p<<0.01, t-test) over previous state-of-the-art methods for link prediction. The framework has also been successfully deployed on the recommendation system of a worldwide leading e-commerce company, Alibaba Group. Results of the offline A/B tests on product recommendation further confirm the effectiveness and efficiency of the framework in practice.

Authors (6)
  1. Yukuo Cen (19 papers)
  2. Xu Zou (27 papers)
  3. Jianwei Zhang (114 papers)
  4. Hongxia Yang (130 papers)
  5. Jingren Zhou (198 papers)
  6. Jie Tang (302 papers)
Citations (412)

Summary

  • The paper introduces a unified framework that jointly learns node embeddings by integrating attributes and multiple edge types.
  • It applies a self-attention mechanism to capture varying influences of distinct edge types, leading to improved link prediction performance.
  • The framework supports both transductive and inductive paradigms, enabling scalable representation learning for dynamic, large-scale networks.

Overview of "Representation Learning for Attributed Multiplex Heterogeneous Network"

The paper "Representation Learning for Attributed Multiplex Heterogeneous Network" addresses the challenges posed by the intricate structure of real-world networks, which combine multiple node and edge types with attributes attached to each node. The authors introduce a unified framework for embedding learning in Attributed Multiplex Heterogeneous Networks (AMHENs), aiming to provide a scalable representation-learning solution that captures the complexities inherent in such networks.

Problem Formulation and Framework

The paper formalizes the concept of AMHENs, a class of networks where different types of nodes may be connected via multiple types of edges, and each node is associated with various attributes. The objective is to project nodes into a low-dimensional space that preserves the network's structural and attribute information comprehensively. The proposed framework incorporates both transductive and inductive learning paradigms, enhancing its applicability to partially observed and dynamic networks.
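
To make the setting concrete, here is a minimal, hypothetical sketch of an AMHEN data structure (the class and field names are illustrative, not the authors' data model): each node carries a type and an attribute vector, and the same node pair may be linked in several edge-type "layers" at once.

```python
from dataclasses import dataclass, field

@dataclass
class AMHEN:
    """Toy Attributed Multiplex Heterogeneous Network."""
    node_types: dict = field(default_factory=dict)   # node id -> node type
    attributes: dict = field(default_factory=dict)   # node id -> attribute vector
    edges: dict = field(default_factory=dict)        # edge type -> set of (u, v) pairs

    def add_edge(self, u, v, edge_type):
        # The multiplex property: each edge type is its own layer,
        # so one node pair can appear under several edge types.
        self.edges.setdefault(edge_type, set()).add((u, v))

# Toy e-commerce example: a user and an item connected in two layers.
g = AMHEN()
g.node_types[0], g.node_types[1] = "user", "item"
g.attributes[0] = [0.3, 0.7]   # e.g. user profile features
g.attributes[1] = [1.0, 0.0]   # e.g. item category features
g.add_edge(0, 1, "click")
g.add_edge(0, 1, "purchase")   # same node pair, second edge type
```

The embedding task is then to map each node id to a low-dimensional vector that preserves all three ingredients: node types, attributes, and the multiplex edge structure.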

The framework extends beyond previous approaches by integrating multiplex edge types and applying attention mechanisms to better capture interactional nuances across different edge types. This is particularly relevant for networks comprising billions of nodes and edges of multiple types, a common scenario in large-scale systems such as e-commerce and social media platforms.

Key Innovations

The paper introduces several innovations:

  1. Unified Embedding Framework: The framework accommodates both transductive and inductive learning models (the GATNE-T and GATNE-I variants, respectively), providing flexibility in handling networks with partially observed data.
  2. Self-attention Mechanism: By utilizing a self-attention mechanism, the model differentiates varying influences of distinct edge types, thereby enhancing the discriminative power of the embeddings.
  3. Scalable Learning Algorithms: The algorithms are designed to efficiently process networks of substantial size, a significant advancement over existing methods that struggle to scale.
  4. Performance Evaluation: The experimental results demonstrate significant improvements on the link prediction task across datasets from Amazon, YouTube, Twitter, and Alibaba, with F1-score lifts of 5.99-28.23% over previous state-of-the-art methods.

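The self-attention step over edge types can be sketched as follows. This is a hedged illustration with assumed parameter shapes and random initialization, not the paper's exact formulation: a node keeps one small "edge embedding" per edge type, attention weights (a softmax over scores) decide how much each type contributes, and the weighted mixture is projected and added to a base embedding shared across edge types.

```python
import numpy as np

rng = np.random.default_rng(0)
d, s, m = 8, 4, 3        # final embedding dim, edge-embedding dim, number of edge types

U = rng.normal(size=(m, s))   # node's edge embeddings, one row per edge type
base = rng.normal(size=d)     # base embedding shared across edge types
W = rng.normal(size=(s, s))   # attention parameters (assumed shapes)
w = rng.normal(size=s)
M = rng.normal(size=(s, d))   # projection from edge-embedding space back to dim d

scores = np.tanh(U @ W) @ w                  # one attention score per edge type
a = np.exp(scores) / np.exp(scores).sum()    # softmax: weights sum to 1
agg = a @ U                                  # attention-weighted mix of edge embeddings
v = base + agg @ M                           # final embedding for this edge type

assert v.shape == (d,)
```

In the full model, the attention parameters differ per target edge type, so the same node can emphasize different relation layers depending on which edge type is being predicted.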
Implications and Future Directions

The implications of this research extend to both theoretical and practical domains. Theoretically, the framework offers a deeper understanding of how multiplex heterogeneous networks can be efficiently embedded, leveraging both attribute information and complex network structures. Practically, the successful deployment in Alibaba's recommendation system underscores the framework's effectiveness and scalability in industrial applications.

Future research could explore the integration of dynamic network components, enriching the model's capacity to handle network changes over time. Additional attention could be devoted to further optimizing the framework for real-time applications, where computational efficiency is paramount.

In conclusion, the contribution of this paper lies in bridging theoretical advances and practical requirements, presenting a robust and scalable approach to network representation learning with broad implications for various fields relying on networked data.