Attributed Graph Clustering: A Deep Attentional Embedding Approach (1906.06532v1)

Published 15 Jun 2019 in cs.LG and stat.ML

Abstract: Graph clustering is a fundamental task which discovers communities or groups in networks. Recent studies have mostly focused on developing deep learning approaches to learn a compact graph embedding, upon which classic clustering methods like k-means or spectral clustering algorithms are applied. These two-step frameworks are difficult to manipulate and usually lead to suboptimal performance, mainly because the graph embedding is not goal-directed, i.e., designed for the specific clustering task. In this paper, we propose a goal-directed deep learning approach, Deep Attentional Embedded Graph Clustering (DAEGC for short). Our method focuses on attributed graphs to sufficiently explore the two sides of information in graphs. By employing an attention network to capture the importance of the neighboring nodes to a target node, our DAEGC algorithm encodes the topological structure and node content in a graph to a compact representation, on which an inner product decoder is trained to reconstruct the graph structure. Furthermore, soft labels from the graph embedding itself are generated to supervise a self-training graph clustering process, which iteratively refines the clustering results. The self-training process is jointly learned and optimized with the graph embedding in a unified framework, to mutually benefit both components. Experimental results compared with state-of-the-art algorithms demonstrate the superiority of our method.

Citations (438)


Summary

The paper proposes a unified deep learning approach that jointly optimizes graph embedding and clustering for attributed graphs.
It employs a graph attentional autoencoder with self-training to capture both node features and network structure.
Experimental results on citation networks show improved clustering accuracy and normalized mutual information over state-of-the-art methods.


Graph clustering, which aims to discover communities or groups in networks, has attracted substantial research attention. The paper "Attributed Graph Clustering: A Deep Attentional Embedding Approach" contributes to this line of work with Deep Attentional Embedded Graph Clustering (DAEGC), a deep learning framework designed specifically for attributed graphs, in which nodes carry content features in addition to links.

Most recent approaches follow a two-step framework: a deep model first learns a compact graph embedding, and a classic algorithm such as $k$-means or spectral clustering is then applied to it. Because the embedding is learned without reference to the clustering objective, it is not goal-directed, and the two disconnected stages usually lead to suboptimal performance. To address this shortcoming, the authors propose a unified, goal-directed approach that learns the embedding and the clustering together.
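The two-step pipeline being criticized can be sketched in a few lines of NumPy. As a stand-in for a learned deep embedding, step 1 below uses a spectral embedding (eigenvectors of the symmetric normalized Laplacian), and step 2 runs plain Lloyd's $k$-means on the fixed result. The function names, the toy graph, and the deterministic farthest-point initialization are illustrative assumptions, not part of the paper. The point to notice is that step 1 never sees the clustering objective.

```python
import numpy as np

def spectral_embedding(adj, dim):
    """Step 1 (stand-in for a learned embedding): eigenvectors of the
    symmetric normalized Laplacian with the smallest eigenvalues.
    This step is computed once and knows nothing about clustering."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues come back in ascending order
    return vecs[:, :dim]

def kmeans(z, k, n_iter=100):
    """Step 2: plain Lloyd's k-means on the frozen embedding z of shape (n, d)."""
    # Deterministic farthest-point initialization keeps the sketch reproducible.
    centers = [z[0]]
    for _ in range(1, k):
        d2 = np.min([((z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(z[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Squared distance from every node to every center: shape (n, k).
        dists = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

labels = kmeans(spectral_embedding(adj, dim=2), k=2)
print(labels)
```

On this toy graph the two triangles land in separate clusters. In DAEGC, by contrast, the embedding is trained together with a clustering loss instead of being computed once and frozen.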

Methodological Framework

DAEGC uses a graph attentional autoencoder to learn representations that capture both node content and topological structure. Its key component is an attention network that estimates how important each neighboring node is to a target node, so that neighbors contribute unequally to the target's representation. The encoder maps the graph to compact node embeddings, and an inner product decoder is trained on these embeddings to reconstruct the graph structure.
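The encoder/decoder pair can be sketched in NumPy as follows. This is a minimal, hypothetical illustration rather than the authors' implementation: it uses a single attention layer in the style of standard graph attention networks (GAT), where scores come from learned vectors `a_self` and `a_neigh` and are normalized with a softmax over each node's neighbors, and it omits any extra weighting the paper may apply. The function names, the LeakyReLU slope of 0.2, the missing output nonlinearity, and the binary cross-entropy reconstruction loss are all assumptions. Parameters are random here; training them by gradient descent is not shown.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gat_layer(x, adj, w, a_self, a_neigh):
    """One GAT-style attention layer (assumed form, not the paper's exact one).

    x: (n, f) node features; adj: (n, n) 0/1 adjacency;
    w: (f, d) shared projection; a_self, a_neigh: (d,) attention vectors.
    """
    h = x @ w                                         # projected features, (n, d)
    # Raw score for every ordered pair: a_self . h_i + a_neigh . h_j
    scores = leaky_relu((h @ a_self)[:, None] + (h @ a_neigh)[None, :])
    # Each node may only attend to itself and its direct neighbours.
    mask = (adj + np.eye(len(adj))) > 0
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability
    alpha = np.exp(scores)                            # masked entries become 0
    alpha /= alpha.sum(axis=1, keepdims=True)         # softmax over neighbours
    # Embedding z_i = sum_j alpha_ij h_j (no output nonlinearity: an assumption).
    return alpha @ h, alpha

def inner_product_decoder(z):
    """Reconstructed edge probabilities A_hat_ij = sigmoid(z_i . z_j)."""
    return sigmoid(z @ z.T)

def reconstruction_loss(adj, a_hat, eps=1e-9):
    """Binary cross-entropy between the observed adjacency and A_hat."""
    return -np.mean(adj * np.log(a_hat + eps) + (1 - adj) * np.log(1 - a_hat + eps))

# Toy usage with random (untrained) parameters on a path graph 0-1-2-3.
rng = np.random.default_rng(0)
n_nodes, n_feats, dim = 4, 3, 2
x = rng.normal(size=(n_nodes, n_feats))
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
w = rng.normal(size=(n_feats, dim))
a_self, a_neigh = rng.normal(size=dim), rng.normal(size=dim)

z, alpha = gat_layer(x, adj, w, a_self, a_neigh)
a_hat = inner_product_decoder(z)
loss = reconstruction_loss(adj, a_hat)
print(alpha.round(3), loss)
```

The masking step is what makes the layer graph-aware: non-neighbors receive zero attention, while the learned scores let the model weight the remaining neighbors unequally.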

Central to the framework is a self-training clustering module. Soft cluster labels computed from the graph embedding itself act as supervision, and the clustering results are refined iteratively. Because this self-training process is optimized jointly with the graph embedding, each component benefits the other: the clustering objective steers the embedding toward the task, and a better embedding in turn yields better clusters.
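One common way to realize such self-training, which the sketch below assumes, is the soft-assignment scheme of Deep Embedded Clustering (DEC): a Student's t kernel turns distances between embeddings and cluster centers into soft labels $Q$, a sharpened target distribution $P$ emphasizes confident assignments, and the clustering loss is $\mathrm{KL}(P \,\|\, Q)$. In a joint setup this loss would be added to the reconstruction loss, e.g. $L = L_r + \gamma L_c$ with a hypothetical weight $\gamma$. The function names and toy values are illustrative; gradient updates of the centers and encoder are omitted.

```python
import numpy as np

def soft_assignment(z, mu):
    """Soft labels q_iu from a Student's t kernel with one degree of freedom."""
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)   # (n, k) squared distances
    q = 1.0 / (1.0 + d2)
    return q / q.sum(axis=1, keepdims=True)                     # each row sums to 1

def target_distribution(q):
    """Sharpened targets p_iu proportional to q_iu^2 / f_u, with f_u = sum_i q_iu."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)

def clustering_loss(p, q, eps=1e-12):
    """KL(P || Q), summed over nodes and clusters."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Hypothetical embeddings near two hypothetical cluster centers.
z = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [2.9, 3.2]])
mu = np.array([[0.0, 0.0], [3.0, 3.0]])

q = soft_assignment(z, mu)
p = target_distribution(q)   # in practice held fixed for several steps, then recomputed
loss = clustering_loss(p, q)
print(q.round(3), p.round(3), loss)
```

Squaring $Q$ before renormalizing pushes confident assignments closer to 1, so minimizing $\mathrm{KL}(P \,\|\, Q)$ pulls each embedding toward the center it already favors; dividing by the cluster frequency $f_u$ keeps large clusters from dominating.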

Experimental Evaluation

The authors evaluate DAEGC on benchmark citation networks, in which each node has both links (citations) and content features. Compared with state-of-the-art baselines, DAEGC achieves better scores on standard clustering metrics, including accuracy (ACC) and normalized mutual information (NMI). These datasets suit the method because clustering their nodes well requires both structural and content information.
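Both metrics compare predicted cluster ids with ground-truth classes. The pure-Python sketch below assumes the usual definitions: accuracy uses the best one-to-one mapping from clusters to classes (found here by brute force over permutations, which only scales to a handful of clusters; the Hungarian algorithm is the standard choice for larger $k$), and NMI divides the mutual information by the arithmetic mean of the two entropies. The normalization used in the paper's experiments is an assumption here, and the function names are illustrative.

```python
from collections import Counter
from itertools import permutations
from math import log

def cluster_accuracy(y_true, y_pred):
    """Best accuracy over one-to-one mappings from cluster ids to class ids."""
    classes, clusters = sorted(set(y_true)), sorted(set(y_pred))
    if len(clusters) > len(classes):
        raise ValueError("sketch assumes no more clusters than classes")
    best = 0
    for perm in permutations(classes, len(clusters)):   # brute force: small k only
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[p] == t for t, p in zip(y_true, y_pred))
        best = max(best, hits)
    return best / len(y_true)

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    n = len(a)
    ca, cb, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * log(n * c / (ca[x] * cb[y])) for (x, y), c in joint.items())

def nmi(y_true, y_pred):
    """NMI with arithmetic-mean normalization: 2 I(Y;C) / (H(Y) + H(C))."""
    h = entropy(y_true) + entropy(y_pred)
    return 1.0 if h == 0 else 2 * mutual_information(y_true, y_pred) / h

y_true = [0, 0, 1, 1, 2, 2]
print(cluster_accuracy(y_true, [1, 1, 0, 0, 2, 2]), nmi(y_true, [1, 1, 0, 0, 2, 2]))
print(cluster_accuracy(y_true, [0, 0, 0, 1, 1, 1]), nmi(y_true, [0, 0, 0, 1, 1, 1]))
```

The first prediction is a relabeled copy of the ground truth, so both metrics reach their maximum; this is why accuracy needs the mapping step, since raw cluster ids are arbitrary.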

Theoretical and Practical Implications

Theoretically, the approach combines graph attention networks and clustering in a single model, so that the learned representation is consistent with the downstream task. Practically, the framework applies to real-world attributed graphs, which are prevalent in fields such as social network analysis, bioinformatics, and recommendation systems.

Future Directions

Future research may improve the attention mechanism with more sophisticated models or adapt the framework to dynamic or evolving graphs, which are common in streaming data applications. Extending the method to other graph types, such as heterogeneous or multi-layer graphs, is another promising direction for the clustering field.

In conclusion, DAEGC advances attributed graph clustering with a unified model that narrows the gap between representation learning and the requirements of the clustering task.
