Empirical Comparison of Algorithms for Network Community Detection (1004.3539v1)

Published 20 Apr 2010 in cs.DS and physics.soc-ph

Abstract: Detecting clusters or communities in large real-world graphs such as large social or information networks is a problem of considerable interest. In practice, one typically chooses an objective function that captures the intuition of a network cluster as a set of nodes with better internal connectivity than external connectivity, and then one applies approximation algorithms or heuristics to extract sets of nodes that are related to the objective function and that "look like" good communities for the application of interest. In this paper, we explore a range of network community detection methods in order to compare them and to understand their relative performance and the systematic biases in the clusters they identify. We evaluate several common objective functions that are used to formalize the notion of a network community, and we examine several different classes of approximation algorithms that aim to optimize such objective functions. In addition, rather than simply fixing an objective and asking for an approximation to the best cluster of any size, we consider a size-resolved version of the optimization problem. Considering community quality as a function of its size provides a much finer lens with which to examine community detection algorithms, since objective functions and approximation algorithms often have non-obvious size-dependent behavior.

Summary

The paper empirically compares community detection algorithms across more than 40 networks, highlighting systematic biases in the clusters they identify.
It evaluates 12 objective functions and 8 classes of algorithms, finding that small clusters often achieve better quality scores than larger ones.
The study computes spectral and SDP lower bounds on cluster quality and shows that practical methods such as Local Spectral come close to these bounds, even though optimizing the objectives exactly is NP-hard.

Empirical Comparison of Algorithms for Network Community Detection

The paper "Empirical Comparison of Algorithms for Network Community Detection" by Leskovec, Lang, and Mahoney investigates methods for identifying clusters or communities within large, real-world graphs such as social, web, and biological networks. The research aims to compare various community detection algorithms and understand their performance and systematic biases.

Key Insights and Methodology

The authors explore several objective functions commonly used to define network communities. These functions capture the idea of a community as a set of nodes with stronger internal connections than external ones. Given that optimizing these objective functions is generally NP-hard, the paper evaluates various heuristic and approximation algorithms designed to approach the optimal solution.
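
To make the "stronger internal than external connections" idea concrete, the following is a minimal sketch of conductance, one of the objective functions the paper evaluates. It assumes an undirected, unweighted graph stored as a dictionary of neighbor sets; the function name and the toy graph are illustrative and not taken from the paper.

```python
def conductance(adj, cluster):
    """Conductance of `cluster` in an undirected graph {node: set(neighbors)}.

    Lower values mean the cluster is better separated from the rest of the graph.
    """
    cluster = set(cluster)
    # Edges with exactly one endpoint inside the cluster.
    cut = sum(1 for u in cluster for v in adj[u] if v not in cluster)
    # Volume is the sum of degrees on each side of the cut.
    vol_in = sum(len(adj[u]) for u in cluster)
    vol_out = sum(len(nbrs) for nbrs in adj.values()) - vol_in
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else float("inf")


# Toy graph (an assumption for illustration): two triangles joined by edge 2-3.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
left_triangle = conductance(graph, {0, 1, 2})  # 1 cut edge over volume 7
lopsided = conductance(graph, {0, 1})          # 2 cut edges over volume 4
```

On this toy graph the full triangle scores far better than the two-node subset, which matches the intuition that a good community cuts few edges relative to its total degree.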

The paper includes a comprehensive comparison involving:

More than 40 diverse networks
12 objective functions to measure community quality
8 different classes of community detection algorithms

Community Detection Algorithms and Heuristic Methods

The core of the analysis focuses on both well-grounded and heuristic solutions:

Flow and Spectral Methods:
- Local Spectral Partitioning: Based on PageRank vectors, this method consistently finds connected, internally compact clusters, but their conductance scores are somewhat worse, meaning the clusters are less well separated from the rest of the network.
- Metis+MQI: This heuristic, combining the Metis graph partitioning tool and the MQI flow-based method, tends to find better-separated clusters but at the expense of internal compactness, sometimes resulting in clusters that are internally disconnected.
Other Algorithms:
- Leighton-Rao Multicommodity Flow: This flow-based approximation algorithm works well for small to medium-sized clusters but struggles with large graphs containing expander-like cores.
- Graclus and Newman's Dendrogram: Both algorithms produce clusters qualitatively similar to those of the Local Spectral method, which suggests that approximate local spectral clustering can be computationally cheaper while remaining similarly effective.
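
The PageRank-based idea behind Local Spectral can be sketched as follows. This is a simplified stand-in, not the implementation used in the paper: it computes a seed-personalized PageRank vector by plain power iteration and then performs a sweep cut over nodes ordered by score divided by degree. The function names, the teleport probability `alpha=0.15`, the iteration count, and the toy graph are all assumptions, and the graph is assumed to have no isolated nodes.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    """Power iteration for a walk that teleports to `seed` with probability alpha."""
    p = {u: 0.0 for u in adj}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in adj}
        nxt[seed] += alpha
        for u, mass in p.items():
            # Spread the non-teleporting mass evenly over u's neighbors.
            share = (1.0 - alpha) * mass / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        p = nxt
    return p


def sweep_cut(adj, scores):
    """Return the prefix of nodes, ordered by score/degree, with lowest conductance."""
    order = sorted(adj, key=lambda u: scores[u] / len(adj[u]), reverse=True)
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    inside, vol, cut = set(), 0, 0
    best_set, best_phi = None, float("inf")
    for u in order[:-1]:  # skip the full node set, which has no complement
        inside.add(u)
        vol += len(adj[u])
        # Edges from u to nodes already inside stop being cut edges;
        # edges from u to nodes still outside become cut edges.
        internal = sum(1 for v in adj[u] if v in inside)
        cut += len(adj[u]) - 2 * internal
        phi = cut / min(vol, total_vol - vol)
        if phi < best_phi:
            best_set, best_phi = set(inside), phi
    return best_set, best_phi


# Toy graph (an assumption for illustration): two triangles joined by edge 2-3.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
ppr = personalized_pagerank(graph, seed=0)
best_set, best_phi = sweep_cut(graph, ppr)
```

Seeding at node 0, the sweep recovers the triangle containing the seed. Because the walk keeps returning to the seed, the clusters found this way tend to stay close to it, which is one way to read the compactness noted above.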

Evaluation of Objective Functions

The authors perform detailed experiments to assess different community quality scores:

Multi-criterion Scores: These include conductance, expansion, internal density, and various ODF-based metrics. Despite their differences, they generally show a similar pattern: small clusters are well defined, but quality degrades as cluster size increases.
Single-criterion Scores: Metrics like modularity exhibit distinctive behaviors, increasing monotonically towards bisection of the network. This highlights the underlying structure of specific network types.
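
As a rough illustration of how several of these scores can be computed from the same three counts (nodes in the cluster, edges inside it, and edges crossing its boundary), here is a small sketch. The formulas are commonly used forms and their exact normalizations are assumptions that may differ in detail from the paper's definitions; the modularity term is the standard single-cluster contribution. The function name and toy graph are illustrative. The cluster is assumed to have at least two nodes and to be a proper subset of the graph.

```python
def community_scores(adj, cluster):
    """Several quality scores for `cluster` in an undirected graph {node: set(neighbors)}."""
    cluster = set(cluster)
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    n_s = len(cluster)
    # Each internal edge is seen from both endpoints, so halve the count.
    m_s = sum(1 for u in cluster for v in adj[u] if v in cluster) // 2
    # Boundary edges are seen once, from their inside endpoint.
    c_s = sum(1 for u in cluster for v in adj[u] if v not in cluster)
    vol_s = 2 * m_s + c_s  # sum of degrees of nodes in the cluster
    return {
        # Multi-criterion scores (lower is better): combine internal and external structure.
        "conductance": c_s / vol_s,  # relative to the cluster's own volume
        "expansion": c_s / n_s,
        "internal_density": 1 - m_s / (n_s * (n_s - 1) / 2),
        "cut_ratio": c_s / (n_s * (n - n_s)),
        # Standard modularity contribution of one cluster (higher is better).
        "modularity_term": m_s / m - (vol_s / (2 * m)) ** 2,
    }


# Toy graph (an assumption for illustration): two triangles joined by edge 2-3.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
scores = community_scores(graph, {0, 1, 2})
```

Here conductance divides by the cluster's own volume, which coincides with the usual min-volume form whenever the cluster is the smaller side of the cut.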

Theoretical Bounds and Cluster Characteristics

To place empirical results in context, the paper calculates spectral and semidefinite programming (SDP) lower bounds on community quality metrics. These theoretical bounds provide crucial insights:

For many networks, particularly large ones, good clusters are small. Large clusters either do not exist or are qualitatively worse, as indicated by comparing the empirical upper bounds (the best clusters the algorithms actually find) with the theoretical lower bounds.
The consistent qualitative shape of Network Community Profiles (NCPs) across various detection algorithms and objective functions suggests that the observed patterns are intrinsic to the network's structure rather than artifacts of the algorithms.
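
A Network Community Profile can be thought of as the best score found at each cluster size. The sketch below builds such a profile from an explicit list of candidate clusters using conductance. In practice the candidates would come from running one of the algorithms above at many sizes; the hand-picked candidate list, the function name, and the toy graph here are assumptions for illustration only.

```python
def network_community_profile(adj, candidates):
    """Map cluster size k to the lowest conductance among candidates of size k."""
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    profile = {}
    for cluster in candidates:
        cluster = set(cluster)
        cut = sum(1 for u in cluster for v in adj[u] if v not in cluster)
        vol = sum(len(adj[u]) for u in cluster)
        phi = cut / min(vol, total_vol - vol)
        k = len(cluster)
        # Keep only the best (lowest) conductance seen at this size.
        if phi < profile.get(k, float("inf")):
            profile[k] = phi
    return dict(sorted(profile.items()))


# Toy graph (an assumption for illustration): two triangles joined by edge 2-3.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
candidates = [{0}, {0, 1}, {0, 2}, {4, 5}, {0, 1, 2}, {3, 4, 5}]
profile = network_community_profile(graph, candidates)
```

Plotting the resulting size-to-score mapping on log-log axes gives the kind of curve whose shape the paper compares across algorithms and objective functions.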

Implications and Future Directions

The paper reveals several critical points:

Practical community detection algorithms perform robustly: the clusters they find come close to the theoretical lower bounds across a range of cluster sizes.
Approximate optimization, while introducing biases, can be beneficial. For instance, methods like Local Spectral produce more intuitive communities due to their compactness, akin to regularization techniques in machine learning.
Future research could further explore formalizing these concepts of regularization by approximate computation and assess their applicability across different network types and sizes.

In conclusion, this empirical comparison offers valuable insights into the effectiveness of various community detection algorithms in large networks. It highlights the importance of evaluating algorithmic performance based on both theoretical benchmarks and practical outcomes, paving the way for more accurate and efficient community detection techniques.
