Layered Label Propagation: A MultiResolution Coordinate-Free Ordering for Compressing Social Networks (1011.5425v2)

Published 24 Nov 2010 in cs.DS, cs.SI, and physics.soc-ph

Abstract: We continue the line of research on graph compression started with WebGraph, but we move our focus to the compression of social networks in a proper sense (e.g., LiveJournal): the approaches that have been used for a long time to compress web graphs rely on a specific ordering of the nodes (lexicographical URL ordering) whose extension to general social networks is not trivial. In this paper, we propose a solution that mixes clusterings and orders, and devise a new algorithm, called Layered Label Propagation, that builds on previous work on scalable clustering and can be used to reorder very large graphs (billions of nodes). Our implementation uses overdecomposition to perform aggressively on multi-core architecture, making it possible to reorder graphs of more than 600 million nodes in a few hours. Experiments performed on a wide array of web graphs and social networks show that combining the order produced by the proposed algorithm with the WebGraph compression framework provides a major increase in compression with respect to all currently known techniques, both on web graphs and on social networks. These improvements make it possible to analyse in main memory significantly larger graphs.

Citations (644)

Summary

The paper introduces a novel multi-resolution label propagation algorithm (LLP) that reorders massive graphs to enhance compression performance.
Experimental results demonstrate that LLP achieves superior compression, reaching as low as 1.8 bits per link, outperforming BFS, Shingle, and Gray orderings.
Because LLP is coordinate-free, its result does not rely on a pre-existing node order, which makes it well suited to social networks as well as web graphs.

An Analysis of "Layered Label Propagation: A MultiResolution Coordinate-Free Ordering for Compressing Social Networks"

The paper "Layered Label Propagation: A MultiResolution Coordinate-Free Ordering for Compressing Social Networks" by Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna presents an innovative strategy for compressing large-scale social networks and web graphs. The authors develop a novel algorithm, Layered Label Propagation (LLP), which improves upon traditional methods by leveraging a multi-resolution approach to graph clustering and ordering.

Key Contributions

Layered Label Propagation Algorithm: The paper introduces LLP, which builds on scalable clustering techniques such as label propagation. The algorithm can reorder very large graphs (the abstract mentions billions of nodes), and the resulting order substantially improves compression.
Compression Performance: Experimental results demonstrate that the LLP algorithm, combined with the WebGraph framework, achieves superior compression ratios compared to existing methods. This holds true for both social networks and web graphs, exemplified by datasets containing over 600 million nodes.
Coordinate-Free Approach: LLP is a coordinate-free ordering method: the compression it enables does not depend on the initial ordering of the graph nodes. This makes LLP particularly suitable for social networks, where a natural node ordering such as lexicographical URL sorting is not available.

Methodological Insights

The authors adopt a clustering method based on label propagation and refine it over multiple resolutions. LLP runs the Absolute Potts Model (APM) variant of label propagation at several resolution levels and combines the resulting clusterings into a single node ordering. The goal is an ordering with good locality and similarity, the two properties that the compression scheme exploits.
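The layering idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the deterministic tie-breaking, the exclusion of a node from its own cluster volume, and the rule that places each cluster at the earliest position of any of its members are all choices made here for brevity. The score `k - gamma * (v - k)` follows the usual APM form, where `k` is the number of neighbors carrying a label and `v` is the number of nodes carrying it overall.

```python
import random
from collections import Counter


def apm_label_propagation(adj, gamma, max_rounds=20, seed=0):
    """Cluster a graph (list of neighbor lists) with an APM-style update rule."""
    rng = random.Random(seed)
    n = len(adj)
    label = list(range(n))       # every node starts in its own cluster
    volume = Counter(label)      # number of nodes carrying each label
    nodes = list(range(n))
    for _ in range(max_rounds):
        rng.shuffle(nodes)       # visit nodes in random order each round
        changed = False
        for x in nodes:
            if not adj[x]:
                continue
            old = label[x]
            volume[old] -= 1     # assumption: x does not count toward its own cluster
            counts = Counter(label[y] for y in adj[x])
            # Prefer many neighbors with the label, penalize large clusters.
            # Ties: keep the current label if possible, else the smallest label.
            new = max(
                counts,
                key=lambda l: (counts[l] - gamma * (volume[l] - counts[l]), l == old, -l),
            )
            label[x] = new
            volume[new] += 1
            changed = changed or new != old
        if not changed:
            break
    return label


def layered_order(adj, gammas, seed=0):
    """Refine a node order with one clustering per resolution value."""
    order = list(range(len(adj)))
    for i, gamma in enumerate(gammas):
        label = apm_label_propagation(adj, gamma, seed=seed + i)
        rank = {x: r for r, x in enumerate(order)}
        first = {}               # earliest position of each cluster (assumption)
        for x in order:
            first.setdefault(label[x], rank[x])
        # Keep clusters contiguous; inside a cluster keep the previous order.
        order.sort(key=lambda x: (first[label[x]], rank[x]))
    return order


# Two triangles whose node identifiers are interleaved.
example = [[2, 4], [3, 5], [0, 4], [1, 5], [0, 2], [1, 3]]
print(layered_order(example, [1.0, 0.5, 0.0]))
```

Each pass only regroups nodes by cluster and preserves the relative order from the previous pass, which is why coarse and fine clusterings can both leave a trace in the final order.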

The implementation is parallel: it uses overdecomposition to exploit multi-core machines, which makes the algorithm practical on very large networks. The resulting orderings compress to fewer bits per link than BFS-based coordinate-free orderings.

Experimental Validation

The authors benchmark LLP against several existing orderings, including BFS, Shingle, and Gray ordering, and report that it remains effective even when started from a purely random node permutation. When compressed using the BV method within the WebGraph framework, datasets such as the uk web graph reach as low as 1.8 bits per link.
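To see why a node ordering changes the bits-per-link figure, the following rough sketch costs the gaps between consecutive successor identifiers with Elias gamma code lengths. It is only a proxy and not the BV format, which also uses reference copying, interval encoding, and other codes. The function names and the small example graph are assumptions made for illustration.

```python
def elias_gamma_bits(n):
    """Length in bits of the Elias gamma code of an integer n >= 1."""
    return 2 * n.bit_length() - 1


def bits_per_link(adj, order):
    """Average gap-coding cost per link after renumbering nodes by `order`."""
    rank = {x: r for r, x in enumerate(order)}
    total_bits = 0
    links = 0
    for x, successors in enumerate(adj):
        renumbered = sorted({rank[y] for y in successors})
        prev = None
        for s in renumbered:
            if prev is None:
                # First successor: signed offset from the source node,
                # mapped to a positive integer (0 -> 1, -1 -> 2, 1 -> 3, ...).
                d = s - rank[x]
                gap = 2 * d + 1 if d >= 0 else -2 * d
            else:
                gap = s - prev   # strictly positive, since the list is sorted
            total_bits += elias_gamma_bits(gap)
            links += 1
            prev = s
    return total_bits / links if links else 0.0


# Two triangles with interleaved identifiers (hypothetical toy graph).
graph = [[2, 4], [3, 5], [0, 4], [1, 5], [0, 2], [1, 3]]
print(bits_per_link(graph, list(range(6))))       # original numbering
print(bits_per_link(graph, [0, 2, 4, 1, 3, 5]))   # each triangle made contiguous
```

On this toy graph the second ordering gives the lower cost, because placing tightly connected nodes next to each other shrinks the gaps that have to be encoded.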

Moreover, the paper contrasts LLP with the compression scheme by Apostolico and Drovandi. The results show that LLP yields better compression, particularly on social networks where previous methods struggle with the absence of natural ordering information.

Implications and Future Work

The approach proposed in this paper advances both the theoretical understanding and the practical handling of large-scale network data. The insights on locality and similarity open avenues for further research on how network topology affects compression. The multiscale clustering strategy employed by LLP could also inform graph data structures and algorithms beyond compression, for example in community detection.

Future developments may involve exploring further optimizations in the multiresolution strategy or extending the algorithm to accommodate dynamic networks where node interactions evolve over time.

In conclusion, the paper provides a significant advancement in compressing social network and web graph data, demonstrating the efficacy of coordinate-free methods in achieving high compression ratios while maintaining fast access speeds. This work sets a new benchmark for future research in graph compression.
