- The paper introduces a novel recursive graph bisection algorithm that improves graph and index compression by optimizing data locality for delta-encoding.
- It unifies graph reordering and document identifier assignment in a single model, and proves that finding a compression-optimal order is NP-hard.
- Experimental results on large datasets show that the method outperforms state-of-the-art techniques by reducing storage requirements and enabling efficient parallel processing.
Summary of "Compressing Graphs and Indexes with Recursive Graph Bisection"
The paper "Compressing Graphs and Indexes with Recursive Graph Bisection" addresses the challenge of efficiently compressing graph data and inverted indexes by leveraging a theoretically substantiated approach to graph reordering. Recognizing the pivotal role graph reordering plays in enhancing data locality for compression purposes, the authors propose a recursive graph bisection technique that both refines existing theoretical models and challenges prevalent heuristics in graph compression.
Theoretical Model and Algorithm
The authors extend the graph compression model initially proposed by Chierichetti et al. in 2009, providing a unified framework that captures both graph reordering and document identifier assignment. By proving that finding compression-optimal orders is NP-hard, they underline the complexity inherent in the model and relate the problem to the well-studied Minimum Linear Arrangement (MLA) problem.
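In the spirit of that model (the notation below is ours, not copied verbatim from the paper), the objective is to find a vertex permutation pi that minimizes the total logarithmic cost of the gaps produced by delta-encoding each adjacency list:

```latex
\min_{\pi} \; \sum_{u \in V} \; \sum_{i=1}^{\deg(u)-1}
    \log_2\bigl(\pi(v_{i+1}) - \pi(v_i)\bigr)
```

where v_1, ..., v_deg(u) are the neighbors of u sorted by their positions under pi. MLA instead minimizes the sum of linear gaps |pi(u) - pi(v)| over all edges; replacing the linear gap with a logarithmic one is what makes the objective match delta-encoding cost.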
Their contributions include a novel algorithm based on recursive graph bisection, using approximation techniques to cope with the NP-hardness of the problem. The recursive approach partitions large graphs into progressively smaller segments that can be optimized in parallel. The technique targets the number of bits needed to store graph and index data under delta-encoding, which stores the gaps between consecutive elements rather than the elements themselves, so placing related vertices close together directly shrinks the encoded size.
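A drastically simplified sketch of this idea follows (hypothetical code, standing in for the paper's gain-based swapping with a naive hill-climbing step): the vertex order is recursively halved, and swaps across the split are kept only when they lower the log-gap cost of the current segment.

```python
import math
import random

def gap_cost(order, adjacency):
    """Log2 gap cost of delta-encoding the adjacency lists, restricted to `order`."""
    pos = {v: i for i, v in enumerate(order)}
    total = 0.0
    for nbrs in adjacency.values():
        ps = sorted(pos[v] for v in nbrs if v in pos)
        total += sum(math.log2(b - a + 1) for a, b in zip(ps, ps[1:]))
    return total

def bisect(vertices, adjacency, depth=0, max_depth=10, iters=50):
    """Recursively split the order, locally improving each split by swaps."""
    if depth >= max_depth or len(vertices) <= 2:
        return list(vertices)
    half = len(vertices) // 2
    left, right = list(vertices[:half]), list(vertices[half:])
    for _ in range(iters):
        i, j = random.randrange(len(left)), random.randrange(len(right))
        before = gap_cost(left + right, adjacency)
        left[i], right[j] = right[j], left[i]          # try a cross-split swap
        if gap_cost(left + right, adjacency) >= before:
            left[i], right[j] = right[j], left[i]      # undo: no improvement
    return (bisect(left, adjacency, depth + 1, max_depth, iters)
            + bisect(right, adjacency, depth + 1, max_depth, iters))

adjacency = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
order = bisect(list(adjacency), adjacency)
new_ids = {v: i for i, v in enumerate(order)}  # compression-friendly identifiers
```

In the paper, swaps are guided by computed move gains rather than random trials, which is essential at billion-edge scale.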
Practical Implications and Results
From a practical standpoint, their experiments cover compression of large-scale graph datasets, including social networks and web graphs, demonstrating scalability and efficiency. By outperforming state-of-the-art methods, their algorithm reduces the compressed size, achieving marked improvements on graphs with billions of vertices and edges.
The algorithm's simplicity extends its utility to parallel and distributed computing environments, with distributed implementations built on frameworks such as Giraph. This makes it appealing for applications that need rapid processing of large datasets, translating into reductions in machine-hours.
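At a toy scale, the independence of the two halves after each split can be exercised even on a single machine (a hypothetical stand-in for a Giraph-style distributed run, reusing the `bisect` sketch above):

```python
from concurrent.futures import ProcessPoolExecutor

def parallel_bisect(vertices, adjacency):
    """Refine the two halves of one bisection step in parallel processes.

    Assumes `bisect` is defined at module level so worker processes can import it.
    """
    half = len(vertices) // 2
    left, right = vertices[:half], vertices[half:]
    with ProcessPoolExecutor(max_workers=2) as pool:
        f_left = pool.submit(bisect, left, adjacency)    # the halves share no state,
        f_right = pool.submit(bisect, right, adjacency)  # so they run concurrently
        return f_left.result() + f_right.result()
```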
Broad Implications and Future Work
The implications of this work span both theoretical and practical domains. On the theoretical side, the unified model offers a sharper lens for analyzing graph compression problems and suggests avenues for future research on approximation algorithms. Practically, the method promises improved data locality, benefiting applications that rely on graph traversal. The reordering technique also has substantial potential in adjacent areas, such as cache optimization and memory utilization in systems that process graph data.
Further work might refine the vertex arrangement beyond recursive bisection, in particular by optimizing the orientations of the recursion tree, a challenge the authors raise in connection with MLA. The paper lays a solid foundation for future research into more efficient graph and index compression strategies that yield tangible savings in computational resources and time.