Sparsifiner: Efficient Graph Sparsification
- Sparsifiner is a framework for efficiently simulating randomized distributed graph algorithms by transforming dense graphs into sparse subgraph representations.
- It employs structured sparsification to reduce local dependencies, achieving near-optimal round complexity and sublinear memory in MPC and LCA models.
- The framework demonstrates practical improvements in solving MIS, matching, and vertex cover by compressing multi-round interactions into manageable sparse subgraphs.
Sparsifiner is a framework for the efficient simulation and locality reduction of randomized distributed algorithms in large-scale graph processing, particularly in the context of Maximal Independent Set (MIS), matching, and vertex cover problems. The central technique is a structured sparsification transformation: instead of simulating every round of an r-round LOCAL algorithm over the dense input topology, the algorithm performs computations on a sequence of carefully constructed sparse subgraphs, each representing a superset of the local dependencies over a block of contiguous rounds. By leveraging sparsification, Sparsifiner simultaneously achieves near-optimal round complexity and sublinear memory or query complexity in the Massively Parallel Computation (MPC) and Local Computation Algorithms (LCA) models, breaking established complexity barriers for these tasks (Ghaffari et al., 2018).
1. Sparsification Transformation for LOCAL and Parallel Algorithms
Sparsifiner divides the execution of an r-round LOCAL algorithm on an n-node graph with maximum degree Δ into phases of d consecutive rounds each (with d chosen appropriately for the MIS and matching problems). In each phase i, it constructs a sparse subgraph H_i, sampling edges or nodes based on the probabilistic choices made by the original algorithm in those rounds.
Matching-Approximation Example:
- In iteration i, each edge is marked independently with probability p_i.
- Oversampling: the phase sparsifier is drawn from the same pre-fixed randomness, containing each edge with a probability somewhat larger than p_i, so that it is a superset of the edges the algorithm could mark in that iteration.
- The union over the phase's iterations yields a subgraph H of polylogarithmic maximum degree with high probability, significantly reducing the number of neighbors each node must inspect (Ghaffari et al., 2018).
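The sampling step above can be sketched in Python. The oversampling rule q = min(1, C·p·log n) and all names here are illustrative assumptions, not the paper's exact constants:

```python
import math
import random

def build_phase_sparsifier(edges, n, probs, C=4.0, seed=0):
    """Union of per-iteration oversampled edge samples (illustrative rule)."""
    rng = random.Random(seed)
    H = set()
    for p in probs:                              # one marking prob per iteration
        q = min(1.0, C * p * math.log(n))        # boosted ("oversampled") prob
        for e in edges:
            if rng.random() < q:
                H.add(e)
    return H

# Dense toy input: complete graph on n nodes, doubling marking probabilities.
n = 200
edges = [(u, v) for u in range(n) for v in range(u + 1, n)]
probs = [0.001 * 2 ** i for i in range(4)]
H = build_phase_sparsifier(edges, n, probs)
max_deg = max(sum(u in e for e in H) for u in range(n))
print(len(edges), len(H), max_deg)   # H is far sparser than the input
```

On this dense input, each node's degree in H drops from n−1 to a small fraction of that, which is the point of inspecting only H-neighbors during the phase.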
MIS Example:
- For each node v, a vector of i.i.d. uniform random numbers, one per iteration, is fixed in advance.
- Nodes are considered “relevant” for the sparsified subgraph if they or their neighbors have high-probability local events in the phase.
- Nodes are classified as “light” or “heavy” according to degree, and “good” if their degree estimates remain below a fixed threshold.
- The resulting subgraph H allows deterministic simulation of the original d rounds by examining only the d-hop neighborhood in H, whose size stays far below the Δ^d worst case for sufficiently small d (Ghaffari et al., 2018).
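A minimal sketch of two ingredients above: fixing each node's random tape up front (which makes the later simulation deterministic) and classifying nodes by degree against a threshold. The names, the toy graph, and the threshold rule `tau` are illustrative, not the paper's exact criteria:

```python
import random

def fix_random_tapes(nodes, d, seed=0):
    """Pre-draw each node's i.i.d. uniform numbers, one per iteration."""
    rng = random.Random(seed)
    return {v: [rng.random() for _ in range(d)] for v in nodes}

def classify(adj, tau):
    """Split nodes into 'light' (degree <= tau) and 'heavy' (degree > tau)."""
    light = {v for v, nbrs in adj.items() if len(nbrs) <= tau}
    return light, set(adj) - light

# Toy star-plus-tail graph as adjacency sets.
adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0, 4}, 4: {3}}
tapes = fix_random_tapes(adj, d=3)
light, heavy = classify(adj, tau=2)
print(light, heavy)   # node 0 is heavy; the rest are light
```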
2. Algorithmic Workflows: Pseudocode and Simulation Strategy
At a high level, Sparsifiner proceeds in phases, each consisting of d rounds:
- Initialization: Set initial marking probabilities p_0(v) for all nodes v.
- Phase i (for i = 1, 2, …): For d rounds,
- If a node’s degree estimate exceeds the current threshold, the node “stalls” (its marking probability is halved each round).
- The sparsified subgraph H_i is constructed via local sampling criteria.
- Each node gathers its d-hop neighborhood in H_i and simulates the phase’s d rounds:
- Degree estimates are obtained from a bounded number of random samples.
- Probabilities are updated accordingly.
- Marking/selection events are performed as in the original, but using only the sparsified local data.
- Stitching: After all phases, high-degree nodes are removed in a cleanup round.
This method allows nodes to determine, with high probability, their MIS or matching status by querying only a polylogarithmic-size local neighborhood in the sparsified subgraph, as opposed to the exponentially large neighborhood in the original (Ghaffari et al., 2018).
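The gathering step at the heart of this workflow is a plain bounded-depth BFS that collects a node's d-hop ball in the sparsified graph. A minimal illustration (the graph `H` below is a toy placeholder):

```python
from collections import deque

def d_hop_ball(adj, v, d):
    """BFS to depth d; returns all nodes within distance d of v in adj."""
    dist = {v: 0}
    q = deque([v])
    while q:
        u = q.popleft()
        if dist[u] == d:
            continue                     # don't expand past depth d
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return set(dist)

# Toy sparsified graph: a path 0-1-2-3-4.
H = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
ball = d_hop_ball(H, 0, d=2)
print(ball)   # {0, 1, 2}
```

Because H has low degree, this ball is polylogarithmic in size rather than the Δ^d worst case on the original graph.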
3. Complexity Results and Barrier Separation
Sparsifiner yields the following advances:
- LOCAL Model: After O(log Δ) rounds, with high probability, every node is either in the MIS or has a neighbor in the MIS, except on a leftover subgraph whose connected components are small and which contains only a small fraction of surviving nodes; this remainder is finished off separately. The round complexity matches prior optimal results (Ghaffari et al., 2018).
- MPC Model: For any constant ε > 0, an MPC algorithm with O(n^ε) memory per machine (and enough machines to hold the input) solves MIS, Maximal Matching, (1+ε)-approximate Max-Matching, and 2-approximate Min-VC in Õ(√log Δ) rounds. The sparsification allows many LOCAL rounds to be “compressed” into few MPC rounds using graph exponentiation, as long as the local d-ball fits in memory (Ghaffari et al., 2018).
- LCA Model: There is an LCA for MIS with query complexity Δ^O(log log Δ) · poly(log n). This is achieved by recursively splitting the rounds into halved-length subphases down to a short base case; the sparsified subgraph at each level is sufficiently small for local exploration and simulation, circumventing the Δ^Θ(log Δ)-type query cost of classic simulation-based approaches (Ghaffari et al., 2018).
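A back-of-the-envelope illustration of why halving helps: splitting an r-round simulation in two until a one-round base case gives about log2(r) recursion levels, so if each level contributes only a Δ^O(1) factor (an assumption made here for illustration), the total is Δ^O(log r), i.e., Δ^O(log log Δ) when r = O(log Δ):

```python
def recursion_levels(r):
    """Count halvings of an r-round simulation down to a 1-round base case."""
    levels = 0
    while r > 1:
        r //= 2          # halve the subphase length
        levels += 1
    return levels

r = 64                       # stand-in for an O(log Δ)-round algorithm
print(recursion_levels(r))   # 6, i.e., log2(64) levels
```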
4. Technical Innovations: Locality-Volume and Simulation Efficiency
Key innovations underlying Sparsifiner include:
- Locality-Volume: The relevant measure is not the raw r-hop neighborhood size (up to Δ^r), but the number of graph elements each node’s output truly depends on under the sparsification. Oversampling creates supersets of “relevant” neighbors, greatly reducing simulation volume compared to Parnas–Ron-style approaches (Ghaffari et al., 2018).
- Degree Stalling and Adaptive Sampling: Nodes with intractably high degree stall their participation, maintaining sparsity in H_i without hindering global progress.
- MPC Graph Exponentiation: The d-hop simulation, enabled by the bounded d-ball size in H_i, is mapped efficiently across machines, aggregating neighborhoods in O(log d) MPC rounds.
- Recursive LCA Simulation: Each subphase is simulatable with a bounded number of queries, and the recurrence over the O(log log Δ) levels of halving solves to Δ^O(log log Δ) · poly(log n) overall queries (Ghaffari et al., 2018).
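Graph exponentiation itself is easy to sketch: if every node knows its 2^(k−1)-hop ball, then exchanging balls with the nodes in it yields its 2^k-hop ball, so a d-hop ball is gathered in O(log d) doubling steps. A toy single-machine illustration (not an actual MPC implementation):

```python
def exponentiate(adj, steps):
    """Each step doubles the radius of every node's known ball."""
    ball = {v: {v} | set(adj[v]) for v in adj}           # 1-hop balls
    for _ in range(steps):
        # New ball of v = union of the balls of everyone v already knows.
        ball = {v: set().union(*(ball[u] for u in ball[v])) for v in ball}
    return ball

# Toy input: a path on 9 nodes.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 9} for i in range(9)}
balls = exponentiate(path, 2)   # two doubling steps -> 4-hop balls
print(sorted(balls[0]))   # [0, 1, 2, 3, 4]
```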
5. Concrete Example: Matching Approximation with Sparsifiner
In a basic LOCAL matching algorithm, nodes mark incident edges with increasing probabilities, isolated marked edges are added to the matching, and high-degree endpoints are deleted. With Sparsifiner:
- Phases of d iterations use elevated (oversampled) probabilities when drawing the sparsifier.
- The sparsified subgraph H has polylogarithmic maximum degree with high probability.
- Nodes resample their incident H-edges to emulate marking events, simulate isolation checks, and execute the matching step, all within the sparsified locality.
- Guarantees: polylogarithmic locality-volume, the same round count as the original algorithm, and a constant fraction of removed nodes matched per iteration (Ghaffari et al., 2018).
6. Impact, Limitations, and Theoretical Significance
Sparsifiner achieves:
- The first sublogarithmic-round MPC algorithms for MIS, matching, and vertex cover that work with strongly sublinear (n^ε) memory per machine, breaking the linear-memory barrier.
- An LCA for MIS with query complexity Δ^O(log log Δ) · poly(log n), surpassing the Δ^Θ(log Δ) barrier implied by straightforward distributed simulation.
- Fundamental re-framing of locality measures for distributed and local simulation.
Limitations are primarily the need to fix randomness ahead of time, and the fact that sparsification applies only to those phases of the algorithm in which combinatorial dependencies can be bounded by randomized sampling and degree control.
7. Summary Table of Model-Specific Improvements
| Model | Prior Complexity | Sparsifiner Complexity | Key Advance |
|---|---|---|---|
| LOCAL | O(log Δ) rounds, Δ^O(log Δ) simulation volume | Same round count, poly(log n) locality-volume | Volume, not ball size, drives simulation cost |
| MPC | O(log Δ) rounds (direct simulation) with n^ε memory | Õ(√log Δ) rounds, n^ε memory | Breaks linear-memory barrier |
| LCA | Δ^O(log Δ) · poly(log n) queries | Δ^O(log log Δ) · poly(log n) queries | Circumvents the simulation query barrier |
The Sparsifiner framework fundamentally advances parallel and local graph algorithms, providing both theoretical insights and practical schemes for sublinear-memory graph processing, while offering a general method for simulating global dependencies using only a small local view within sparse random subgraphs (Ghaffari et al., 2018).