Dynamic Group Scaling (DGS) in Elastic Graph Partitioning

Updated 4 July 2026

Dynamic Group Scaling (DGS) is a method for reconfiguring static graph partitions on elastic infrastructures by efficiently adjusting the number of partitions.
It employs a two-stage approach: a one-time graph edge ordering to encode locality, followed by rapid chunk-based edge partitioning for constant-time reslicing.
Empirical evaluations show DGS achieves near constant-time repartitioning with high partition quality and reduced communication overhead compared to traditional methods.

Dynamic Group Scaling (DGS), termed graph dynamic scaling in distributed graph processing, denotes the problem of changing the number of graph partitions when elastic infrastructure provisions or de-provisions resources, while preserving high partitioning quality and avoiding the cost of rerunning a full partitioner at every scale event (Hanai et al., 2021). In this formulation, the graph itself is static, but the available compute units change over time. The principal DGS method separates the problem into a one-time preprocessing stage, graph edge ordering, and a very fast repartitioning stage, chunk-based edge partitioning, so that scaling from $k$ to $k \pm x$ partitions can be performed without re-executing an expensive high-quality partitioning algorithm (Hanai et al., 2021).

1. Elastic repartitioning as the core DGS problem

The DGS setting assumes a distributed graph-processing system on elastic infrastructure such as cloud VMs, CPU cores, or sockets. The graph structure is fixed, but the partition count must change as resources scale out or scale in. Formally, dynamic scaling is defined as

$sc(\mathcal{E}_k, \pm x) = \mathcal{E}_{k\pm x},$

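where $\mathcal{E}_k$ is the current $k$-way edge partitioning, $x$ is the number of partitions added or removed, and $\mathcal{E}_{k\pm x}$ is the resulting partitioning (Hanai et al., 2021).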

The objective is explicitly multi-objective: maximize scaling efficiency $EF$ while minimizing the replication factor $RF$ after repartitioning,

$\max_{\mathit{sc} \in \mathcal{SC}} EF(\mathit{sc}(\mathcal{E}_k,\pm x)), \qquad \min_{\mathit{sc} \in \mathcal{SC}} RF(\mathit{sc}(\mathcal{E}_k,\pm x)),$

subject to the balance constraint

$\max_{0 \le p < k\pm x} |\mathcal{E}_{k\pm x}[p]| < (1+\epsilon)\frac{|E|}{k\pm x}.$

This formulation captures the central tension emphasized by the literature: high-quality partitioning methods such as METIS/NE are expensive, whereas fast methods such as hashing are cheap but produce poor locality and higher replication factor (Hanai et al., 2021).

In operational terms, DGS is needed because leaving the partition count unchanged under elastic resource variation produces underutilization or overload and increases communication. The problem is therefore not merely repartitioning in the abstract; it is repartitioning under time pressure, with quality requirements close to those of static high-quality methods (Hanai et al., 2021).

2. Graph edge ordering as locality-preserving preprocessing

The first half of the DGS method is a preprocessing stage called graph edge ordering. Its purpose is to produce an ordering of the edge set in which edges with high locality are close in the linear order. Let

$\phi: E \to \{0,\dots,|E|-1\}$

be a bijective ordering function, and let $E^\phi$ denote the ordered edge list. The design goal is to choose $\phi$ so that, for every future partition count between a minimum $k_{\min}$ and a maximum $k_{\max}$, contiguous chunks of $E^\phi$ induce a low replication factor when used as partitions (Hanai et al., 2021).

The paper proves that the edge-ordering objective is NP-hard. It therefore proposes a greedy approximation. The baseline greedy expansion starts from a random vertex, orders its incident edges, repeatedly chooses a frontier vertex that locally minimizes the objective, and expands the ordered region by adding one-hop and selected two-hop edges (Hanai et al., 2021). To reduce the cost of this procedure, the method introduces a priority-queue acceleration. Each vertex $v$ receives a priority computed from two quantities: the number of unprocessed incident edges of $v$ and the latest order index of an edge involving $v$.

The paper proves a lemma that this priority order is consistent with the greedy objective, under an assumption that one quantity is much larger than another (Hanai et al., 2021).

This preprocessing stage is the expensive part of the pipeline, but it is performed only once. Its function is to encode locality into the edge order so that later scaling events become simple reslicing operations rather than full repartitioning runs. A plausible implication is that DGS is best understood as a separation of concerns: expensive locality construction is front-loaded, and later elasticity is handled by manipulating the ordered representation.
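The expansion idea can be illustrated with a deliberately simplified sketch. It is not the paper's algorithm: the priority used here (fewest unprocessed incident edges first) is a stand-in assumption, two-hop edges are not handled specially, and the function name `greedy_edge_order` is hypothetical. Vertices are assumed to be integers so that heap entries compare cleanly.

```python
import heapq
from collections import defaultdict

def greedy_edge_order(edges):
    """Reorder edges so that edges sharing vertices tend to be adjacent.

    Stand-in heuristic (an assumption, not the paper's priority function):
    always expand the frontier vertex with the fewest unprocessed edges.
    """
    adj = defaultdict(list)                 # vertex -> incident edge ids
    for eid, (u, v) in enumerate(edges):
        adj[u].append(eid)
        adj[v].append(eid)

    remaining = {v: len(ids) for v, ids in adj.items()}  # unprocessed degree
    placed = [False] * len(edges)
    expanded = set()
    order = []
    heap = []                               # (unprocessed degree, vertex)

    for seed in adj:                        # one pass per connected component
        if seed in expanded:
            continue
        heapq.heappush(heap, (remaining[seed], seed))
        while heap:
            deg, v = heapq.heappop(heap)
            if v in expanded or deg != remaining[v]:
                continue                    # stale entry (lazy deletion)
            expanded.add(v)
            for eid in adj[v]:
                if placed[eid]:
                    continue
                placed[eid] = True
                order.append(edges[eid])
                a, b = edges[eid]
                w = b if a == v else a      # the other endpoint
                remaining[v] -= 1
                remaining[w] -= 1
                if w not in expanded:
                    heapq.heappush(heap, (remaining[w], w))
    return order
```

The output is a permutation of the input edge list. On a graph made of a triangle plus a separate edge, the three triangle edges come out contiguously, which is the locality property that later chunking relies on.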

3. Chunk-based edge partitioning and instantaneous reslicing

The second half of the method is chunk-based edge partitioning. Once the ordered edge list $E^\phi$ has been constructed, the graph is partitioned by taking contiguous chunks. Writing $b_k(p)$ for the chunk boundaries (notation chosen here), these can be defined as

$b_k(p) = \left\lfloor \frac{p\,|E|}{k} \right\rfloor, \qquad 0 \le p \le k.$

Then, for $k$ partitions, partition $p$ is

$\mathcal{E}_k[p] = \{\, e \in E : b_k(p) \le \phi(e) < b_k(p+1) \,\}.$

If $k$ divides $|E|$, this simplifies to

$\mathcal{E}_k[p] = \left\{\, e \in E : \frac{p\,|E|}{k} \le \phi(e) < \frac{(p+1)\,|E|}{k} \,\right\}.$

These formulas are the mechanism by which the ordered edge list is converted into an arbitrary $k$-way partitioning (Hanai et al., 2021).

The central efficiency theorem states that if the ordered edges are stored contiguously, chunk-based partitioning can be computed in

$O(1)$

excluding graph data movement. The practical reason is straightforward: only array or file offsets need to be computed, and the partitioner does not revisit edges individually (Hanai et al., 2021).

The same construction yields a strong balance property. Chunk sizes differ by at most $1$ edge, so edge balance is essentially perfect and $\epsilon \approx 0$ in practice (Hanai et al., 2021). DGS therefore turns scaling into a reslicing operation with automatic balance, provided that the earlier preprocessing has already arranged locality favorably.
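A minimal sketch of the reslicing step follows, assuming floor-based chunk boundaries and an in-memory Python list standing in for the contiguously stored ordered edge list. The function names `chunk_bounds` and `chunk_partition` are hypothetical. Only the offsets depend on $k$; the slicing in `chunk_partition` is for illustration, since a real system would hand out offsets rather than copy edges.

```python
def chunk_bounds(num_edges, k):
    """Return k + 1 offsets; partition p covers [bounds[p], bounds[p + 1])."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Floor-based boundaries: chunk sizes differ by at most one edge.
    return [(p * num_edges) // k for p in range(k + 1)]

def chunk_partition(ordered_edges, k):
    """Slice the ordered edge list into k contiguous chunks (copies for clarity)."""
    bounds = chunk_bounds(len(ordered_edges), k)
    return [ordered_edges[bounds[p]:bounds[p + 1]] for p in range(k)]

# Scaling from k = 3 to k = 4 only recomputes offsets over the same ordering.
ordered = list(range(10))                 # stand-in for an ordered edge list
three_way = chunk_partition(ordered, 3)   # chunk sizes 3, 3, 4
four_way = chunk_partition(ordered, 4)    # chunk sizes 2, 3, 2, 3
```

Because the edge order is fixed, changing the partition count never touches the ordering itself; each scale event is a fresh call with a different `k`.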

4. Replication factor, complexity, and theoretical bounds

Partition quality is measured by the standard replication factor:

$RF(\mathcal{E}_k) = \frac{1}{|V|} \sum_{p=0}^{k-1} |V(\mathcal{E}_k[p])|,$

where $V(\mathcal{E}_k[p])$ is the set of vertices touched by the edges of partition $p$. Each partition stores the vertices touched by its edges, so a vertex that appears in multiple partitions is replicated. Lower $RF$ means less communication (Hanai et al., 2021). Balanced $k$-way edge partitioning is defined as minimizing $RF$ subject to the edge-balance constraint

$\max_{0 \le p < k} |\mathcal{E}_k[p]| < (1+\epsilon)\frac{|E|}{k}.$
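Both quantities are straightforward to compute for a given partitioning. The sketch below assumes each partition is a list of `(u, v)` edge tuples and takes $|V|$ to be the set of vertices touched by at least one edge, so isolated vertices are ignored. The function names are hypothetical.

```python
def replication_factor(partitions):
    """Average number of partitions in which a vertex appears."""
    vertices = set()
    replicas = 0
    for part in partitions:
        local = {v for edge in part for v in edge}  # vertices touched here
        replicas += len(local)
        vertices |= local
    return replicas / len(vertices)

def is_balanced(partitions, epsilon):
    """Check that the largest partition is below (1 + epsilon) * |E| / k."""
    total = sum(len(part) for part in partitions)
    k = len(partitions)
    return max(len(part) for part in partitions) < (1 + epsilon) * total / k

# A 4-cycle split into two chunks: vertices 0 and 2 appear in both partitions.
parts = [[(0, 1), (1, 2)], [(2, 3), (3, 0)]]
rf = replication_factor(parts)   # (3 + 3) / 4 = 1.5
```

Note that the constraint is strict, so a perfectly even split satisfies it only for $\epsilon > 0$.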

The complexity contrast between the two ordering algorithms is large. The paper gives a running-time bound for the fast greedy algorithm that depends on the maximum degree, and a separate bound for the baseline greedy algorithm, both under the stated assumptions (Hanai et al., 2021). The intended use is therefore clear: pay once for a feasible locality-aware ordering, then exploit constant-time repartitioning thereafter.

The paper also provides a replication-factor upper bound for the resulting partitioning, under the assumption that each greedy iteration adds fewer new edges than the smallest partition size. For power-law graphs, it additionally derives the expected value of this bound. The paper states that this bound is close to NE, the best static method, and much better than hashing-based dynamic scaling (Hanai et al., 2021).

Scaling is not free because data migration remains necessary. For scaling from $k$ to $k \pm x$ partitions, the paper derives an expression for the number of migrated edges, together with a simplified form for a common special case. The distinction is important: DGS makes the partitioning computation constant-time, but it does not eliminate the cost of moving graph data between partitions (Hanai et al., 2021).
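The migration cost can be explored numerically with a small sketch. It rests on two assumptions that may differ from the paper's accounting: chunk boundaries are floor-based, and partition $p$ keeps the same index (and therefore the same machine) before and after scaling. An edge then counts as migrated when its old and new partition indices differ. The function names are hypothetical.

```python
def chunk_bounds(num_edges, k):
    """Floor-based chunk offsets; partition p covers [b[p], b[p + 1])."""
    return [(p * num_edges) // k for p in range(k + 1)]

def migrated_edges(num_edges, k_old, k_new):
    """Count edges whose partition index changes when rescaling k_old -> k_new."""
    old = chunk_bounds(num_edges, k_old)
    new = chunk_bounds(num_edges, k_new)
    stayed = 0
    for p in range(min(k_old, k_new)):
        # Edges that remain in partition p lie in the overlap of its two ranges.
        lo = max(old[p], new[p])
        hi = min(old[p + 1], new[p + 1])
        stayed += max(0, hi - lo)
    return num_edges - stayed

moved = migrated_edges(12, 3, 4)   # overlaps of 3, 2 and 1 edges stay put
```

Under these assumptions, scaling 12 edges from 3 to 4 partitions moves 6 of them, even though computing the new offsets is trivial. This illustrates why the data movement is accounted for separately from the partitioning computation.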

5. Empirical behavior on large graphs and graph analytics

The empirical evaluation uses real-world large graphs including Road-CA, Skitter, Patents, Pokec, Flickr, LiveJournal, Orkut, Twitter, and Friendster, including billion-edge graphs. The compared dynamic scaling and partitioning baselines are BVC, NE, DBH, HDRF, 1D / 2D hashing, MTS, and CVP; the compared ordering baselines are GO, RO, RGB, LLP, RCM, DEG, and DEF. The proposed system is denoted GEO+CEP, combining graph edge ordering (GEO) with chunk-based edge partitioning (CEP) (Hanai et al., 2021).

The headline result is speed. CEP is reported to be over 1,000× faster than the other partitioners in many cases and, more generally, 3 to 8 orders of magnitude faster than existing methods in dynamic scaling settings (Hanai et al., 2021). The reason given is structural rather than implementation-specific: CEP computes chunk boundaries, whereas competing methods process edges individually.

Quality is reported to remain high. GEO+CEP achieves second-best replication factor, just behind NE, and much better quality than hash-based methods such as BVC, DBH, 1D, and 2D. The paper emphasizes that the gap to NE is small even though GEO+CEP supports arbitrary $k$ while NE is tied to a fixed $k$ (Hanai et al., 2021).
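A toy comparison makes the structural difference concrete. The sketch below partitions a path graph, whose natural edge order already has perfect locality, once by contiguous chunks and once by a hash on the source vertex. The `u % k` hash is a deterministic stand-in for 1D hashing, and all function names are hypothetical. It is an illustration on an eight-edge graph, not a reproduction of the paper's measurements.

```python
def replication_factor(partitions):
    """Average number of partitions in which a vertex appears."""
    vertices, replicas = set(), 0
    for part in partitions:
        local = {v for edge in part for v in edge}
        replicas += len(local)
        vertices |= local
    return replicas / len(vertices)

def chunk_partition(ordered_edges, k):
    """Contiguous chunks: only offsets depend on k."""
    n = len(ordered_edges)
    return [ordered_edges[(p * n) // k:((p + 1) * n) // k] for p in range(k)]

def hash_partition(edges, k):
    """Every edge is visited individually and placed by its source vertex."""
    parts = [[] for _ in range(k)]
    for u, v in edges:
        parts[u % k].append((u, v))   # stand-in for hashing the source vertex
    return parts

path = [(i, i + 1) for i in range(8)]      # path graph on vertices 0..8
rf_chunk = replication_factor(chunk_partition(path, 2))   # 10 / 9
rf_hash = replication_factor(hash_partition(path, 2))     # 16 / 9
```

With chunks, only the single boundary vertex is replicated. With hashing, neighbouring edges alternate between partitions, so every interior vertex is replicated.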

The application-level evaluation covers SSSP, WCC, and PageRank on PowerLyra/PowerGraph-like infrastructure. GEO+CEP reduces communication volume substantially and yields the best runtime in most cases, with the strongest improvement on PageRank. The paper reports PageRank runtime reductions on Orkut (relative to 1D hashing), Twitter, and Friendster (Hanai et al., 2021).

The end-to-end dynamic scaling evaluation reports similarly large gains. On Friendster, the Scale-out ALL time is lower with GEO+CEP than with 1D, and the Scale-in ALL time drops as well. The paper attributes these improvements to faster initial partitioning, much faster repartitioning, and lower communication during the application phase (Hanai et al., 2021).

6. Scope, limitations, and terminological ambiguity

DGS in this sense is designed for a specific regime. The graph is assumed static during scaling; the workload changes because available resources change, not because graph topology changes. Real-world evaluation targets large and sparse graphs, often power-law-like. For formal derivations the graph is undirected and unweighted, several proofs rely on one quantity being much larger than another, and the preprocessing implementation is sequential in the reported system. The paper also states an intended operating range for the method.

The paper also notes several trade-offs: preprocessing is not free, migration is still required even though partitioning is $O(1)$, and vertex balance may be slightly worse because the method prioritizes edge balance and communication reduction (Hanai et al., 2021).

A common misconception is that DGS is a stable acronym across arXiv literature. It is not. The acronym is used for several unrelated concepts.

Use of “DGS”	Meaning	Representative paper
Dynamic Group Scaling	Graph dynamic scaling for elastic repartitioning	(Hanai et al., 2021)
Dynamic graph storage	In-memory storage for concurrent graph read/write queries	(Su et al., 16 Feb 2025)
Dual-way gradient sparsification	Asynchronous distributed training with sparsified up/down communication	(Yan, 2019)
Determined by generalized spectrum	Spectral graph-theoretic property and graph constructions	(Wang et al., 4 Jan 2026)
Dynamic grayscale snippets	Self-supervised motion representation for face PAD	(Muhammad et al., 2022)

The distinction matters because later work may use the same acronym while addressing wholly different problems. The dynamic graph storage literature explicitly states that DGS means dynamic graph storage and does not refer to dynamic group scaling (Su et al., 16 Feb 2025). A plausible implication is that references to “DGS” require immediate contextual disambiguation rather than acronym-level interpretation.

A broader scaling pattern nevertheless appears across adjacent work. Scalable dynamic graph learning methods such as ScaDyG move expensive historical propagation into a one-time preprocessing stage and reserve trainable adaptation for a lightweight fusion step, a structure that conceptually overlaps with the DGS strategy of front-loading expensive organization and making later scaling operations cheap (Wu et al., 27 Jan 2025). In that restricted sense, DGS represents a recurring systems principle: compute reusable structure once, then amortize elastic reconfiguration over many subsequent operations.