Universal and Asymptotically Optimal Data and Task Allocation in Distributed Computing

Published 9 Jan 2026 in cs.IT | (2601.05873v1)

Abstract: We study the joint minimization of communication and computation costs in distributed computing, where a master node coordinates $N$ workers to evaluate a function over a library of $n$ files. Assuming that the function is decomposed into an arbitrary subfunction set $\mathbf{X}$, with each subfunction depending on $d$ input files, renders our distributed computing problem into a $d$-uniform hypergraph edge partitioning problem wherein the edge set (subfunction set), defined by $d$-wise dependencies between vertices (files) must be partitioned across $N$ disjoint groups (workers). The aim is to design a file and subfunction allocation, corresponding to a partition of $\mathbf{X}$, that minimizes the communication cost $\pi_{\mathbf{X}}$, representing the maximum number of distinct files per server, while also minimizing the computation cost $\delta_{\mathbf{X}}$ corresponding to a maximal worker subfunction load. For a broad range of parameters, we propose a deterministic allocation solution, the \emph{Interweaved-Cliques (IC) design}, whose information-theoretic-inspired interweaved clique structure simultaneously achieves order-optimal communication and computation costs, for a large class of decompositions $\mathbf{X}$. This optimality is derived from our achievability and converse bounds, which reveal -- under reasonable assumptions on the density of $\mathbf{X}$ -- that the optimal scaling of the communication cost takes the form $n/N^{1/d}$, revealing that our design achieves the order-optimal \textit{partitioning gain} that scales as $N^{1/d}$, while also achieving an order-optimal computation cost. Interestingly, this order optimality is achieved in a deterministic manner, and very importantly, it is achieved blindly from $\mathbf{X}$, therefore enabling multiple desired functions to be computed without reshuffling files.

Summary

  • The paper presents a universal interweaved clique (IC) design that achieves order-optimal, deterministic data and task allocation to minimize communication and computation loads.
  • It models allocation as a d-uniform hypergraph edge partitioning problem, achieving a communication scaling of O(n/N^(1/d)) and constant computation cost bounds.
  • The method supports scalable distributed computing by enabling efficient task assignment without file reshuffling, ideal for diverse multi-tenant scenarios.

Problem Formulation and Theoretical Framework

The paper addresses the joint minimization of communication and computation costs in a generic master–worker distributed computing framework. A master node with access to a dataset of $n$ files coordinates $N$ workers to perform function computations. Each function of interest can be decomposed into a set $\mathbf{X}$ of subfunctions, each depending on exactly $d$ files. This abstraction naturally reformulates the allocation of subfunctions and files as a $d$-uniform hypergraph edge partitioning problem: files are vertices and subfunctions are edges, and the partitioning must assign subfunctions to workers while balancing communication and computational load.

The core objectives are:

  • Communication cost: Minimize $\pi_{\mathbf{X}} = \max_{b} |\mathcal{W}^{(b)}|$, i.e., the largest number of files sent to any worker.
  • Computation cost: Minimize $\delta_{\mathbf{X}} = \max_b |\mathbf{\Phi}_b| / \lceil |\mathbf{X}|/N\rceil$, i.e., the maximal overload relative to perfect task balance.
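The two cost definitions above can be evaluated directly for any given allocation. The following is a minimal sketch, assuming tasks are represented as tuples of file indices and an allocation as one task list per worker; the function names are hypothetical and not taken from the paper.

```python
from math import ceil

def files_needed(tasks):
    """Distinct files a worker must receive to evaluate its tasks."""
    return set().union(*tasks)

def communication_cost(tasks_per_worker):
    """pi_X: the largest number of distinct files at any single worker."""
    return max(len(files_needed(tasks)) for tasks in tasks_per_worker)

def computation_cost(tasks_per_worker):
    """delta_X: the largest task load divided by ceil(|X| / N)."""
    num_workers = len(tasks_per_worker)
    num_tasks = sum(len(tasks) for tasks in tasks_per_worker)
    max_load = max(len(tasks) for tasks in tasks_per_worker)
    return max_load / ceil(num_tasks / num_workers)

# Toy allocation: d = 2, six pairwise tasks split across N = 3 workers.
allocation = [
    [(0, 1), (0, 2)],
    [(1, 2)],
    [(2, 3), (3, 4), (0, 4)],
]
pi = communication_cost(allocation)     # worker 2 needs files {0, 2, 3, 4}
delta = computation_cost(allocation)    # 3 tasks against a balanced load of 2
```

In this toy case the third worker drives both costs, which illustrates why the two objectives have to be balanced jointly rather than one at a time.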

The formulation is universal, encompassing many practical scenarios such as covariance computation, kernel methods, contrastive loss evaluation, scientific simulation, and multi-way bioinformatic comparisons, each characterized by $d$-wise data dependencies.

Relationship to Hypergraph Edge Partitioning and Prior Work

The edge partitioning of $d$-uniform hypergraphs to minimize communication and computational loads is a classic yet computationally intractable problem: it is NP-hard even for $d=2$. Existing graph partitioning schemes (e.g., ARF-minimizing methods, projective plane–based constructions) provide $O(\sqrt{N})$ performance for $d=2$, but extensions to general $d$ with provable, order-optimal scaling laws have been lacking.

Notably, the paper’s communication cost metric, $\pi_{\mathbf{X}}$, directly quantifies the maximum load any communication link must sustain, which, unlike the average replication factor (ARF), remains operationally meaningful even when computational delay constraints are absent.

The Interweaved Clique (IC) Design: Construction and Properties

The main contribution is an explicit, deterministic, interweaved clique (IC)-based allocation construction. The design is universal: for any $(n, d, N)$ and any subfunction set $\mathbf{X} \subseteq \mathbf{A}_{n,d}$ of non-trivial density, the file allocation is fixed independently of $\mathbf{X}$, supporting simultaneous or sequential computation of many functions without data reshuffling.

Construction Outline:

  1. Partition files into $k$ families, with $k$ chosen so that $N' = \binom{k}{d}$ is maximal subject to $N' \leq N$.
  2. Assign each worker a group of tasks (subfunctions) whose data support aligns with a unique combination of families, called a “clique.” This exploits structure common in coded caching and coded computation.
  3. For general $N$, the assignment is refined to fill $N$ groups by lexicographical splitting and redistribution.
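Steps 1 and 2 of the outline can be sketched as follows. This is a simplified illustration, not the paper's full IC design: it uses only $N' = \binom{k}{d}$ workers and omits the step-3 refinement, and both the contiguous family split and the padding rule for tasks that touch fewer than $d$ families are assumptions made here for concreteness. All function names are hypothetical.

```python
from itertools import combinations
from math import comb

def choose_k(N, d):
    """Largest k such that C(k, d) <= N (assumes N >= 1 and d >= 1)."""
    k = d
    while comb(k + 1, d) <= N:
        k += 1
    return k

def clique_allocation(n, d, N):
    """Fix a file placement from (n, d, N) alone, without looking at X."""
    k = choose_k(N, d)
    # Assumed balanced contiguous split of files 0..n-1 into k families (n >= k).
    family_of = [f * k // n for f in range(n)]
    # One worker per d-subset of families; only N' = C(k, d) workers are used.
    workers = list(combinations(range(k), d))
    files_of_worker = [
        [f for f in range(n) if family_of[f] in w] for w in workers
    ]
    return k, family_of, workers, files_of_worker

def assign_task(task, family_of, workers, d):
    """Send a task (tuple of d file indices) to a worker storing all its files."""
    fams = sorted({family_of[f] for f in task})
    j = 0
    while len(fams) < d:  # fewer than d families touched: pad (assumed tie-break)
        if j not in fams:
            fams.append(j)
        j += 1
    return workers.index(tuple(sorted(fams)))

# Example: n = 12 files, pairwise tasks (d = 2), N = 7 available workers.
k, family_of, workers, files_of_worker = clique_allocation(12, 2, 7)
target = assign_task((10, 11), family_of, workers, 2)
```

Because the placement depends only on $(n, d, N)$, any task set over the same library can be routed with `assign_task` without moving files. In this sketch each worker stores $d$ families, so roughly $d\,n/k$ files.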

Communication Cost Bound: For $\mathbf{X}$ of density $\varphi$, the IC design achieves

$$\pi_{\mathbf{X}} \leq \frac{4e\, n}{N^{1/d}}.$$

The order-optimal partitioning gain is thus $N^{1/d}$.

Computation Cost Bound: For $d \leq n/32$, $N \leq \left(\frac{9}{10} \sqrt{n/d}\right)^d$, and $\varphi \gtrsim \log n/n^{d/2}$, with probability at least $1 - 1/n$, it holds that

$$\delta_{\mathbf{X}} \leq 5.$$

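The achievable communication cost matches, up to a constant factor, the converse (lower) bound on the optimal cost $\pi_{\mathbf{X}}^\star$, which holds for any allocation:

$$\pi_{\mathbf{X}}^\star \geq \frac{\varphi^{1/d} n}{N^{1/d}}.$$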
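To make the gap between the achievability and converse bounds concrete, the short sketch below evaluates both expressions for one illustrative parameter choice. The function names and the values of $n$, $N$, $d$, and $\varphi$ are assumptions chosen for the example, not figures from the paper.

```python
from math import e

def ic_upper_bound(n, N, d):
    """Achievable communication cost bound: 4 e n / N^(1/d)."""
    return 4 * e * n / N ** (1 / d)

def converse_lower_bound(n, N, d, phi):
    """Lower bound on the optimal cost: phi^(1/d) n / N^(1/d)."""
    return phi ** (1 / d) * n / N ** (1 / d)

# Illustrative values only: one million files, ten thousand workers, pairwise tasks.
n, N, d, phi = 10**6, 10**4, 2, 0.5
upper = ic_upper_bound(n, N, d)             # about 1.09e5 files per worker
lower = converse_lower_bound(n, N, d, phi)  # about 7.07e3 files per worker
gap = upper / lower                         # equals 4e / phi^(1/d)
```

The ratio of the two bounds is $4e/\varphi^{1/d}$, which does not depend on $n$ or $N$; this is the sense in which the scaling $n/N^{1/d}$ is order-optimal for fixed $d$ and $\varphi$. Note also that the upper bound is only informative (below $n$) once $N^{1/d} > 4e \approx 10.9$.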

Algorithmic Salience:

  • Deterministic, low-complexity construction
  • File allocation is fixed (independent of $\mathbf{X}$); tasks are assigned according to subset intersections
  • Communication-optimal assignment without file reshuffling for different decompositions over the same library

Main Results: Bounds, Optimality, and Universality

The authors prove tight, matching achievability and converse results characterizing the minimal communication and computation costs for all sufficiently dense $\mathbf{X}$. Explicit scaling laws include:

| Quantity | Achievable Scaling (IC design) | Lower Bound | Regime/note |
|---|---|---|---|
| Communication cost ($\pi$) | $O\left(\frac{n}{N^{1/d}}\right)$ | $\Omega\left(\frac{n}{N^{1/d}}\right)$ | Any fixed $d$, $\varphi$ |
| Computation cost ($\delta$) | $O(1)$ (specifically $\leq 5$) | $1$ | High probability |

  • The IC design is shown to be order-optimal for all sufficiently large $n, N$ and fixed $d, \varphi > 0$; no partition can achieve fundamentally better communication scaling, even if the function is decomposed in the most favorable way.
  • The design is fully universal: optimal scaling is attained for all decompositions of any function (including those with variable $d$ or multiple admissible decompositions).
  • In the case of dense, clique-like $\mathbf{X}$ (a union of disjoint cliques), the minimal communication cost is achieved exactly.

These results apply directly to a large class of data analytics and ML problems, including but not limited to large-batch all-pairs/similarity computations, distributed kernel machines, and divide-and-conquer gradient workflows.

Comparison to Existing Algorithms

The ARF performance of the IC design matches or improves upon the best theoretical algorithmic guarantees available for graphs, and is broadly applicable to higher-order $d$:

  • IC design ($d=2$): ARF $< \sqrt{2N}$
  • Projective plane constructions: ARF $\in [1.5 \sqrt{N}, 2\sqrt{N}]$ (for special $N$)
  • ARF guarantees from leading algorithms \cite{Dynamic,TrillionEdges}: IC design performs better for typical dense-task settings or for $N \lesssim n$.
  • For $d>2$, the IC design provides the first explicit construction with $N^{1/d}$ scaling.
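One way to see where a bound of the form $\sqrt{2N}$ can come from is the simplified clique placement for $d=2$, under the assumption (made here for illustration, not taken from the paper's proof) that $N = \binom{k}{2}$ exactly and that each worker stores one pair of file families. Each file then sits in one family, and that family appears in $k-1$ of the pairs, so every file is replicated $k-1$ times:

```latex
% Assumption: d = 2, N = \binom{k}{2} workers, one worker per pair of families, k >= 2.
\mathrm{ARF} = k - 1,
\qquad
(k-1)^2 < k(k-1) = 2N
\;\Longrightarrow\;
\mathrm{ARF} = k - 1 < \sqrt{2N}.
```

For values of $N$ that are not binomial coefficients, the refinement step of the construction would need to be accounted for, which this sketch does not attempt.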

Practical and Theoretical Implications

From a practical perspective, the IC design allows data placement to be planned “once,” with robust guarantees for arbitrary workloads over a fixed dataset, enabling high communication efficiency for multi-tenant analytics pipelines and federated or ensemble ML distributed across commodity clusters. The asynchronous operation and avoidance of file reshuffling are particularly valuable for elastic, serverless, or heterogeneous compute environments.

From a theoretical perspective, the manuscript closes a key gap in the distributed computation and hypergraph partitioning literature: the partitioning gain for general $d$ is now characterized tightly, with an explicit and deterministic achievability scheme, for all practical parameter ranges where the task set is not vanishingly sparse.

Conclusion

This work establishes universality and order-optimality for the joint data and task allocation problem in distributed computing, via an interweaved clique construction inspired by combinatorial and information-theoretic methods. The partitioning scheme provides worst-case optimal communication cost and constant-factor computation balancing for generic functions expressible via dense multi-way dependencies. Its design simplicity, lack of dependence on specific task sets, and tight scaling bounds make it directly deployable in large-scale, dynamic, heterogeneous, and multi-tenant distributed systems. The work also bridges and advances the intersection of coded computing, hypergraph theory, and parallel algorithmics, setting a universal baseline for future research on optimal allocations in distributed computing (2601.05873).
