Universal and Asymptotically Optimal Data and Task Allocation in Distributed Computing

Published 9 Jan 2026 in cs.IT | (2601.05873v1)

Abstract: We study the joint minimization of communication and computation costs in distributed computing, where a master node coordinates $N$ workers to evaluate a function over a library of $n$ files. Assuming that the function is decomposed into an arbitrary subfunction set $\mathbf{X}$, with each subfunction depending on $d$ input files, renders our distributed computing problem into a $d$-uniform hypergraph edge partitioning problem wherein the edge set (subfunction set), defined by $d$-wise dependencies between vertices (files) must be partitioned across $N$ disjoint groups (workers). The aim is to design a file and subfunction allocation, corresponding to a partition of $\mathbf{X}$, that minimizes the communication cost $π{\mathbf{X}}$, representing the maximum number of distinct files per server, while also minimizing the computation cost $δ{\mathbf{X}}$ corresponding to a maximal worker subfunction load. For a broad range of parameters, we propose a deterministic allocation solution, the \emph{Interweaved-Cliques (IC) design}, whose information-theoretic-inspired interweaved clique structure simultaneously achieves order-optimal communication and computation costs, for a large class of decompositions $\mathbf{X}$. This optimality is derived from our achievability and converse bounds, which reveal -- under reasonable assumptions on the density of $\mathbf{X}$ -- that the optimal scaling of the communication cost takes the form $n/N^{1/d}$, revealing that our design achieves the order-optimal \textit{partitioning gain} that scales as $N^{1/d}$, while also achieving an order-optimal computation cost. Interestingly, this order optimality is achieved in a deterministic manner, and very importantly, it is achieved blindly from $\mathbf{X}$, therefore enabling multiple desired functions to be computed without reshuffling files.

Abstract PDF Chat (Pro)

Summary

The paper presents a universal interweaved clique (IC) design that achieves order-optimal, deterministic data and task allocation to minimize communication and computation loads.
It models allocation as a d-uniform hypergraph edge partitioning problem, achieving a communication scaling of O(n/N^(1/d)) and constant computation cost bounds.
The method supports scalable distributed computing by enabling efficient task assignment without file reshuffling, ideal for diverse multi-tenant scenarios.

Universal and Asymptotically Optimal Data and Task Allocation in Distributed Computing

Problem Formulation and Theoretical Framework

The paper addresses the joint minimization of communication and computation costs in a generic master–worker distributed computing framework. A master node with access to a dataset of $n$ files coordinates $N$ workers to perform function computations. Each function of interest can be decomposed into a set $\mathbf{X}$ of subfunctions, each depending on exactly $d$ files. This abstraction naturally reformulates the allocation of subfunctions and files as a $d$ -uniform hypergraph edge partitioning problem: files are vertices and subfunctions are edges, and the partitioning must assign subfunctions to workers while balancing communication and computational load.

The core objectives are:

Communication cost: Minimize $\pi_{\mathbf{X}} = \max_{b} |\mathcal{W}^{(b)}|$ , i.e., the largest number of files sent to any worker.
Computation cost: Minimize $\delta_{\mathbf{X}} = \max_b |\mathbf{\Phi}_b| / \lceil |\mathbf{X}|/N\rceil$ , i.e., maximal overload relative to perfect task balance.

The formulation is universal, encompassing many practical scenarios such as covariance computation, kernel methods, contrastive loss evaluation, scientific simulation, and multi-way bioinformatic comparisons, each characterized by $d$ -wise data dependencies.

Relationship to Hypergraph Edge Partitioning and Prior Work

The edge partitioning of $d$ -uniform hypergraphs to minimize communication and computational loads is a classic, yet still computationally intractable, problem—NP-hard even for $d=2$ . Existing algorithmic partitions for graphs (e.g., ARF-minimizing methods, projective plane–based constructions) provide $O(\sqrt{N})$ performance for $d=2$ , but extensions to general $d$ and order-optimal, provable scaling laws have been lacking.

Notably, the paper’s communication cost metric, $\pi_{\mathbf{X}}$ , directly quantifies the maximum load any communication link must sustain, which, unlike the average replication factor (ARF), remains operationally meaningful even when computational delay constraints are absent.

The Interweaved Clique (IC) Design: Construction and Properties

The main contribution is an explicit, deterministic, interweaved clique (IC)–based allocation construction. The design is universal: for any $(n, d, N)$ and any subfunction set $\mathbf{X} \subseteq \mathbf{A}_{n,d}$ of non-trivial density, the file allocation is fixed independently of $\mathbf{X}$ , supporting simultaneous or sequential computation of many functions without data reshuffling.

Construction Outline:

Partition files into $k$ families, $k$ chosen so that $N' = \binom{k}{d}$ is maximal with $N' \leq N$ .
Assign each worker a group of tasks (subfunctions) whose data support aligns with a unique combination of families—a “clique.” This exploits structure common in coded caching and coded computation.
For general $N$ , the assignment is refined to fill $N$ groups by lexicographical splitting and redistribution.

Communication Cost Bound: For $\mathbf{X}$ of density $\varphi$ , the IC design achieves

$\pi_{\mathbf{X}} \leq \frac{4e\, n}{N^{1/d}}.$

The order-optimal partitioning gain is thus $N^{1/d}$ .

Computation Cost Bound: For $d \leq n/32$ , $N \leq (\frac{9}{10} \sqrt{n/d})^d$ , and $\varphi \gtrsim \log n/n^{d/2}$ , with probability at least $1 - 1/n$, it holds that

$\delta_{\mathbf{X}} \leq 5.$

The design also achieves the lower bound for any decomposition:

$\pi_{\mathbf{X}}^\star \geq \frac{\varphi^{1/d} n}{N^{1/d}}.$

Algorithmic Salience:

Deterministic, low-complexity construction
File allocation is fixed (independent of $\mathbf{X}$ )—tasks are assigned according to subset intersections
Communication-optimal assignment without file reshuffling for different decompositions over the same library

Main Results: Bounds, Optimality, and Universality

The authors prove tight, matching achievability and converse results characterizing the minimal communication and computation costs for all sufficiently dense $\mathbf{X}$ . Explicit scaling laws include:

Quantity	Achievable Scaling (IC design)	Lower Bound	Regime/note
Communication cost ( $\pi$ )	$O\left(\frac{n}{N^{1/d}}\right)$	$\Omega\left(\frac{n}{N^{1/d}}\right)$	Any fixed $d$ , $\varphi$
Computation cost ( $\delta$ )	$O(1)$ (specifically $\leq 5$ )	$1$	High probability

The IC design is shown to be order-optimal for all sufficiently large $n,N$ and fixed $d,\varphi>0$ ; no partition can achieve fundamentally better communication scaling, even if the function is decomposed in the most favorable way.
The design is fully universal: optimal scaling is attained for all decompositions of any function (including those with variable $d$ or multiple admissible decompositions).
In the case of dense, clique-like $\mathbf{X}$ (a union of disjoint cliques), the minimal communication cost is achieved exactly.

These results apply directly to a large class of data analytics and ML problems, including but not limited to large-batch all-pairs/similarity computations, distributed kernel machines, and divide-and-conquer gradient workflows.

Comparison to Existing Algorithms

The ARF performance of the IC design matches or improves upon the best theoretical algorithmic guarantees available for graphs, and is broadly applicable to higher-order $d$ :

IC design ( $d=2$ ): ARF $< \sqrt{2N}$
Projective plane constructions: ARF $\in [1.5 \sqrt{N}, 2\sqrt{N}]$ (for special $N$ )
ARF guarantees from leading algorithms \cite{Dynamic,TrillionEdges}: IC design performs better for typical dense-task settings or for $N \lesssim n$ .
For $d>2$ , the IC design provides the first explicit construction with $N^{1/d}$ scaling.

Practical and Theoretical Implications

From a practical perspective, the IC design allows for data placement to be planned “once,” with robust guarantees for arbitrary workloads over a fixed dataset, enabling high communication efficiency for multi-tenant analytics pipelines and federated or ensemble ML distributed across commodity clusters. The asynchronous operation and avoidance of file reshuffling are particularly valuable for elastic, serverless, or heterogenous compute environments.

From a theoretical perspective, the manuscript closes a key gap in the distributed computation and hypergraph partitioning literature: the explicit, deterministic, and tight partitioning gain for general $d$ , and with explicit achievability, is now characterized for all practical parameter ranges where the task set is not vanishingly sparse.

Conclusion

This work establishes universality and order-optimality for the joint data and task allocation problem in distributed computing, via an interweaved clique construction inspired by combinatorial and information-theoretic methods. The partitioning scheme provides worst-case optimal communication cost and constant-factor computation balancing for generic functions expressible via dense multi-way dependencies. Its design simplicity, lack of dependence on specific task sets, and tight scaling bounds make it directly deployable in large-scale, dynamic, heterogeneous, and multi-tenant distributed systems. The work also bridges and advances the intersection of coded computing, hypergraph theory, and parallel algorithmics, setting a universal baseline for future research on optimal allocations in distributed computing (2601.05873).