
Approximate Gradient Coding with Expander Graphs

Updated 8 February 2026
  • The paper introduces an approximate gradient coding mechanism that tolerates stragglers by accepting a controlled approximation error in gradient reconstruction.
  • It employs expander graphs to structure data assignment, leveraging their spectral gap properties to achieve sublinear error scaling and provable convergence.
  • The scheme offers favorable trade-offs in computation, storage, and communication complexity, ensuring robust performance under both random and adversarial failures.

Approximate gradient coding using expander graphs is a straggler-mitigation technique in distributed learning that enables robust and efficient gradient aggregation when workers fail or are delayed. Unlike exact gradient coding, approximate schemes accept a controlled approximation error in the reconstructed gradient, trading off strict accuracy for reductions in storage, computation, and communication. Expander graphs provide the combinatorial structure underlying the most efficient known approximate gradient codes, offering both sublinear error scaling and provable convergence guarantees in both stochastic and adversarial settings (Raviv et al., 2017, Glasgow et al., 2020).

1. Approximate Gradient Coding: Problem Statement

Let a dataset $S$ of size $m$ be partitioned into $n$ blocks $S_1, \ldots, S_n$. There are $n$ workers, each storing $d$ blocks (often determined by an assignment matrix $B \in \mathbb{R}^{n \times n}$ or a variant thereof). In each iteration $r$, the master node broadcasts the model $w^{(r)} \in \mathbb{R}^p$. Worker $i$ computes the partial gradients $g_{i,j} = \nabla L_{S_j}(w^{(r)})$ for its stored blocks, forms the linear combination $y_i = \frac{1}{n} \sum_{j=1}^{n} b_{i,j}\, g_{i,j}$, and returns $y_i$ to the master.

Upon receiving responses from a subset $K \subseteq [n]$ (with $|K| = n - s$ non-straggling workers), the master reconstructs an approximate aggregate gradient:

$$\hat g = \sum_{i \in K} a_i(K)\, y_i$$

with a decoding map $a(K) \in \mathbb{R}^n$, $\operatorname{supp}(a(K)) \subseteq K$. The scheme is $\epsilon$-approximate if

$$\|a(K)\, B - \mathbf{1}^T\|_2 \leq \epsilon(s)$$

where $\mathbf{1} \in \mathbb{R}^n$ is the all-ones vector. This ensures the error in reconstructing the true (full-batch) gradient is controllable and quantifiable (Raviv et al., 2017).
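As a concrete sanity check, the $\epsilon$-approximation condition can be evaluated directly. The sketch below uses a hypothetical toy instance (a $n = 4$ cyclic assignment with $d = 2$ and a simple uniform decoder, chosen for illustration, not taken from the papers):

```python
import numpy as np

# Hypothetical toy instance (not from the papers): n = 4 workers, a cyclic
# assignment where worker i stores blocks i and i+1 (replication d = 2).
n = 4
B = np.zeros((n, n))
for i in range(n):
    B[i, i] = 0.5
    B[i, (i + 1) % n] = 0.5

K = [0, 1, 2]                  # worker 3 straggles, so s = 1
a = np.zeros(n)
a[K] = n / (n - 1)             # uniform weights on responders (one simple decoder)

# The scheme is eps-approximate if ||a(K) B - 1^T||_2 <= eps(s).
err = np.linalg.norm(a @ B - np.ones(n))
print(f"eps = {err:.4f}")      # -> eps = 0.6667
```

Here the reconstruction is exact on the blocks fully covered by responders and off by $1/3$ on the two blocks touching the straggler.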

2. Expander Graph–Based Coding Constructions

Expander graphs are sparse $d$-regular graphs $G = (V, E)$ on $n$ vertices with strong connectivity properties, quantified here by the second eigenvalue magnitude $\lambda = \max\{|\lambda_2|, |\lambda_n|\}$ (equivalently, the spectral gap $d - \lambda$), where the $\lambda_i$ denote the eigenvalues of the adjacency matrix $A_G \in \{0,1\}^{n \times n}$.

Node–Task Assignment

  • Each worker corresponds to a vertex.
  • Each worker $i$ stores exactly $d$ blocks (those of its adjacent vertices), so the replication factor is $d$.
  • The coding matrix is $B = \frac{1}{d} A_G \in \mathbb{R}^{n \times n}$: each row has $d$ nonzero entries, each equal to $1/d$.
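The construction of $B$ can be sketched as follows. A circulant graph stands in for the expander purely to show the shape of the assignment (circulant graphs are generally weak expanders; the papers use random regular or Ramanujan graphs):

```python
import numpy as np

def circulant_regular_adjacency(n: int, d: int) -> np.ndarray:
    """Adjacency of the d-regular circulant graph: i ~ i +/- 1, ..., i +/- d//2.

    Circulant graphs are weak expanders; this is an illustrative stand-in for
    a Ramanujan or random regular graph, used only to show the construction.
    """
    assert d % 2 == 0 and d < n
    A = np.zeros((n, n))
    for i in range(n):
        for k in range(1, d // 2 + 1):
            A[i, (i + k) % n] = 1
            A[i, (i - k) % n] = 1
    return A

n, d = 12, 4
A_G = circulant_regular_adjacency(n, d)
B = A_G / d                    # coding matrix: d nonzero entries of 1/d per row

assert np.allclose(B.sum(axis=1), 1.0)          # each row sums to 1
assert (np.count_nonzero(B, axis=1) == d).all() # exactly d stored blocks per worker
```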

Edge–Machine Assignment (Extended Variant)

In alternative expander-based schemes (Glasgow et al., 2020), machines correspond to edges ($m = nd/2$ machines), each storing the two blocks associated with its incident vertices. The data-assignment matrix $A \in \mathbb{R}^{n \times m}$ is the normalized vertex–edge incidence matrix, with $A_{u,j} = A_{v,j} = 1/d$ for each edge $e_j = \{u, v\}$.
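The incidence-matrix assignment can be built directly; the sketch below again uses a circulant $d$-regular graph as an illustrative stand-in for an expander:

```python
import numpy as np

def edge_incidence_assignment(n: int, d: int):
    """Normalized vertex-edge incidence matrix for the circulant d-regular
    graph (illustrative expander stand-in): machine j holds the two blocks of
    edge e_j = {u, v}, with A[u, j] = A[v, j] = 1/d."""
    edges = [(i, (i + k) % n) for i in range(n) for k in range(1, d // 2 + 1)]
    A = np.zeros((n, len(edges)))            # len(edges) = n * d / 2 machines
    for j, (u, v) in enumerate(edges):
        A[u, j] = A[v, j] = 1.0 / d
    return A, edges

A, edges = edge_incidence_assignment(n=10, d=4)
print(A.shape)                               # -> (10, 20), i.e. m = nd/2 = 20
```

Each row (block) sums to $1$, since every block is held by the $d$ machines corresponding to its vertex's incident edges, each with weight $1/d$.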

3. Decoding and Error Analysis

Decoding Procedure

For the node assignment, a correction vector $u \in \mathbb{R}^n$ is constructed based on the responding set $K$:

$$u_i = \begin{cases} \dfrac{s}{n-s}, & i \in K \\ -1, & i \notin K \end{cases}$$

and $a(K) = \mathbf{1} + u$.
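In code, this decoder assigns weight $n/(n-s)$ to every responder and $0$ to every straggler (a small worked example with assumed $n = 8$, $s = 2$ and the last two workers straggling):

```python
import numpy as np

n, s = 8, 2
K = np.arange(n - s)           # suppose workers 0..5 respond, the last 2 straggle

u = np.full(n, -1.0)           # u_i = -1          for i not in K
u[K] = s / (n - s)             # u_i = s / (n - s) for i in K
a = 1.0 + u                    # decoding vector a(K) = 1 + u

# supp(a(K)) is contained in K: responders get n/(n-s) = 4/3, stragglers get 0.
print(a)
```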

In the edge-assignment setting, given surviving machines $S$, the optimal decoding vector $w^* \in \mathbb{R}^{|S|}$ is the unique least-squares solution:

$$w^* = (A_S^T A_S)^{-1} A_S^T \mathbf{1}$$

with per-block coefficients $\alpha = A_S w^*$, yielding an unbiased projection of $\mathbf{1}$ onto $\operatorname{im}(A_S)$ (Glasgow et al., 2020).
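The least-squares decoder is a one-liner with `numpy.linalg.lstsq` (which also handles the rank-deficient case via the minimum-norm solution, whereas the closed form above requires full column rank). Graph, size, and straggler count below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Normalized incidence matrix of a circulant 4-regular graph on 10 vertices
# (illustrative stand-in for an expander).
n, d = 10, 4
edges = [(i, (i + k) % n) for i in range(n) for k in (1, 2)]
A = np.zeros((n, len(edges)))
for j, (u, v) in enumerate(edges):
    A[u, j] = A[v, j] = 1.0 / d

alive = rng.choice(len(edges), size=16, replace=False)   # 4 machines straggle
A_S = A[:, alive]

# Optimal decoder: least-squares projection of the all-ones vector onto im(A_S).
w_star, *_ = np.linalg.lstsq(A_S, np.ones(n), rcond=None)
alpha = A_S @ w_star           # per-block coefficients

print(np.linalg.norm(alpha - 1.0))   # decoding error ||alpha - 1||_2
```

By the normal equations, the residual $\alpha - \mathbf{1}$ is orthogonal to every surviving column, which is exactly the projection property used in the error analysis.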

Error Bounds

In the node-assignment scheme, the error is bounded via spectral arguments:

$$\|\hat{g} - \nabla L_S(w)\|_2 \leq \frac{\lambda}{d} \sqrt{\frac{ns}{n-s}}\, \|N(w)\|_2$$

where $N(w)$ is the $n \times p$ matrix of partial gradients (Raviv et al., 2017).

For the edge-assignment scheme with optimal decoding under random straggler failures (each machine fails with probability $p$), the expected squared error satisfies

$$\mathbb{E}\|\alpha - \mathbf{1}\|_2^2 \leq n \cdot p^{d - o(d)}$$

and thus

$$\mathbb{E}\|\alpha - \mathbf{1}\|_2 / \sqrt{n} = O(p^{d/2})$$

That is, the error decays exponentially in the replication factor $d$ (Glasgow et al., 2020). Under adversarial straggler patterns (up to $r$ failures), the worst-case covariance satisfies $\|\mathrm{Cov}(\alpha)\|_2 \leq 2 k^2 t^2 + 24$ for $k = O(d \log n)$, $t = O(p^{\Theta(d)})$, yielding a nearly two-fold improvement over fractional-repetition codes.

In both coding models, the trivial scheme ($d = 1$) yields an error factor $\sqrt{ns/(n-s)}$ but no spectral improvement, while exact gradient coding (e.g., via cyclic MDS codes) requires $d = s + 1$, potentially incurring high overhead (Raviv et al., 2017).
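The decay of the decoding error with $d$ can be illustrated by a small Monte Carlo experiment (an illustration under stated assumptions, not a reproduction of the papers' experiments: circulant graphs as expander stand-ins, i.i.d. machine failures, least-squares decoding):

```python
import numpy as np

rng = np.random.default_rng(1)

def incidence(n, d):
    """Normalized incidence of the circulant d-regular graph (expander stand-in)."""
    edges = [(i, (i + k) % n) for i in range(n) for k in range(1, d // 2 + 1)]
    A = np.zeros((n, len(edges)))
    for j, (u, v) in enumerate(edges):
        A[u, j] = A[v, j] = 1.0 / d
    return A

def mean_sq_error(A, p, trials=200):
    """Monte Carlo estimate of E ||alpha - 1||_2^2 under i.i.d. failures."""
    n, m = A.shape
    total = 0.0
    for _ in range(trials):
        alive = rng.random(m) > p              # each machine fails w.p. p
        A_S = A[:, alive]
        w, *_ = np.linalg.lstsq(A_S, np.ones(n), rcond=None)
        total += np.sum((A_S @ w - 1.0) ** 2)
    return total / trials

n, p = 12, 0.2
for d in (2, 4, 6):
    print(d, mean_sq_error(incidence(n, d), p))  # error typically shrinks with d
```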

4. Computation, Storage, and Communication Complexity

Expander-graph based approximate gradient coding achieves a favorable trade-off:

  • Worker computation: Each worker computes $d$ partial gradients, costing $d \cdot C_\mathrm{grad}$ (where $C_\mathrm{grad}$ is the cost per partial gradient), and forms an $O(dp)$-multiply linear combination.
  • Communication: Each worker sends a single vector $y_i \in \mathbb{R}^p$ to the master.
  • Storage overhead: Each block is replicated $d$ times, and each worker holds $d$ blocks (or two blocks in the edge-assignment model).

Compared to exact gradient coding with cyclic MDS codes (which requires $d = s + 1$), expander-based coding allows any $d \ll n$, typically held constant, offering low storage and computational overhead. The error–overhead trade-off is governed by the expander's spectral properties and the chosen $d$: increasing $d$ increases redundancy but reduces error, since $\lambda/d$ becomes smaller; Ramanujan graphs achieve $\lambda \leq 2\sqrt{d-1}$ (Raviv et al., 2017).
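Plugging the Ramanujan bound into the error scale gives $\lambda/d \leq 2\sqrt{d-1}/d \approx 2/\sqrt{d}$; a quick computation (illustrative arithmetic only) shows how the bound shrinks as $d$ grows:

```python
import math

# Ramanujan bound: lambda <= 2 * sqrt(d - 1), so the error scale lambda/d is
# at most 2 * sqrt(d - 1) / d ~ 2 / sqrt(d): quadrupling d roughly halves it.
for d in (4, 8, 16, 32, 64):
    print(d, round(2 * math.sqrt(d - 1) / d, 3))   # 4 -> 0.866, ..., 64 -> 0.248
```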

5. Convergence Guarantees

Under standard convexity and smoothness assumptions:

  • Unbiasedness: The reconstructed gradient is unbiased up to a scaling factor $c \approx 1 - (1-q)^n$ (where $q$ is the worker response probability).
  • Variance: The variance parameter satisfies

$$\sigma^2 \leq n \|N(w)\|_2^2 \left[(1-q)^n + \frac{\lambda^2}{d^2} \cdot \frac{2(1-q)}{q}\right]$$

  • SGD convergence: With step size $\eta = O(1/\sqrt{t})$, suboptimality is $O(1/\sqrt{t})$, with a variance constant smaller by a factor $\lambda^2/d^2 < 1$ relative to the trivial scheme (Raviv et al., 2017).

For the edge-assignment scheme:

  • Random stragglers: SGD with the approximate gradient $\alpha$ exhibits linear convergence up to a noise floor $O(r\sigma^2/\mu)$, with $r = O(p^d)$ and $s = O(\log^2 n \cdot p^{2d})$ (Glasgow et al., 2020, Prop. 5.1).
  • Adversarial stragglers: Under up to $r$ adversarial failures, the noise floor is $(1+\epsilon) r \sigma / (a\mu)$, nearly half that of prior codes using optimal decoding (Glasgow et al., 2020).
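An end-to-end sketch of the node-assignment scheme inside SGD, on a toy realizable least-squares problem. All choices here (circulant graph as expander stand-in, fixed step size, $s = 3$ random stragglers per step, block sizes) are illustrative assumptions, not settings from the papers:

```python
import numpy as np

rng = np.random.default_rng(2)

n, d, p_dim, per_block = 12, 4, 5, 8
X = rng.standard_normal((n * per_block, p_dim))
y = X @ rng.standard_normal(p_dim)             # realizable, noiseless targets
blocks = np.split(np.arange(n * per_block), n) # n data blocks S_1, ..., S_n

A_G = np.zeros((n, n))                         # circulant 4-regular adjacency
for i in range(n):
    for k in (1, 2):
        A_G[i, (i + k) % n] = A_G[i, (i - k) % n] = 1
B = A_G / d

def block_grad(j, w):
    """Mean least-squares gradient over block S_j."""
    Xj, yj = X[blocks[j]], y[blocks[j]]
    return Xj.T @ (Xj @ w - yj) / len(yj)

w, s = np.zeros(p_dim), 3
for _ in range(300):
    K = rng.choice(n, size=n - s, replace=False)   # s random stragglers
    a = np.zeros(n)
    a[K] = n / (n - s)                             # node-scheme decoder a(K)
    coeffs = a @ B                                 # per-block weights a(K) B
    g_hat = sum(coeffs[j] * block_grad(j, w) for j in range(n)) / n
    w -= 0.1 * g_hat

print(np.mean((X @ w - y) ** 2))   # small training loss on this noiseless problem
```

Because the problem is noiseless and every block is always covered by at least one responder (each block is replicated $d = 4 > s$ times), the iterates converge despite the per-step decoding error.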

6. Trade-Offs, Graceful Degradation, and Implementation Considerations

Expander-based approximate gradient coding provides a smooth trade-off between error and overhead:

  • Graceful degradation: The error bound $\frac{\lambda}{d}\sqrt{ns/(n-s)}$ degrades smoothly with the number of stragglers $s$; no parameter re-tuning is necessary.
  • Parameter selection: Larger $d$ improves error but increases resource usage. Ramanujan expanders enable $\lambda/d \approx 2/\sqrt{d}$, facilitating tuning.
  • Implementation: Requires generation and sharing of $d$-regular expander graphs. Data assignment is explicit: worker $i$ stores $S_j$ iff $(A_G)_{i,j} = 1$. Decoding is a low-complexity $O(n)$ procedure. Random regular graphs can be used, checking numerically that $\lambda$ is sufficiently small (Raviv et al., 2017).
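The numerical check of $\lambda$ can be sketched as follows. The pairing-model sampler and the parameters are illustrative assumptions, not the papers' method:

```python
import numpy as np

rng = np.random.default_rng(3)

def random_regular_adjacency(n: int, d: int, tries: int = 2000) -> np.ndarray:
    """Sample a simple d-regular graph via the pairing model, resampling on
    self-loops or parallel edges (illustrative sketch)."""
    for _ in range(tries):
        stubs = np.repeat(np.arange(n), d)     # d "stubs" per vertex
        rng.shuffle(stubs)
        A = np.zeros((n, n))
        ok = True
        for u, v in stubs.reshape(-1, 2):      # pair stubs into edges
            if u == v or A[u, v]:              # loop or repeated edge: resample
                ok = False
                break
            A[u, v] = A[v, u] = 1
        if ok:
            return A
    raise RuntimeError("failed to sample a simple d-regular graph")

n, d = 50, 4
A = random_regular_adjacency(n, d)
eig = np.sort(np.linalg.eigvalsh(A))
lam = max(abs(eig[-2]), abs(eig[0]))           # lambda = max{|lambda_2|, |lambda_n|}
print(f"lambda/d = {lam / d:.3f} vs Ramanujan bound {2 * np.sqrt(d - 1) / d:.3f}")
```

If the sampled $\lambda/d$ is too large, one simply resamples: random regular graphs are near-Ramanujan with high probability.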

In empirical evaluations (Amazon EC2), the generalization error of expander-based approximate gradient coding closely matches that of full-gradient schemes while significantly reducing worker computation (Raviv et al., 2017).

7. Analytical and Graph-Theoretic Foundations

The performance of expander-based coding leverages key combinatorial and spectral properties:

  • Expander Mixing Lemma: Ensures uniformity of block–worker assignment by bounding edge counts between node subsets.
  • Random-percolation analysis: Demonstrates the existence of a giant component and bounded small components after straggler-induced failures, allowing the least-squares decoder to restrict error to small subgraphs (Glasgow et al., 2020).
  • Spectral analysis: The coding error is controlled directly by the expander's spectral properties, relating $\lambda/d$ to the fundamental error terms.
  • Least-squares projection properties: The optimal decoding vector yields unbiasedness and error minimization via Euclidean projection in the gradient estimation setting.

These analytical tools underpin both the design and theoretical guarantees for approximate gradient coding strategies using expander graphs, facilitating high-performance distributed learning robust to both random and adversarial stragglers (Raviv et al., 2017, Glasgow et al., 2020).
