
Approximate Gradient Coding with Expander Graphs

Updated 8 February 2026
  • The paper introduces an approximate gradient coding mechanism that tolerates stragglers by accepting a controlled approximation error in gradient reconstruction.
  • It employs expander graphs to structure data assignment, leveraging their spectral gap properties to achieve sublinear error scaling and provable convergence.
  • The scheme offers favorable trade-offs in computation, storage, and communication complexity, ensuring robust performance under both random and adversarial failures.

Approximate gradient coding using expander graphs is a straggler-mitigation technique in distributed learning that enables robust and efficient gradient aggregation when workers fail or are delayed. Unlike exact gradient coding, approximate schemes accept a controlled approximation error in the reconstructed gradient, trading off strict accuracy for reductions in storage, computation, and communication. Expander graphs provide the combinatorial structure underlying the most efficient known approximate gradient codes, offering both sublinear error scaling and provable convergence guarantees in both stochastic and adversarial settings (Raviv et al., 2017, Glasgow et al., 2020).

1. Approximate Gradient Coding: Problem Statement

Let a dataset $S$ of size $m$ be partitioned into $n$ blocks $S_1, \ldots, S_n$. There are $n$ workers, each storing $d$ blocks (often determined by an assignment matrix $B \in \mathbb{R}^{n \times n}$ or a variant thereof). In each iteration $r$, the master node broadcasts the model $w^{(r)} \in \mathbb{R}^p$. Worker $i$ computes the partial gradients $g_{i,j} = \nabla L_{S_j}(w^{(r)})$ for its stored blocks, forms the linear combination $y_i = \frac{1}{n} \sum_{j=1}^{n} b_{i,j}\, g_{i,j}$, and returns $y_i$ to the master.

Upon receiving responses from a subset $K \subseteq [n]$ (with $|K| = n - s$ non-straggling workers), the master reconstructs an approximate aggregate gradient:

$$\hat g = \sum_{i \in K} a_i(K)\, y_i$$

with a decoding map $a(K) \in \mathbb{R}^n$, $\operatorname{supp}(a(K)) \subseteq K$. The scheme is $\epsilon$-approximate if

$$\|a(K)\, B - \mathbf{1}^T\|_2 \leq \epsilon(s)$$

where $\mathbf{1} \in \mathbb{R}^n$ is the all-ones vector. This ensures the error in reconstructing the true (full-batch) gradient is controllable and quantifiable (Raviv et al., 2017).
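As a concrete sanity check, the $\epsilon$-approximation condition can be evaluated directly. The sketch below uses a hypothetical toy instance (a $n = 4$ cyclic assignment with $d = 2$ and a simple uniform decoder, chosen for illustration, not taken from the papers):

```python
import numpy as np

# Hypothetical toy instance (not from the papers): n = 4 workers, a cyclic
# assignment where worker i stores blocks i and i+1 (replication d = 2).
n = 4
B = np.zeros((n, n))
for i in range(n):
    B[i, i] = 0.5
    B[i, (i + 1) % n] = 0.5

K = [0, 1, 2]                  # worker 3 straggles, so s = 1
a = np.zeros(n)
a[K] = n / (n - 1)             # uniform weights on responders (one simple decoder)

# The scheme is eps-approximate if ||a(K) B - 1^T||_2 <= eps(s).
err = np.linalg.norm(a @ B - np.ones(n))
print(f"eps = {err:.4f}")      # -> eps = 0.6667
```

Here the reconstruction is exact on the blocks fully covered by responders and off by $1/3$ on the two blocks touching the straggler.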

2. Expander Graph–Based Coding Constructions

Expander graphs are sparse $d$-regular graphs $G = (V, E)$ on $n$ vertices with strong connectivity properties, quantified here by the second eigenvalue magnitude $\lambda = \max\{|\lambda_2|, |\lambda_n|\}$ (equivalently, the spectral gap $d - \lambda$), where the $\lambda_i$ denote the eigenvalues of the adjacency matrix $A_G \in \{0,1\}^{n \times n}$.

Node–Task Assignment

  • Each worker corresponds to a vertex.
  • Each worker $i$ stores exactly $d$ blocks (those of its adjacent vertices), so the replication factor is $d$.
  • The coding matrix is $B = \frac{1}{d} A_G \in \mathbb{R}^{n \times n}$: each row has $d$ nonzero entries, each equal to $1/d$.
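The construction of $B$ can be sketched as follows. A circulant graph stands in for the expander purely to show the shape of the assignment (circulant graphs are generally weak expanders; the papers use random regular or Ramanujan graphs):

```python
import numpy as np

def circulant_regular_adjacency(n: int, d: int) -> np.ndarray:
    """Adjacency of the d-regular circulant graph: i ~ i +/- 1, ..., i +/- d//2.

    Circulant graphs are weak expanders; this is an illustrative stand-in for
    a Ramanujan or random regular graph, used only to show the construction.
    """
    assert d % 2 == 0 and d < n
    A = np.zeros((n, n))
    for i in range(n):
        for k in range(1, d // 2 + 1):
            A[i, (i + k) % n] = 1
            A[i, (i - k) % n] = 1
    return A

n, d = 12, 4
A_G = circulant_regular_adjacency(n, d)
B = A_G / d                    # coding matrix: d nonzero entries of 1/d per row

assert np.allclose(B.sum(axis=1), 1.0)          # each row sums to 1
assert (np.count_nonzero(B, axis=1) == d).all() # exactly d stored blocks per worker
```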

Edge–Machine Assignment (Extended Variant)

In alternative expander-based schemes (Glasgow et al., 2020), machines correspond to edges ($m = nd/2$ machines), each storing the two blocks associated with its incident vertices. The data-assignment matrix $A \in \mathbb{R}^{n \times m}$ is the normalized vertex–edge incidence matrix, with $A_{u,j} = A_{v,j} = 1/d$ for each edge $e_j = \{u, v\}$.
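The incidence-matrix assignment can be built directly; the sketch below again uses a circulant $d$-regular graph as an illustrative stand-in for an expander:

```python
import numpy as np

def edge_incidence_assignment(n: int, d: int):
    """Normalized vertex-edge incidence matrix for the circulant d-regular
    graph (illustrative expander stand-in): machine j holds the two blocks of
    edge e_j = {u, v}, with A[u, j] = A[v, j] = 1/d."""
    edges = [(i, (i + k) % n) for i in range(n) for k in range(1, d // 2 + 1)]
    A = np.zeros((n, len(edges)))            # len(edges) = n * d / 2 machines
    for j, (u, v) in enumerate(edges):
        A[u, j] = A[v, j] = 1.0 / d
    return A, edges

A, edges = edge_incidence_assignment(n=10, d=4)
print(A.shape)                               # -> (10, 20), i.e. m = nd/2 = 20
```

Each row (block) sums to $1$, since every block is held by the $d$ machines corresponding to its vertex's incident edges, each with weight $1/d$.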

3. Decoding and Error Analysis

Decoding Procedure

For the node assignment, a correction vector $u \in \mathbb{R}^n$ is constructed based on the responding set $K$:

$$u_i = \begin{cases} \dfrac{s}{n-s}, & i \in K \\ -1, & i \notin K \end{cases}$$

and $a(K) = \mathbf{1} + u$.
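In code, this decoder assigns weight $n/(n-s)$ to every responder and $0$ to every straggler (a small worked example with assumed $n = 8$, $s = 2$ and the last two workers straggling):

```python
import numpy as np

n, s = 8, 2
K = np.arange(n - s)           # suppose workers 0..5 respond, the last 2 straggle

u = np.full(n, -1.0)           # u_i = -1          for i not in K
u[K] = s / (n - s)             # u_i = s / (n - s) for i in K
a = 1.0 + u                    # decoding vector a(K) = 1 + u

# supp(a(K)) is contained in K: responders get n/(n-s) = 4/3, stragglers get 0.
print(a)
```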

In the edge-assignment setting, given surviving machines $S$, the optimal decoding vector $w^* \in \mathbb{R}^{|S|}$ is the unique least-squares solution:

$$w^* = (A_S^T A_S)^{-1} A_S^T \mathbf{1}$$

with per-block coefficients $\alpha = A_S w^*$, yielding an unbiased projection of $\mathbf{1}$ onto $\operatorname{im}(A_S)$ (Glasgow et al., 2020).
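The least-squares decoder is a one-liner with `numpy.linalg.lstsq` (which also handles the rank-deficient case via the minimum-norm solution, whereas the closed form above requires full column rank). Graph, size, and straggler count below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Normalized incidence matrix of a circulant 4-regular graph on 10 vertices
# (illustrative stand-in for an expander).
n, d = 10, 4
edges = [(i, (i + k) % n) for i in range(n) for k in (1, 2)]
A = np.zeros((n, len(edges)))
for j, (u, v) in enumerate(edges):
    A[u, j] = A[v, j] = 1.0 / d

alive = rng.choice(len(edges), size=16, replace=False)   # 4 machines straggle
A_S = A[:, alive]

# Optimal decoder: least-squares projection of the all-ones vector onto im(A_S).
w_star, *_ = np.linalg.lstsq(A_S, np.ones(n), rcond=None)
alpha = A_S @ w_star           # per-block coefficients

print(np.linalg.norm(alpha - 1.0))   # decoding error ||alpha - 1||_2
```

By the normal equations, the residual $\alpha - \mathbf{1}$ is orthogonal to every surviving column, which is exactly the projection property used in the error analysis.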

Error Bounds

In the node-assignment scheme, the error is bounded via spectral arguments:

$$\|\hat{g} - \nabla L_S(w)\|_2 \leq \frac{\lambda}{d} \sqrt{\frac{ns}{n-s}}\, \|N(w)\|_2$$

where $N(w)$ is the $n \times p$ matrix of partial gradients (Raviv et al., 2017).

For the edge-assignment scheme with optimal decoding under random straggler failures (each machine fails with probability $p$), the expected squared error satisfies

$$\mathbb{E}\|\alpha - \mathbf{1}\|_2^2 \leq n \cdot p^{d - o(d)}$$

and thus

$$\mathbb{E}\|\alpha - \mathbf{1}\|_2 / \sqrt{n} = O(p^{d/2})$$

That is, the error decays exponentially in the replication factor $d$ (Glasgow et al., 2020). Under adversarial straggler patterns (up to $r$ failures), the worst-case covariance satisfies $\|\mathrm{Cov}(\alpha)\|_2 \leq 2 k^2 t^2 + 24$ for $k = O(d \log n)$, $t = O(p^{\Theta(d)})$, yielding a nearly two-fold improvement over fractional-repetition codes.

In both coding models, the trivial scheme ($d = 1$) yields an error factor $\sqrt{ns/(n-s)}$ but no spectral improvement, while exact gradient coding (e.g., via cyclic MDS codes) requires $d = s + 1$, potentially incurring high overhead (Raviv et al., 2017).
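The decay of the decoding error with $d$ can be illustrated by a small Monte Carlo experiment (an illustration under stated assumptions, not a reproduction of the papers' experiments: circulant graphs as expander stand-ins, i.i.d. machine failures, least-squares decoding):

```python
import numpy as np

rng = np.random.default_rng(1)

def incidence(n, d):
    """Normalized incidence of the circulant d-regular graph (expander stand-in)."""
    edges = [(i, (i + k) % n) for i in range(n) for k in range(1, d // 2 + 1)]
    A = np.zeros((n, len(edges)))
    for j, (u, v) in enumerate(edges):
        A[u, j] = A[v, j] = 1.0 / d
    return A

def mean_sq_error(A, p, trials=200):
    """Monte Carlo estimate of E ||alpha - 1||_2^2 under i.i.d. failures."""
    n, m = A.shape
    total = 0.0
    for _ in range(trials):
        alive = rng.random(m) > p              # each machine fails w.p. p
        A_S = A[:, alive]
        w, *_ = np.linalg.lstsq(A_S, np.ones(n), rcond=None)
        total += np.sum((A_S @ w - 1.0) ** 2)
    return total / trials

n, p = 12, 0.2
for d in (2, 4, 6):
    print(d, mean_sq_error(incidence(n, d), p))  # error typically shrinks with d
```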

4. Computation, Storage, and Communication Complexity

Expander-graph based approximate gradient coding achieves a favorable trade-off:

  • Worker computation: Each worker computes $d$ partial gradients, costing $d \cdot C_\mathrm{grad}$ (where $C_\mathrm{grad}$ is the cost per partial gradient), and forms an $O(dp)$-multiply linear combination.
  • Communication: Each worker sends a single vector $y_i \in \mathbb{R}^p$ to the master.
  • Storage overhead: Each block is replicated $d$ times, and each worker holds $d$ blocks (or two blocks in the edge-assignment model).

Compared to exact gradient coding with cyclic MDS codes (which requires $d = s + 1$), expander-based coding allows any $d \ll n$, typically held constant, offering low storage and computational overhead. The error–overhead trade-off is governed by the expander's spectral properties and the chosen $d$: increasing $d$ increases redundancy but reduces error, since $\lambda/d$ becomes smaller; Ramanujan graphs achieve $\lambda \leq 2\sqrt{d-1}$ (Raviv et al., 2017).
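Plugging the Ramanujan bound into the error scale gives $\lambda/d \leq 2\sqrt{d-1}/d \approx 2/\sqrt{d}$; a quick computation (illustrative arithmetic only) shows how the bound shrinks as $d$ grows:

```python
import math

# Ramanujan bound: lambda <= 2 * sqrt(d - 1), so the error scale lambda/d is
# at most 2 * sqrt(d - 1) / d ~ 2 / sqrt(d): quadrupling d roughly halves it.
for d in (4, 8, 16, 32, 64):
    print(d, round(2 * math.sqrt(d - 1) / d, 3))   # 4 -> 0.866, ..., 64 -> 0.248
```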

5. Convergence Guarantees

Under standard convexity and smoothness assumptions:

  • Unbiasedness: The reconstructed gradient is unbiased up to a scaling factor $c \approx 1 - (1-q)^n$ (where $q$ is the worker response probability).
  • Variance: The variance parameter satisfies

$$\sigma^2 \leq n \|N(w)\|_2^2 \left[(1-q)^n + \frac{\lambda^2}{d^2} \cdot \frac{2(1-q)}{q}\right]$$

  • SGD convergence: With step size $\eta = O(1/\sqrt{t})$, suboptimality is $O(1/\sqrt{t})$, with a variance constant smaller by a factor $\lambda^2/d^2 < 1$ relative to the trivial scheme (Raviv et al., 2017).

For the edge-assignment scheme:

  • Random stragglers: SGD with the approximate gradient $\alpha$ exhibits linear convergence up to a noise floor $O(r\sigma^2/\mu)$, with $r = O(p^d)$ and $s = O(\log^2 n \cdot p^{2d})$ (Glasgow et al., 2020, Prop. 5.1).
  • Adversarial stragglers: Under up to $r$ adversarial failures, the noise floor is $(1+\epsilon) r \sigma / (a\mu)$, nearly half that of prior codes using optimal decoding (Glasgow et al., 2020).
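An end-to-end sketch of the node-assignment scheme inside SGD, on a toy realizable least-squares problem. All choices here (circulant graph as expander stand-in, fixed step size, $s = 3$ random stragglers per step, block sizes) are illustrative assumptions, not settings from the papers:

```python
import numpy as np

rng = np.random.default_rng(2)

n, d, p_dim, per_block = 12, 4, 5, 8
X = rng.standard_normal((n * per_block, p_dim))
y = X @ rng.standard_normal(p_dim)             # realizable, noiseless targets
blocks = np.split(np.arange(n * per_block), n) # n data blocks S_1, ..., S_n

A_G = np.zeros((n, n))                         # circulant 4-regular adjacency
for i in range(n):
    for k in (1, 2):
        A_G[i, (i + k) % n] = A_G[i, (i - k) % n] = 1
B = A_G / d

def block_grad(j, w):
    """Mean least-squares gradient over block S_j."""
    Xj, yj = X[blocks[j]], y[blocks[j]]
    return Xj.T @ (Xj @ w - yj) / len(yj)

w, s = np.zeros(p_dim), 3
for _ in range(300):
    K = rng.choice(n, size=n - s, replace=False)   # s random stragglers
    a = np.zeros(n)
    a[K] = n / (n - s)                             # node-scheme decoder a(K)
    coeffs = a @ B                                 # per-block weights a(K) B
    g_hat = sum(coeffs[j] * block_grad(j, w) for j in range(n)) / n
    w -= 0.1 * g_hat

print(np.mean((X @ w - y) ** 2))   # small training loss on this noiseless problem
```

Because the problem is noiseless and every block is always covered by at least one responder (each block is replicated $d = 4 > s$ times), the iterates converge despite the per-step decoding error.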

6. Trade-Offs, Graceful Degradation, and Implementation Considerations

Expander-based approximate gradient coding provides a smooth trade-off between error and overhead:

  • Graceful degradation: The error bound $\frac{\lambda}{d}\sqrt{ns/(n-s)}$ degrades smoothly with the number of stragglers $s$; no parameter re-tuning is necessary.
  • Parameter selection: Larger $d$ improves error but increases resource usage. Ramanujan expanders enable $\lambda/d \approx 2/\sqrt{d}$, facilitating tuning.
  • Implementation: Requires generation and sharing of $d$-regular expander graphs. Data assignment is explicit: worker $i$ stores $S_j$ iff $(A_G)_{i,j} = 1$. Decoding is a low-complexity $O(n)$ procedure. Random regular graphs can be used, checking numerically that $\lambda$ is sufficiently small (Raviv et al., 2017).
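The numerical check of $\lambda$ can be sketched as follows. The pairing-model sampler and the parameters are illustrative assumptions, not the papers' method:

```python
import numpy as np

rng = np.random.default_rng(3)

def random_regular_adjacency(n: int, d: int, tries: int = 2000) -> np.ndarray:
    """Sample a simple d-regular graph via the pairing model, resampling on
    self-loops or parallel edges (illustrative sketch)."""
    for _ in range(tries):
        stubs = np.repeat(np.arange(n), d)     # d "stubs" per vertex
        rng.shuffle(stubs)
        A = np.zeros((n, n))
        ok = True
        for u, v in stubs.reshape(-1, 2):      # pair stubs into edges
            if u == v or A[u, v]:              # loop or repeated edge: resample
                ok = False
                break
            A[u, v] = A[v, u] = 1
        if ok:
            return A
    raise RuntimeError("failed to sample a simple d-regular graph")

n, d = 50, 4
A = random_regular_adjacency(n, d)
eig = np.sort(np.linalg.eigvalsh(A))
lam = max(abs(eig[-2]), abs(eig[0]))           # lambda = max{|lambda_2|, |lambda_n|}
print(f"lambda/d = {lam / d:.3f} vs Ramanujan bound {2 * np.sqrt(d - 1) / d:.3f}")
```

If the sampled $\lambda/d$ is too large, one simply resamples: random regular graphs are near-Ramanujan with high probability.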

In empirical evaluations (Amazon EC2), the generalization error of expander-based approximate gradient coding closely matches that of full-gradient schemes while significantly reducing worker computation (Raviv et al., 2017).

7. Analytical and Graph-Theoretic Foundations

The performance of expander-based coding leverages key combinatorial and spectral properties:

  • Expander Mixing Lemma: Ensures uniformity of block–worker assignment by bounding edge counts between node subsets.
  • Random-percolation analysis: Demonstrates the existence of a giant component and bounded small components after straggler-induced failures, allowing the least-squares decoder to restrict error to small subgraphs (Glasgow et al., 2020).
  • Spectral analysis: The coding error is controlled directly by the expander's spectral properties, relating $\lambda/d$ to the fundamental error terms.
  • Least-squares projection properties: The optimal decoding vector yields unbiasedness and error minimization via Euclidean projection in the gradient estimation setting.

These analytical tools underpin both the design and theoretical guarantees for approximate gradient coding strategies using expander graphs, facilitating high-performance distributed learning robust to both random and adversarial stragglers (Raviv et al., 2017, Glasgow et al., 2020).
