Gradient Coding with Cyclic MDS Codes
- The paper introduces an optimal gradient coding scheme that uses cyclic MDS codes to ensure exact recovery of the full gradient from any $n-s$ worker responses while mitigating stragglers.
- It leverages the cyclic structure and MDS properties to construct a coding matrix with minimal storage overhead ($d = s+1$) and reduced decoding complexity.
- The approach also extends to approximate gradient coding via expander graphs, offering improved computational efficiency and robust statistical guarantees.
Gradient coding with cyclic MDS codes is a method for mitigating stragglers in distributed machine learning by leveraging structures from classical coding theory. This approach provides optimal exact recovery schemes using cyclic Maximum Distance Separable (MDS) codes and also enables approximate gradient coding using expander graphs. These constructions optimize both storage overhead and decoding complexity while offering rigorous guarantees for exact and approximate gradient recovery in the presence of straggling worker nodes (Raviv et al., 2017).
1. Gradient Coding Problem and Exact Reconstruction Condition
Consider a distributed learning scenario with a master node $M$ and worker nodes $W_1, \dots, W_n$, where a dataset $\mathcal{S}$ of size $m$ is partitioned into $n$ disjoint batches $\mathcal{S}_1, \dots, \mathcal{S}_n$. In each iteration, $M$ seeks the full gradient:

$$g = \sum_{i=1}^{n} g_i,$$

where $g_i$ is the partial gradient computed over batch $\mathcal{S}_i$.

Each worker $W_i$ stores $d$ of the $\mathcal{S}_j$ and computes a single linear combination $\sum_{j} B_{i,j}\, g_j$ over the local batches, returning it to $M$. For up to $s$ stragglers, $M$ must exactly reconstruct the full gradient using any $n-s$ worker responses.

Exact recovery is characterized by the existence, for any subset $K \subseteq [n]$ of $n-s$ non-stragglers, of a vector $a_K \in \mathbb{F}^n$ supported on $K$ such that $a_K B = \mathbf{1}_{1 \times n}$, where $B \in \mathbb{F}^{n \times n}$ is the matrix of coding coefficients and $\mathbb{F}$ is the underlying field.
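To make the condition concrete, here is a minimal numerical sketch using a toy $n=3$, $s=1$ coding matrix in the spirit of Tandon et al.'s introductory example (not the cyclic-MDS construction of Section 2): for every non-straggler set $K$, a vector $a_K$ supported on $K$ with $a_K B = \mathbf{1}$ lets the master aggregate the coded responses into the exact full gradient.

```python
import numpy as np
from itertools import combinations

# Toy exact scheme for n = 3 workers, s = 1 straggler (illustrative only; not
# the cyclic-MDS construction). Row i holds worker i's coefficients over the
# partial gradients g_1, g_2, g_3.
B = np.array([[0.5, 1.0,  0.0],
              [0.0, 1.0, -1.0],
              [0.5, 0.0,  1.0]])
n, s = 3, 1

rng = np.random.default_rng(0)
g = rng.standard_normal((n, 4))              # partial gradients (4-dimensional)
full_gradient = g.sum(axis=0)
coded = B @ g                                # what each worker returns to M

for K in combinations(range(n), n - s):      # every possible non-straggler set
    rows = list(K)
    # Find a_K supported on K with a_K B = 1 (least squares on the submatrix).
    a_restricted = np.linalg.lstsq(B[rows, :].T, np.ones(n), rcond=None)[0]
    recovered = a_restricted @ coded[rows, :]
    assert np.allclose(recovered, full_gradient), K
print("exact recovery from every set of n - s workers: OK")
```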
2. Construction of Exact Schemes Using Cyclic MDS Codes
Cyclic MDS codes containing the all-ones vector facilitate deterministic, optimal, and exact gradient coding. Let $\mathcal{C} \subseteq \mathbb{F}^n$ denote such an $[n, n-s]$ code. The scheme constructs a codeword $c \in \mathcal{C}$ of support $\{1, \dots, s+1\}$ and forms the gradient coding matrix $B$ by aligning its cyclic shifts as columns:

$$B = \big(\, c \;\; \sigma(c) \;\; \sigma^2(c) \;\cdots\; \sigma^{n-1}(c) \,\big),$$

where $\sigma$ denotes a single cyclic shift. Each row of $B$ has Hamming weight $s+1$, and, by the cyclic and MDS properties, any $n-s$ rows of $B$ are linearly independent and contain the all-ones vector in their span. This ensures that the master node can reconstruct the full gradient from any subset of $n-s$ non-straggler worker results.
The storage overhead $d = s+1$ is proven optimal by the information-theoretic lower bound $d \ge s+1$: every batch must be assigned to at least $s+1$ workers (otherwise all workers holding it could straggle simultaneously), so by double counting the average number of batches per worker is at least $s+1$.
2.1. Complex-Field Construction: Reed-Solomon Codes
Let $\mathbb{F} = \mathbb{C}$ and $\alpha_j = e^{2\pi i j / n}$ for $j = 0, \dots, n-1$, the $n$-th roots of unity. The $[n, n-s]$ Reed-Solomon code defined as

$$\mathcal{C} = \left\{ \big(f(\alpha_0), f(\alpha_1), \dots, f(\alpha_{n-1})\big) \;:\; f \in \mathbb{C}[x],\ \deg f \le n-s-1 \right\}$$

is cyclic and contains the all-ones vector. The generator matrix is Vandermonde:

$$G = \begin{pmatrix}
1 & 1 & \cdots & 1 \\
\alpha_0 & \alpha_1 & \cdots & \alpha_{n-1} \\
\vdots & \vdots & & \vdots \\
\alpha_0^{\,n-s-1} & \alpha_1^{\,n-s-1} & \cdots & \alpha_{n-1}^{\,n-s-1}
\end{pmatrix}.$$
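A minimal numerical sketch of this construction, under assumed illustrative parameters $n=8$, $s=3$: the codeword supported on the first $s+1$ coordinates is obtained by evaluating the degree-$(n-s-1)$ polynomial vanishing on the remaining evaluation points, and the circulant matrix of its cyclic shifts is checked against the exact-recovery condition.

```python
import numpy as np

n, s = 8, 3
alpha = np.exp(2j * np.pi * np.arange(n) / n)         # n-th roots of unity

# Degree-(n-s-1) polynomial vanishing at alpha_{s+1}, ..., alpha_{n-1}:
# its evaluation vector c is a codeword supported on the first s+1 positions.
zeros = alpha[s + 1:]
c = np.array([np.prod(a - zeros) for a in alpha])
assert np.allclose(c[s + 1:], 0)

# Coding matrix B: columns are the n cyclic shifts of c.
B = np.column_stack([np.roll(c, j) for j in range(n)])
assert all(np.count_nonzero(np.abs(row) > 1e-9) == s + 1 for row in B)

# Exact recovery: for a random non-straggler set K of size n-s, the system
# a_K B = 1 restricted to K has a solution (unique, since any n-s rows of B
# are linearly independent).
rng = np.random.default_rng(1)
K = np.sort(rng.choice(n, size=n - s, replace=False))
a_restricted = np.linalg.lstsq(B[K, :].T, np.ones(n, dtype=complex), rcond=None)[0]
a_K = np.zeros(n, dtype=complex)
a_K[K] = a_restricted
print(np.allclose(a_K @ B, np.ones(n)))               # expected: True
```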
2.2. Real-Field Construction: BCH Codes
For the real case $\mathbb{F} = \mathbb{R}$, when the parameters admit a conjugate-closed set of $s$ consecutive $n$-th roots of unity that excludes $1$, one constructs a real cyclic BCH-type code of length $n$ and dimension $n-s$ whose zeros are exactly these roots. This code contains the all-ones vector, allowing the same column-shift construction as for the Reed-Solomon code.
3. Decoding Algorithms and Complexity Analysis
Given a non-straggler index set $K$ of size $n-s$, decoding requires finding $a_K$ supported on $K$ solving $a_K B = \mathbf{1}$. For the complex-field Reed-Solomon construction, leverage GRS code duality:
- Precompute a particular solution $x_0$ so that $x_0 B = \mathbf{1}$; all solutions form the coset $x_0 + \ker$, where the left kernel of $B$ is a GRS code of dimension $s$.
- For an arbitrary straggler pattern, cancel $x_0$ on the $s$ straggler positions by interpolating a degree-$(s-1)$ polynomial over those $s$ points ($O(s^2)$) and evaluating it at all $n$ roots of unity using an FFT ($O(n \log n)$).

This yields per-iteration decoding complexity $O(s^2 + n \log n)$, outperforming the decoding costs of previously known exact schemes. Encoding is likewise inexpensive, since every column of $B$ is a cyclic shift of a single precomputed codeword.
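The sketch below illustrates the two primitives behind these complexity figures under assumed parameters $n=16$, $s=4$; the full decoder additionally uses the precomputed particular solution $x_0$ and the GRS multipliers of the kernel, which are omitted here.

```python
import numpy as np

n, s = 16, 4
alpha = np.exp(2j * np.pi * np.arange(n) / n)

rng = np.random.default_rng(2)
straggler_idx = np.sort(rng.choice(n, size=s, replace=False))
target_values = rng.standard_normal(s) + 1j * rng.standard_normal(s)

# (i) Interpolate the unique polynomial p of degree <= s-1 with
#     p(alpha_j) = target_values on the s straggler points. (A direct
#     Vandermonde solve is used for brevity; Newton interpolation gives O(s^2).)
V = np.vander(alpha[straggler_idx], N=s, increasing=True)
coeffs = np.linalg.solve(V, target_values)

# (ii) Evaluate p at ALL n-th roots of unity with a single inverse FFT:
#      p(alpha_j) = sum_k coeffs[k] * exp(2*pi*1j*j*k/n) = n * ifft(padded coeffs)[j].
evals = n * np.fft.ifft(np.concatenate([coeffs, np.zeros(n - s)]))

assert np.allclose(evals[straggler_idx], target_values)
print("interpolation + FFT evaluation consistent")
```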
| Scheme | Storage overhead ($d$) | Decoding cost |
|---|---|---|
| Cyclic MDS (this work) | $s+1$ (optimal) | $O(s^2 + n\log n)$ |
| ShortDot (algebraic) | $s+1$ (with divisibility constraints) | Higher |
| Randomized (Tandon et al.) | $s+1$ | Higher (not optimal) |
4. Comparative Evaluation and Theoretical Guarantees
Tandon et al. introduced randomized constructions achieving $d = s+1$. The cyclic-MDS construction achieves the minimum possible $d = s+1$ deterministically, for all $n$ and $s$, and with lower encoding and decoding complexity. ShortDot and similar algebraic code constructions also attain $d = s+1$ but either require divisibility conditions on $n$ or incur higher decoding costs. The cyclic MDS approach imposes no divisibility restrictions and minimizes arithmetic per iteration.
The cyclic-MDS method satisfies the key optimality theorem: for any $K \subseteq [n]$ of size $n-s$, there exists a unique reconstruction vector $a_K$ supported on $K$ with $a_K B = \mathbf{1}_{1\times n}$. Duality properties of the cyclic MDS code ensure this characterization.
5. Approximate Gradient Coding via Expander Graphs
When relaxation to approximate recovery is permissible, one can reduce the storage overhead below $s+1$ by encoding with the normalized adjacency matrix $B = \frac{1}{d}A_G$ of a $d$-regular expander graph $G$ on $n$ vertices.
For a non-straggler set $K$ of size $n-s$, set $a_K = \frac{n}{n-s}\,\mathbf{1}_K$, where the factor $\frac{n}{n-s}$ compensates for the missing responses. Spectral bounds yield:

$$\left\lVert a_K B - \mathbf{1} \right\rVert_2 \;\le\; \lambda \sqrt{\frac{n\,s}{n-s}},$$

where $\lambda$ is the second-largest eigenvalue (in absolute value) of $B$. For Ramanujan expanders, $\lambda \le \frac{2\sqrt{d-1}}{d}$, so the approximation error decreases with increasing $d$.
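The bound can be checked numerically; it follows from the decomposition $\mathbf{1}_K = \frac{n-s}{n}\mathbf{1} + w$ with $w \perp \mathbf{1}$. The sketch below uses a simple $d$-regular circulant graph as an assumed stand-in for an expander (not Ramanujan, but $\lambda$ is computed directly, so the inequality applies verbatim).

```python
import numpy as np

n, d, s = 60, 6, 10

# d-regular circulant graph: vertex i is joined to i +/- 1, ..., i +/- d/2 (mod n).
A = np.zeros((n, n))
for off in range(1, d // 2 + 1):
    A += np.eye(n, k=off) + np.eye(n, k=-off) + np.eye(n, k=n - off) + np.eye(n, k=off - n)
B = A / d                                     # normalized adjacency, B @ 1 = 1

lam = np.sort(np.abs(np.linalg.eigvalsh(B)))[::-1][1]   # second-largest |eigenvalue|

rng = np.random.default_rng(3)
K = np.sort(rng.choice(n, size=n - s, replace=False))
a_K = np.zeros(n)
a_K[K] = n / (n - s)                          # rescaled indicator of non-stragglers

err = np.linalg.norm(a_K @ B - np.ones(n))
bound = lam * np.sqrt(n * s / (n - s))
print(f"error {err:.3f} <= bound {bound:.3f}: {err <= bound + 1e-9}")
```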
Statistically, for random stragglers the (suitably rescaled) aggregate is an unbiased estimate of the full gradient, and its variance is controlled by $\lambda$. This approach yields faster convergence than simply ignoring stragglers, and empirical results show a negligible increase in generalization error while significantly reducing computation per worker.
6. Storage, Bandwidth, and Lower Bounds
Each worker stores $d = s+1$ batches and communicates one coded linear combination per iteration. For the complex-field scheme, two real coordinates can be packed into one complex number, and the full gradient can be unpacked with a linear number of operations at the master. This renders the scheme bandwidth optimal over $\mathbb{R}$.
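A minimal sketch of the packing step in isolation (assumed toy vector; the coding and decoding steps are omitted, and since the scheme is linear and decoding recovers the coded combination exactly, packing commutes with the rest of the pipeline):

```python
import numpy as np

# Pack two real gradient coordinates per complex symbol before transmission,
# then unpack at the master with a single linear pass.
g_real = np.arange(8.0)                       # a real (even-length) gradient vector

packed = g_real[0::2] + 1j * g_real[1::2]     # half as many complex symbols
unpacked = np.empty_like(g_real)
unpacked[0::2] = packed.real
unpacked[1::2] = packed.imag

assert np.array_equal(unpacked, g_real)
```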
An information-theoretic lower bound asserts that for exact recovery with $d$ batches per worker, $d \ge s+1$. For $d \le s$, there always exists at least one set of $s$ stragglers rendering exact recovery impossible, and any such scheme incurs an approximation error bounded away from zero in the worst case.
7. Convergence and Statistical Remarks
For random straggling (each worker fails independently with probability $1-q$), the rescaled aggregate returned gradient is unbiased, $\mathbb{E}[\hat{g}] = g$, with a variance that scales with $\frac{1-q}{q}$ and with the spectral properties of the coding matrix.
In standard SGD with $\beta$-smooth objective functions, the expected error decays at the usual rate with a leading constant proportional to the variance of the gradient estimate. The exact cyclic-MDS schemes achieve zero variance; expander-based approximate schemes benefit from a substantially reduced variance compared to naive schemes that simply ignore stragglers.
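A Monte Carlo sanity check of the random-straggler model under assumed toy parameters ($n=20$, $d=3$, $q=0.7$); the $1/q$ rescaling is one natural unbiased estimator under this model, not necessarily the exact estimator analyzed in the paper:

```python
import numpy as np

# Each worker responds independently with probability q; rescaling the sum of
# received coded values by 1/q gives an unbiased estimate of the full gradient
# because the coding matrix below is doubly stochastic (rows and columns sum to 1).
n, d, q, trials = 20, 3, 0.7, 20000
rng = np.random.default_rng(4)

base = np.zeros(n)
base[:d] = 1.0 / d                            # worker 0 averages batches 0..d-1
B = np.stack([np.roll(base, i) for i in range(n)])   # circulant batch assignment

g = rng.standard_normal(n)                    # scalar partial gradients g_1..g_n
coded = B @ g                                 # each worker's coded value
full = g.sum()

estimates = np.empty(trials)
for t in range(trials):
    responded = rng.random(n) < q             # Bernoulli(q) per worker
    estimates[t] = coded[responded].sum() / q

print(f"true sum {full:.4f}  mean estimate {estimates.mean():.4f}  "
      f"std {estimates.std():.4f}")
```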
In summary, cyclic MDS codes yield deterministic, structurally simple, and provably optimal exact gradient coding with minimal storage and computation. Expander graph-based approximate gradient codes offer graceful degradation and improved statistical guarantees with lower storage requirements, both of which advance the scalability and robustness of distributed learning (Raviv et al., 2017).