
Comparative Elimination Filter

  • The Comparative Elimination (CE) filter is a robust aggregation method that discards the highest-norm or furthest-distance reports to mitigate Byzantine faults in distributed and federated learning.
  • It operates by sorting reports, eliminating the $f$ largest, and averaging the remainder; its convergence guarantees hold under bounded-variance and redundancy conditions.
  • Empirical evaluations show that the CE filter achieves near-linear convergence with minimal overhead, outperforming several alternative robust aggregation algorithms.

The Comparative Elimination (CE) filter is a norm-based robust aggregation mechanism designed for Byzantine fault tolerance in distributed machine learning systems. Its principal function is to eliminate the influence of agents that may send arbitrary, potentially adversarial information (so-called Byzantine agents) by discarding the largest-norm (or furthest-distance) reports before the model update. The CE filter provides provable resilience to a bounded fraction of Byzantine faults under standard stochastic assumptions together with strong convexity or Polyak–Łojasiewicz (PL) conditions, in both convex and nonconvex optimization settings. Its computational simplicity and minimal distributional assumptions distinguish it within the robust aggregation literature (Gupta et al., 2020, Dutta et al., 2023).

1. Byzantine Fault Model in Distributed and Federated Learning

Distributed Stochastic Gradient Descent (D-SGD) and federated optimization frameworks comprise $n$ (or $N$) agents and a centralized, trusted server or coordinator. Each agent independently conducts stochastic optimization, sampling data points from an unknown common distribution each round. In the fault-tolerant setting, up to $f$ agents may be Byzantine: able to collude, deviate arbitrarily from the prescribed algorithm, or inject malicious vectors or models. The remaining honest agents sample data i.i.d. and compute stochastic gradients or local models. Typical goal statements require finding a minimizer of the aggregate objective: either the expected loss $Q(w) = \mathbb{E}_{z\sim D}[\ell(w, z)]$ for D-SGD, or the minimizer of $\sum_{i\in \mathcal{H}} q^i(x)$ for local/federated optimization, where $\mathcal{H}$ is the set of honest agents (Gupta et al., 2020, Dutta et al., 2023).
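
To make the setting concrete, the following is a minimal simulation sketch of this fault model. It is illustrative only; the helper names honest_report and byzantine_report are not from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)
n, f, d = 10, 2, 5                       # agents, Byzantine budget, dimension
true_grad = rng.normal(size=d)           # gradient of Q at the current iterate

def honest_report(sigma=0.1):
    # Honest agents return an unbiased stochastic gradient (bounded variance).
    return true_grad + rng.normal(scale=sigma, size=d)

def byzantine_report(scale=100.0):
    # Byzantine agents may send anything, e.g. a large vector opposing descent.
    return -scale * true_grad

reports = [honest_report() for _ in range(n - f)] + \
          [byzantine_report() for _ in range(f)]
```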

2. Comparative Elimination Filter: Algorithmic Formulation

The CE filter operates by discarding the largest-norm reports at each aggregation round. In D-SGD, the server receives gradient vectors $\{g_i^t\}$, computes their Euclidean norms $r_i = \|g_i^t\|_2$, sorts them, and retains only the $n-f$ smallest-norm gradients:

$$S = \{\, i : r_i \text{ is among the smallest } n-f \,\}$$

The update step computes the mean of the surviving gradients:

$$w^{t+1} = w^t - \eta_t \, \frac{1}{n-f} \sum_{i \in S} g_i^t$$

In federated/local SGD settings, the CE filter discards the $f$ local models $\{x^i_{k,T}\}$ with the largest $\ell_2$ distance from the global model $\bar{x}_k$. The server update is then:

$$\bar{x}_{k+1} = \frac{1}{N-f} \sum_{i \in \mathcal{F}_k} x^i_{k,T}$$

where $\mathcal{F}_k$ indexes the $N-f$ closest agents. The filter implicitly requires redundancy conditions (e.g., $N>2f$ and $2f$-redundancy) for robust convergence (Dutta et al., 2023).

Pseudocode: D-SGD CE Filter

Input: current iterate w^t, claimed gradients {g_i^t}_{i=1}^n, elimination budget f
Step 1 (Receive): Each agent i sends g_i^t to the server.
Step 2 (CE filter): Compute norms r_i = ||g_i^t||_2; sort and keep the n−f smallest.
Step 3 (Update): w^{t+1} ← w^t − η_t · (mean of the kept gradients)
[The federated/local SGD pseudocode is identical modulo the input type (local models rather than gradients) and the elimination criterion (distance from the global model rather than norm); a runnable sketch of both variants follows.]
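
Below is a minimal NumPy sketch of both variants, assuming the reports arrive stacked as rows of an (n, d) array; the function names are illustrative, not taken from the cited papers.

```python
import numpy as np

def ce_filter_gradients(grads: np.ndarray, f: int) -> np.ndarray:
    """D-SGD variant: drop the f largest-norm gradients, average the rest."""
    norms = np.linalg.norm(grads, axis=1)        # r_i = ||g_i||_2, O(nd)
    keep = np.argsort(norms)[: len(grads) - f]   # n-f smallest norms, O(n log n)
    return grads[keep].mean(axis=0)

def ce_filter_models(models: np.ndarray, global_model: np.ndarray, f: int) -> np.ndarray:
    """Federated variant: drop the f models farthest (in l2) from the global model."""
    dists = np.linalg.norm(models - global_model, axis=1)
    keep = np.argsort(dists)[: len(models) - f]
    return models[keep].mean(axis=0)

# One D-SGD server step with step size eta:
# w = w - eta * ce_filter_gradients(np.stack(reports), f)
```

The per-round cost is one norm (or distance) computation per report plus a sort, which matches the $\mathcal{O}(n(d+\log n))$ complexity discussed in the comparison below.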

3. Fault Tolerance Conditions and Theoretical Guarantees

The CE filter guarantees fault tolerance against up to $f$ Byzantine agents under classical stochastic assumptions. Required conditions include bounded variance ($\mathbb{E}\|g_i^t-\nabla Q(w^t)\|^2\le\sigma^2$), Lipschitz gradients, and strong convexity or the PL inequality:

$$\|\nabla q^{\mathcal{H}}(x)\|^2 \ge 2\mu \left( q^{\mathcal{H}}(x) - q^{\mathcal{H}}(x^*) \right)$$

The fault-tolerance margin for D-SGD is:

$$\frac{f}{n} < \frac{\lambda}{2\lambda+\mu}$$

In federated optimization, $2f$-redundancy ensures that the minimizer remains unchanged after discarding any $f$ reports, provided $N>2f$. Linear convergence holds in both the strongly convex and PL nonconvex regimes, up to a steady-state error dependent on $\sigma^2$, $\eta$, and $f$:

$$\mathbb{E}\|w^t - w^*\|^2 \le \rho^t \|w^0 - w^*\|^2 + \frac{1-\rho^t}{1-\rho} M^2$$

with explicit expressions for the contraction factor $\rho$ and bias $M^2$ (Gupta et al., 2020, Dutta et al., 2023). Under the stochastic PL-rate theorem, federated/local SGD with the CE filter converges linearly to the honest optimum plus a variance-induced bias.
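
As an illustrative instantiation of these bounds (a worked example under assumed parameter values, not a result from the papers), set $\lambda = \mu$ in the D-SGD margin and let $t\to\infty$ in the convergence bound:

```latex
% Worked example under the assumption lambda = mu (well-conditioned case):
\[
  \frac{f}{n} \;<\; \frac{\lambda}{2\lambda + \mu} \;=\; \frac{1}{3}
\]
% i.e. just under one third of the agents may be Byzantine. As t -> infinity,
% the transient term rho^t * ||w^0 - w^*||^2 vanishes, leaving the geometric
% limit of the variance-induced steady-state error:
\[
  \lim_{t\to\infty} \mathbb{E}\|w^t - w^*\|^2 \;\le\; \frac{M^2}{1-\rho},
  \qquad \rho \in (0,1).
\]
```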

4. Comparative Analysis: CE Filter and Alternative Robust Aggregators

The following table contrasts CE filter properties with other prominent Byzantine-robust aggregation algorithms:

Filter | Complexity | Fault-Tolerance Constraints
Multi-KRUM | $\mathcal{O}(n(n+d))$ | $f < n/2$
Geometric Median-of-Means | $\mathcal{O}(n(n+d))$ | $f < b/2$ per block
Spectral methods (SEVER) | $\mathcal{O}(nd\min\{n,d\})$ | $f < n/4$
CWTM, signSGD | $\mathcal{O}(nd)$ | Distributional assumptions (unimodal/symmetric)
Comparative Elimination | $\mathcal{O}(n(d+\log n))$ | $f/n < \lambda/(2\lambda+\mu)$; $N > 2f$

The CE filter requires only norm computations and sorting per round, dispensing with geometric medians, blockwise operations, or spectral computations. It robustly tolerates Byzantine fractions up to theoretical margins set by strong convexity/PL parameters and the redundancy of honest cost functions.

5. Hyper-Parameter Selection and Operating Conditions

CE filter operation depends critically on accurate hyper-parameter selection:

  • Step size ($\eta_t$, $\alpha_k$): Fixed values in $(0,\overline\eta)$ ensure linear convergence; a diminishing schedule ($\eta_t \propto 1/t$) recovers standard SGD regimes. For PL nonconvex settings, $\alpha \leq \mu/(72L^2T)$.
  • Batch size ($k$): Larger batches decrease the variance $\sigma^2$, shrinking the neighborhood bias; typical values are $k = 32$–$256$.
  • Elimination budget ($f$): The server must estimate an upper bound on the number of Byzantine agents and set the CE filter to discard exactly $f$ reports per round.
  • Exponential averaging: Optional; the update $h_i^t = \beta h_i^{t-1} + (1-\beta)g_i^t$ with $\beta \in [0.4, 0.8]$ lowers variance and stabilizes convergence at minimal computational cost (see the sketch after this list).
  • Redundancy constraints: For federated settings, $N > 2f$ is required.
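
A brief sketch of the exponential-averaging option composed with the CE step follows; it assumes the ce_filter_gradients helper from the earlier sketch, and the names are illustrative.

```python
import numpy as np

def smooth_reports(prev: np.ndarray, new: np.ndarray, beta: float = 0.6) -> np.ndarray:
    """h_i^t = beta * h_i^{t-1} + (1 - beta) * g_i^t, applied row-wise."""
    return beta * prev + (1.0 - beta) * new

# Per round: smooth each agent's report stream, then eliminate and average.
# h = smooth_reports(h, np.stack(reports))
# w = w - eta * ce_filter_gradients(h, f)
```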

This configuration ensures the CE filter remains statistically efficient and algorithmically robust under adversarial conditions (Gupta et al., 2020, Dutta et al., 2023).

6. Empirical Performance and Evaluation

Empirical evaluations on neural networks (MNIST with LeNet, $d \approx 4.3\times10^5$ parameters) and synthetic regression/classification tasks highlight the CE filter's practical impact. Benchmarked against GeoMed, MoM, coordinate-wise trimmed mean (CWTM), and Multi-KRUM, CE achieves test accuracy within 1–3% of the best robust method under several Byzantine fault models (gradient-reverse, label-flip, "norm-confusing"). Representative per-iteration times: CE $\approx 0.57$ s, outperforming GeoMed ($2.47$ s), MoM ($1.15$ s), Multi-KRUM ($2.22$ s), and CWTM ($0.90$ s). With exponential averaging ($\beta \approx 0.6$), CE approaches its theoretical robustness limits (Gupta et al., 2020).

In nonconvex PL regression and classification settings, simulations with $N=50$ agents and $f\in\{2,5,8,10\}$ show that CE outperforms Multi-KRUM and CWTM in proximity to the honest optimum. Increasing the local step count ($T=3$) further accelerates convergence (Dutta et al., 2023).

7. Summary and Practical Implications

The Comparative Elimination filter robustifies distributed and federated SGD against Byzantine faults by eliminating the $f$ largest-norm reports per round. Its salient features include:

  • Fault tolerance under bounded variance and convexity/PL growth conditions
  • Linear convergence to a small neighborhood of the global optimum
  • Minimal computational overhead, requiring only sorting and averaging
  • Proven empirical efficacy under diverse fault models

A plausible implication is that CE filter methodology can be generally applied wherever per-round norm or distance statistics are easily computed and strong redundancy is present among honest agents, particularly in communication-limited and large-scale federated learning deployments (Gupta et al., 2020, Dutta et al., 2023).
