BR-DRAG: Byzantine-Resilient DRAG in FL

Updated 18 January 2026

The paper introduces a Byzantine-resilient variant of DRAG that leverages server-calibrated reference directions to correct client drift and counter adversarial attacks.
It employs a calibrated aggregation mechanism using a trusted root dataset and strict normalization to ensure consistent, robust updates from heterogeneous clients.
Empirical evaluations show that BR-DRAG maintains high test accuracy and convergence rates in challenging non-IID and Byzantine environments on benchmarks like CIFAR-10.

Byzantine-Resilient DRAG (BR-DRAG) is an algorithmic framework developed to address both statistical heterogeneity ("client drift") and Byzantine robustness in federated learning (FL). It enhances the DRAG ("Divergence-based Adaptive Aggregation") method by incorporating defense mechanisms based on vetted server-side reference updates derived from a trusted "root" dataset, thus enabling stable and convergence-guaranteed aggregation even under adversarial client behaviors and partial participation (Xiao et al., 11 Jan 2026, Zhu et al., 2023).

1. Foundations: Client Drift and Byzantine Threats in FL

Federated learning suffers from two principal obstacles: client drift due to heterogeneity of local data distributions, and vulnerability to Byzantine attacks, where a fraction of clients submit arbitrarily adversarial updates. Standard local SGD amplifies bias toward clients' local optima, impeding global convergence. Meanwhile, malicious clients can destabilize or corrupt global model updates through carefully crafted attacks, including magnitude-scaling and label-flipping, potentially causing divergence (Xiao et al., 11 Jan 2026, Zhu et al., 2023). BR-DRAG arises from the observation that both phenomena manifest as persistent misalignment between local update directions and the true gradient sought by the centralized objective.

2. DRAG Algorithmic Principles

At BR-DRAG's core is the DRAG algorithm. DRAG introduces a "reference direction" $\mathbf{r}^t$ —an exponential moving average of past global updates—representing consensus trajectory in parameter space. Each client's local update $\mathbf{g}_m^t = \theta_m^{t,U} - \theta^t$ is measured for misalignment using the divergence-of-degree (DoD) metric: $\lambda_m^t = c\left(1 - \frac{\langle \mathbf{g}_m^t, \mathbf{r}^t\rangle}{\|\mathbf{g}_m^t\|\|\mathbf{r}^t\|}\right)$ with $c\in[0,1]$ . Clients "calibrate" their reported update by linearly interpolating between the original direction and the reference: $\mathbf{v}_m^t = (1-\lambda_m^t)\,\mathbf{g}_m^t + \lambda_m^t\,\frac{\|\mathbf{g}_m^t\|}{\|\mathbf{r}^t\|}\,\mathbf{r}^t$ This process cures drift by penalizing angular deviation from the global reference, while preserving each update's norm. The server then aggregates these calibrated updates.

3. Byzantine-Resilient DRAG (BR-DRAG) Mechanism

BR-DRAG extends DRAG for Byzantine environments by refining reference computation and calibration strategies:

The server maintains a trusted root dataset $\mathcal{D}_{\mathrm{root}}$ , free from adversarial contamination.
At round $t$ , the server runs $U$ local SGD steps on $\mathcal{D}_{\mathrm{root}}$ to yield a certified reference direction:

$\mathbf{r}^t = \theta_{\mathrm{root}}^{t,U} - \theta^t$

All client updates are re-calibrated using this trusted $\mathbf{r}^t$ , applying a stricter normalization to nullify Byzantine magnitude manipulation:

$\mathbf{v}_m^t = (1-\lambda_m^t)\,\frac{\|\mathbf{r}^t\|}{\|\mathbf{g}_m^t\|}\,\mathbf{g}_m^t + \lambda_m^t\,\mathbf{r}^t$

ensuring $\|\mathbf{v}_m^t\| = \|\mathbf{r}^t\|$ .

This approach, inspired by FLTrust, maintains update consistency even when a significant fraction of clients are Byzantine, as only those updates well-aligned with $\mathbf{r}^t$ significantly influence the aggregate.

4. Theoretical Properties and Convergence

Under standard federated learning assumptions:

Each local objective $F_m$ is $L$ -smooth and bounded below;
All stochastic gradients are unbiased with bounded variance $\sigma_L^2$ (within-client) and $\sigma_G^2$ (across-clients);
Participation is partial ( $S<M$ clients per round) and data heterogeneity is present.

For non-convex objectives, DRAG and BR-DRAG provably achieve: $\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\|\nabla f(\theta^t)\|^2 \leq \frac{f(\theta^0) - f^*}{\gamma\eta U T} + V$ where the residual $V$ depends on the drag parameter $c$ and variance parameters but vanishes as $T$ grows, provided $\eta = O(1/U)$ and $U=O(1)$ . Thus, DRAG and BR-DRAG guarantee $O(1/\sqrt{T})$ convergence rates analogous to FedAvg, but in the presence of both heterogeneity and adversarial attacks (Xiao et al., 11 Jan 2026, Zhu et al., 2023).

5. Empirical Performance and Practical Considerations

Empirical studies on benchmarks (EMNIST, CIFAR-10, CIFAR-100) under strong non-IID regimes and with Byzantine attackers confirm:

Setting	DRAG Test Acc.	FedAvg	SCAFFOLD	FLTrust	BR-DRAG
CIFAR-10, 4 attackers	>75%	diverges	-	degrades	~as DRAG

Notably, with 4 Byzantine clients (random scaling attacks), BR-DRAG retains >75% test accuracy at all sample sizes, outperforming FLTrust as heterogeneity increases (Xiao et al., 11 Jan 2026, Zhu et al., 2023). DRAG achieves a 70.8% test accuracy after 600 rounds on CIFAR-10 with Dirichlet- $\beta=0.1$ , while FedAvg reaches only 52.3%. Larger sample sizes $S$ per round improve stability, but BR-DRAG remains effective even with $S \ll M$ .

Parameter tuning insights:

$\alpha \in [0.2, 0.3]$ balances reference stability and adaptiveness.
$c$ in $[0.1, 0.25]$ trades off drift correction and gradient diversity.
Overhead is negligible: per-round costs involve only an extra broadcast vector ( $\mathbf{r}^t$ ) and inner-product computations, matching FedAvg with $O(Sd)$ complexity.

6. Extensions and Limitations

BR-DRAG is compatible with secure aggregation and lossy compression schemes, owing to its simple vector operations. Adaptive drag parameter scheduling may enhance robustness. However, no formal theoretical guarantee is offered for convergence under general Byzantine behaviors (e.g., label-flipping, Krum/Trim attacks), although empirical evidence indicates resilience in diverse attack scenarios (Zhu et al., 2023). Hyperparameter cross-validation is recommended per dataset due to task sensitivity.

Potential extensions under consideration include convex and strongly-convex settings for tighter rate analysis, as well as adaptation to wireless FL with channel noise.

BR-DRAG generalizes adaptive aggregation strategies by leveraging a geometric alignment heuristic rather than client-side control variates (as in SCAFFOLD) or simple mean aggregation (as in FedAvg). In contrast to FLTrust, which also relies on a trusted server-side update, BR-DRAG's vector dragging retains benign diversity and more finely distinguishes honest but unaligned updates from Byzantine outliers, minimizing false positives. Overhead and communication cost are essentially identical to standard SGD-based FL protocols (Xiao et al., 11 Jan 2026, Zhu et al., 2023).

Markdown Report Issue Upgrade to Chat

References (2)

Divergence-Based Adaptive Aggregation for Byzantine Robust Federated Learning (2026)

DRAG: Divergence-based Adaptive Aggregation in Federated learning on Non-IID Data (2023)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to Byzantine-Resilient DRAG (BR-DRAG).