
Federated Variance-Reduced Stochastic Gradient Descent with Robustness to Byzantine Attacks (1912.12716v2)

Published 29 Dec 2019 in cs.LG, cs.AI, cs.CR, and stat.ML

Abstract: This paper deals with distributed finite-sum optimization for learning over networks in the presence of malicious Byzantine attacks. To cope with such attacks, most resilient approaches so far combine stochastic gradient descent (SGD) with different robust aggregation rules. However, the sizeable SGD-induced stochastic gradient noise makes it challenging to distinguish malicious messages sent by the Byzantine attackers from noisy stochastic gradients sent by the 'honest' workers. This motivates us to reduce the variance of stochastic gradients as a means of robustifying SGD in the presence of Byzantine attacks. To this end, the present work puts forth a Byzantine attack resilient distributed (Byrd-) SAGA approach for learning tasks involving finite-sum optimization over networks. Rather than the mean employed by distributed SAGA, the novel Byrd-SAGA relies on the geometric median to aggregate the corrected stochastic gradients sent by the workers. When less than half of the workers are Byzantine attackers, the robustness of geometric median to outliers enables Byrd-SAGA to attain provably linear convergence to a neighborhood of the optimal solution, with the asymptotic learning error determined by the number of Byzantine workers. Numerical tests corroborate the robustness to various Byzantine attacks, as well as the merits of Byrd-SAGA over Byzantine attack resilient distributed SGD.

Authors (4)
  1. Zhaoxian Wu (11 papers)
  2. Qing Ling (58 papers)
  3. Tianyi Chen (139 papers)
  4. Georgios B. Giannakis (182 papers)
Citations (167)

Summary

  • The paper introduces Byrd-SAGA, a new Byzantine-resilient variant of SAGA that uses geometric median aggregation instead of traditional mean for robustness.
  • Theoretical analysis shows that reducing stochastic gradient variance improves the geometric median's robustness, and that Byrd-SAGA achieves linear convergence with an asymptotic error determined by the number of Byzantine nodes.
  • Numerical tests demonstrate that Byrd-SAGA outperforms other methods under various Byzantine attacks due to its effective variance reduction and geometric median aggregation.

Overview of Federated Variance-Reduced Stochastic Gradient Descent with Robustness to Byzantine Attacks

The paper under review introduces an innovative approach to distributed finite-sum optimization in federated learning systems subject to Byzantine attacks. These systems are increasingly significant for preserving privacy and decentralizing machine learning and data processing tasks across networks. However, they are vulnerable to malicious activities known as Byzantine attacks, where some participating devices deliberately send incorrect information to compromise the system. Addressing this vulnerability is crucial for secure and efficient distributed computing.

Contributions and Methodology

The authors propose a Byzantine attack resilient variant of SAGA, a variance-reduced stochastic gradient method, termed Byrd-SAGA. The algorithm fuses the variance reduction technique of traditional SAGA with a robust aggregation mechanism to enhance resilience against Byzantine attacks. The critical innovation of Byrd-SAGA lies in its use of geometric median aggregation, rather than the traditional mean, for combining the corrected stochastic gradients from distributed nodes. Because the geometric median bounds the influence that any single message can exert on the aggregate, it offers improved resistance to malicious manipulations whenever fewer than half of the participating nodes are Byzantine attackers.
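The geometric median has no closed-form expression; a standard way to approximate it is Weiszfeld's fixed-point iteration. The sketch below is illustrative only (the function name, iteration budget, and tolerance are choices of mine, not from the paper):

```python
import numpy as np

def geometric_median(points, iters=200, tol=1e-8):
    """Approximate the geometric median of the rows of `points`
    (the vector minimizing the sum of Euclidean distances to them)
    via Weiszfeld's fixed-point iteration."""
    z = points.mean(axis=0)                      # initialize at the mean
    for _ in range(iters):
        dist = np.linalg.norm(points - z, axis=1)
        dist = np.maximum(dist, tol)             # guard against division by zero
        w = 1.0 / dist
        z_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z

# One wildly corrupted "gradient" barely moves the geometric median,
# while it drags the mean far from the honest cluster.
msgs = np.vstack([np.ones((4, 2)), [[100.0, 100.0]]])
print(msgs.mean(axis=0))           # pulled to ~[20.8, 20.8]
print(geometric_median(msgs))      # stays near [1.0, 1.0]
```

This bounded-influence property is exactly what makes the rule attractive as a server-side aggregator: a minority of arbitrarily corrupted messages cannot drag the aggregate arbitrarily far.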

Theoretical Insights

A noteworthy aspect of the research is its theoretical contribution to understanding the interaction between stochastic gradient noise and robust aggregation. The authors establish that reducing the variance of stochastic gradients enhances the effectiveness of the geometric median as a robust aggregation tool against Byzantine attacks. They provide detailed mathematical analysis proving that Byrd-SAGA achieves linear convergence to a neighborhood of the optimal solution, with the asymptotic learning error determined by the number of Byzantine nodes. The bounds on convergence rates and learning errors are analytically derived, highlighting the algorithm's efficacy and robustness under adversarial conditions.
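The SAGA correction that drives this variance reduction can be illustrated on a toy single-worker problem. The sketch below is not the authors' code; the problem setup and helper names are hypothetical, but the corrected gradient follows the standard SAGA rule (sampled gradient, minus the stored copy, plus the table average):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy finite-sum least-squares problem:
# f(x) = (1/n) * sum_i 0.5 * (a_i @ x - b_i)^2
n, d = 50, 5
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def grad_i(x, i):
    """Gradient of the i-th component function."""
    return A[i] * (A[i] @ x - b[i])

def saga_corrected_grad(x, i, table):
    """SAGA correction: unbiased estimate of the full gradient whose
    variance shrinks as the stored gradients approach the current ones."""
    g = grad_i(x, i) - table[i] + table.mean(axis=0)
    table[i] = grad_i(x, i)            # refresh the stored gradient
    return g

# In the limiting case where the table holds every gradient at the
# current iterate, the corrected stochastic gradient equals the full
# gradient exactly -- zero variance, hence nothing for an attacker to
# hide behind.
x = rng.normal(size=d)
table = np.array([grad_i(x, i) for i in range(n)])
full_grad = table.mean(axis=0)
g = saga_corrected_grad(x, i=7, table=table)
print(np.allclose(g, full_grad))       # -> True
```

A plain stochastic gradient at the same point generally differs from the full gradient, which is the noise that robust aggregation must otherwise fight through.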

Numerical Analysis and Comparisons

The paper includes rigorous numerical tests comparing Byrd-SAGA's performance with alternative approaches such as Byzantine-tolerant SGD and mini-batch SGD under various types of Byzantine attacks, namely Gaussian attacks, sign-flipping attacks, and zero-gradient attacks. Across multiple datasets, Byrd-SAGA consistently demonstrates superior robustness and convergence, owing primarily to its stochastic gradient variance reduction and geometric median aggregation.
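The qualitative contrast between mean and geometric-median aggregation under these attack families can be reproduced in a few lines. The attack magnitudes, worker counts, and helper names below are illustrative choices of mine, not values from the paper:

```python
import numpy as np

def geometric_median(points, iters=100, tol=1e-8):
    """Weiszfeld iteration for the geometric median of the rows of `points`."""
    z = points.mean(axis=0)
    for _ in range(iters):
        dist = np.maximum(np.linalg.norm(points - z, axis=1), tol)
        w = 1.0 / dist
        z = (w[:, None] * points).sum(axis=0) / w.sum()
    return z

rng = np.random.default_rng(1)
true_grad = np.array([1.0, -2.0, 0.5])
K, B = 10, 4                        # total workers, Byzantine workers (B < K/2)

# Honest workers send noisy copies of the true gradient.
honest = true_grad + 0.05 * rng.normal(size=(K - B, 3))

# Three attack families named in the experiments (magnitudes are mine).
attacks = {
    "gaussian":      rng.normal(scale=30.0, size=(B, 3)),
    "sign_flipping": -4.0 * np.tile(true_grad, (B, 1)),
    "zero_gradient": np.zeros((B, 3)),
}

for name, bad in attacks.items():
    msgs = np.vstack([honest, bad])
    err_mean = np.linalg.norm(msgs.mean(axis=0) - true_grad)
    err_gm = np.linalg.norm(geometric_median(msgs) - true_grad)
    print(f"{name:14s} mean err {err_mean:7.3f}   geo-median err {err_gm:7.3f}")
```

In each case the mean is dragged away from the true gradient by the minority of malicious messages, while the geometric median stays close to the honest cluster.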

Implications and Future Research Directions

The implications of this paper are significant, particularly for secure decentralized machine learning applications where data integrity and robustness are critical. The proposed Byrd-SAGA could be further explored in fully decentralized networks, where no central authority exists, thus extending its applicability. Future research could also incorporate alternative robust aggregation rules and variance reduction methods, strengthening the framework's robustness across diverse adversarial settings.

This paper provides insightful contributions to the field of secure distributed learning. By addressing one of the critical challenges in federated learning systems—robustness to Byzantine attacks—this work paves the way for more reliable and scalable decentralized machine learning solutions.