
Data Poisoning Attacks on Federated Machine Learning (2004.10020v1)

Published 19 Apr 2020 in cs.CR and cs.LG

Abstract: Federated machine learning, which enables resource-constrained node devices (e.g., mobile phones and IoT devices) to learn a shared model while keeping the training data local, can provide privacy, security, and economic benefits through the design of an effective communication protocol. However, the communication protocol among different nodes can be exploited by attackers to launch data poisoning attacks, which have been shown to be a serious threat to most machine learning models. In this paper, we explore the vulnerability of federated machine learning. More specifically, we focus on attacking a federated multi-task learning framework, i.e., a federated learning framework that adopts a general multi-task learning formulation to handle statistical challenges. We formulate the problem of computing optimal poisoning attacks on federated multi-task learning as a bilevel program that adapts to arbitrary choices of target nodes and source attacking nodes. We then propose a novel systems-aware optimization method, ATTack on Federated Learning (AT2FL), which efficiently derives the implicit gradients for poisoned data and uses them to compute optimal attack strategies against federated machine learning. Our work is an early study of data poisoning attacks on federated learning. Finally, experimental results on real-world datasets show that the federated multi-task learning model is highly sensitive to poisoning attacks when attackers either directly poison the target nodes or indirectly poison related nodes by exploiting the communication protocol.

Authors (5)
  1. Gan Sun (29 papers)
  2. Yang Cong (33 papers)
  3. Jiahua Dong (48 papers)
  4. Qiang Wang (271 papers)
  5. Ji Liu (285 papers)
Citations (172)

Summary

Overview of "Data Poisoning Attacks on Federated Machine Learning"

The research paper "Data Poisoning Attacks on Federated Machine Learning" addresses a critical security issue within federated learning systems: data poisoning attacks. The authors systematically examine the vulnerabilities of federated machine learning frameworks, focusing in particular on the federated multi-task learning paradigm. In federated learning, data is decentralized and resides on local devices, such as mobile phones or IoT devices, while models are learned collectively through distributed nodes. This architecture aims to preserve privacy and security by keeping data local; however, it introduces new attack vectors, notably data poisoning.

Key Contributions

  1. Bilevel Optimization Framework: The paper introduces a bilevel optimization approach for computing optimal poisoning strategies against federated learning systems. The formulation adapts to arbitrary choices of target nodes and source attacking nodes, enabling analysis of a range of attack vectors (a generic formulation is sketched after this list).
  2. AT²FL Algorithm: The authors propose a novel algorithm named ATTack on Federated Learning (AT²FL). The algorithm efficiently computes implicit gradients for the poisoned data and uses them to derive optimal attack strategies that exploit specific vulnerabilities in federated learning systems.
  3. Empirical Evaluation: The paper conducts an extensive empirical evaluation using real-world datasets. The experiments demonstrate the sensitivity of federated multi-task learning models to poisoning attacks, substantiating claims on both direct and indirect contamination channels.
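To make the first contribution concrete, a bilevel poisoning program of this general shape can be written as follows. This is an illustrative sketch rather than the paper's exact objective: here D_p denotes the injected poisoned data, w_k the model of node k, D_k its clean local data, L_atk the attack loss measured on the chosen target nodes, and Ω(·) a multi-task regularizer that couples the per-node models.

```latex
% Illustrative bilevel poisoning program (notation assumed,
% not taken verbatim from the paper).
\begin{aligned}
\max_{D_p} \quad & L_{\mathrm{atk}}\!\left(w^{*}(D_p)\right) \\
\text{s.t.} \quad & w^{*}(D_p) \in \arg\min_{w_1,\dots,w_K}
  \sum_{k=1}^{K} L_k\!\left(w_k;\, D_k \cup D_p^{(k)}\right)
  + \lambda\, \Omega\!\left(w_1,\dots,w_K\right)
\end{aligned}
```

Because the outer objective depends on D_p only through the inner solution w*(D_p), gradients for the outer maximization must be obtained implicitly, e.g., by differentiating the inner problem's optimality conditions; this implicit-gradient computation is the role the summary attributes to AT²FL.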

Experimental Insights

The experimental results show that federated multi-task learning models are highly susceptible to poisoning, whether attackers corrupt target nodes directly or reach them indirectly. Notably, even when attackers lack direct access to the target nodes, they can compromise related nodes and leverage the federated learning communication protocol to exert indirect influence. Direct poisoning attacks tend to be more damaging than indirect ones, but indirect attacks still pose a considerable threat, especially when node relationships in the model are strongly correlated, since the poison then propagates through shared communication pathways.
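The indirect channel is easy to visualize in a toy simulation. The sketch below is illustrative only (all names and values are hypothetical, and this is not the paper's algorithm or experimental setup): per-node models are coupled through a task-relationship matrix, a poisoned gradient is applied at a single node, and a correlated but un-poisoned node drifts as well.

```python
import numpy as np

# Toy illustration of indirect poisoning in multi-task learning
# (hypothetical setup, not the paper's algorithm or data). Per-node
# models W[:, k] are coupled by a task-relationship matrix Omega; a
# poisoned gradient applied at one node leaks into correlated nodes
# through the coupling term lam * W @ Omega in every update.
rng = np.random.default_rng(0)
K, d = 4, 3                       # 4 nodes, 3-dimensional models
W = rng.normal(size=(d, K))       # column k = model of node k
W0 = W.copy()

Omega = np.eye(K)
Omega[0, 1] = Omega[1, 0] = -0.9  # nodes 0 and 1 strongly related

poison = np.zeros((d, K))
poison[:, 1] = 5.0                # attacker corrupts only node 1

lr, lam = 0.1, 1.0
for _ in range(50):
    W -= lr * (lam * W @ Omega + poison)  # coupled gradient step

for k in range(K):
    print(f"drift of node {k}: {np.linalg.norm(W[:, k] - W0[:, k]):.2f}")
# Node 1 (directly poisoned) drifts the most, node 0 (related but
# un-poisoned) drifts nearly as far, and nodes 2 and 3 barely move.
```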

Implications and Future Directions

The implications of this research extend into the realms of cybersecurity for distributed systems, with concrete applications in federated learning deployments—spanning mobile, IoT, and edge devices. By unveiling the routes and mechanisms through which federated systems can be compromised, this paper prompts immediate attention toward enhancing federated learning designs to mitigate poisoning risks.

It also opens several avenues for future work:

  • Development of robust defenses against data poisoning within federated settings, e.g., robust aggregation (see the sketch after this list).
  • Exploration of additional poisoning vectors beyond those considered.
  • Investigation into adaptive learning frameworks that can detect and counteract poisoned inputs dynamically.
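As a pointer for the first direction, one common defensive building block in federated settings is robust aggregation: replacing the mean of node updates with a coordinate-wise median so that a small minority of poisoned updates cannot pull the aggregate arbitrarily far. The following is a minimal sketch under assumed, hypothetical values; it is not a defense proposed in the paper.

```python
import numpy as np

# Minimal sketch of robust aggregation (illustrative; not from the
# paper). With a plain mean, one poisoned update can shift the
# aggregate far off target; a coordinate-wise median bounds the
# influence of a minority of malicious nodes.
rng = np.random.default_rng(1)
honest = rng.normal(0.0, 0.1, size=(9, 5))  # 9 honest node updates
poisoned = np.full((1, 5), 100.0)           # 1 wildly poisoned update
updates = np.vstack([honest, poisoned])

print("mean aggregate:  ", np.mean(updates, axis=0).round(2))
print("median aggregate:", np.median(updates, axis=0).round(2))
# The mean is pulled roughly 10 units off target; the median stays
# near 0, since the single outlier cannot move the middle value.
```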

In conclusion, this research offers critical insight into federated machine learning security, diagnosing potential flaws and providing a methodology for proactive risk assessment and management.