Federated Learning for Malware Detection in IoT Devices (2104.09994v3)

Published 15 Apr 2021 in cs.CR and cs.LG

Abstract: This work investigates the possibilities enabled by federated learning concerning IoT malware detection and studies security issues inherent to this new learning paradigm. In this context, a framework that uses federated learning to detect malware affecting IoT devices is presented. N-BaIoT, a dataset modeling network traffic of several real IoT devices while affected by malware, has been used to evaluate the proposed framework. Both supervised and unsupervised federated models (multi-layer perceptron and autoencoder) able to detect malware affecting seen and unseen IoT devices of N-BaIoT have been trained and evaluated. Furthermore, their performance has been compared to two traditional approaches. The first one lets each participant locally train a model using only its own data, while the second consists of making the participants share their data with a central entity in charge of training a global model. This comparison has shown that the use of more diverse and large data, as done in the federated and centralized methods, has a considerable positive impact on the model performance. Besides, the federated models, while preserving the participants' privacy, show similar results to the centralized ones. As an additional contribution and to measure the robustness of the federated approach, an adversarial setup with several malicious participants poisoning the federated model has been considered. The baseline model-averaging aggregation step used in most federated learning algorithms appears highly vulnerable to different attacks, even with a single adversary. The performance of other model aggregation functions acting as countermeasures is thus evaluated under the same attack scenarios. These functions provide a significant improvement against malicious participants, but more efforts are still needed to make federated approaches robust.

Citations (216)


Summary

The paper demonstrates that federated learning achieves high malware detection accuracy in IoT devices while preserving data privacy.
It employs a supervised multi-layer perceptron and an unsupervised autoencoder on the N-BaIoT dataset to compare federated, centralized, and purely local training approaches.
The paper also evaluates vulnerabilities to data and model poisoning attacks, revealing limitations of standard model averaging in federated setups.

Federated Learning for Malware Detection in IoT Devices

The paper "Federated Learning for Malware Detection in IoT Devices" presents an in-depth investigation into applying Federated Learning (FL) to detect malware on Internet of Things (IoT) devices. As IoT deployments grow, efficient malware detection has become essential, especially since these devices often lack basic security mechanisms. The paper explores how FL can address these challenges while preserving data privacy.

Overview and Methodology

The authors discuss FL as a means of enabling collaborative machine learning without sharing sensitive IoT data among participants. Traditional centralized approaches to malware detection aggregate data at a central server, which raises substantial privacy concerns. By contrast, FL trains a global model from decentralized data that stays on each participant's side; only model updates are exchanged, so raw traffic data is never pooled.
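As a minimal sketch of the federated idea described above, assuming a toy linear model, plain SGD for local training, and sample-size-weighted averaging in the style of FedAvg, one training round can be written as follows. All names here (`local_update`, `fed_avg`, `federated_round`) are hypothetical and not taken from the paper's code.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Sample = Tuple[Vector, float]  # (feature vector, target)


def local_update(weights: Vector, data: Sequence[Sample],
                 lr: float = 0.1, epochs: int = 1) -> Vector:
    """Hypothetical local training: SGD on a linear model with squared loss.

    Runs entirely on the participant's side; raw samples never leave it.
    """
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def fed_avg(updates: Sequence[Vector], sizes: Sequence[int]) -> Vector:
    """Average participant weights, weighting each by its local sample count."""
    total = sum(sizes)
    return [sum(n * u[j] for u, n in zip(updates, sizes)) / total
            for j in range(len(updates[0]))]


def federated_round(global_w: Vector, client_data: Sequence[Sequence[Sample]],
                    lr: float = 0.1) -> Vector:
    """One round: broadcast global weights, train locally, aggregate updates."""
    updates = [local_update(global_w, d, lr) for d in client_data]
    return fed_avg(updates, [len(d) for d in client_data])


# Usage: two participants whose private data both follow y = 2x.
clients = [
    [([1.0], 2.0)],
    [([0.5], 1.0), ([1.5], 3.0)],
]
w = [0.0]
for _ in range(50):
    w = federated_round(w, clients)
print(w)  # close to [2.0]
```

Only weight vectors cross the network, so the aggregator never sees a participant's samples. In the paper's setting, the MLP and autoencoder would take the place of the toy linear model.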

The paper uses the N-BaIoT dataset, which models the network traffic of several real IoT devices while they are infected by malware, to evaluate the models trained with FL. The authors examine both a supervised model (a multi-layer perceptron) and an unsupervised one (an autoencoder) and compare the federated versions with two traditional approaches: one in which each participant trains a model only on its own local data, and one in which participants share their data with a central entity that trains a global model.
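The summary does not spell out how the autoencoder's output becomes a malware verdict. A common recipe for this kind of unsupervised detector, used here purely as an assumption, is to train the autoencoder on benign traffic only and flag a sample when its reconstruction error exceeds a threshold fitted on benign data, for example the mean plus `k` standard deviations. The sketch below covers only that decision step; the autoencoder itself is out of scope, and `fit_threshold`, `k`, and the other names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between an input and the autoencoder's reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def fit_threshold(benign_errors: Sequence[float], k: float = 1.0) -> float:
    """Assumed rule: mean + k * (population) std of errors on benign traffic."""
    return mean(benign_errors) + k * pstdev(benign_errors)


def is_malicious(errors: Sequence[float], threshold: float) -> List[bool]:
    """Flag samples the model reconstructs poorly, i.e. traffic unlike benign data."""
    return [e > threshold for e in errors]


# Usage with made-up errors (not results from the paper).
benign = [0.1, 0.2, 0.3]
thr = fit_threshold(benign)
print(is_malicious([0.15, 0.9], thr))  # [False, True]
```

Raising `k` trades more missed detections for fewer false alarms on benign traffic.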

Experimental Results

The numerical results in the paper highlight several key findings:

The federated approach demonstrated similar performance metrics to centralized models, achieving high accuracy while maintaining data privacy.
In detecting malware on both IoT devices seen during training and previously unseen devices, the federated models performed comparably to centralized systems without requiring participants to share sensitive data.
The paper further evaluates the robustness of FL under adversarial conditions. It explores the impact of several attacks, including data poisoning and model poisoning, and assesses how well alternative aggregation functions mitigate them. It finds that the standard model-averaging step in FL is vulnerable to malicious participants, even when there is only a single adversary.
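To make the averaging weakness concrete, the sketch below compares plain coordinate-wise averaging with two well-known robust alternatives, coordinate-wise median and trimmed mean, when one participant submits a scaled, poisoned update. This summary does not list the exact aggregation functions the paper evaluates, so these two are illustrative choices, and the update values are made up.

```python
from statistics import median
from typing import List, Sequence

Vector = List[float]


def mean_agg(updates: Sequence[Vector]) -> Vector:
    """Plain coordinate-wise average (the averaging baseline, unweighted here)."""
    n = len(updates)
    return [sum(col) / n for col in zip(*updates)]


def median_agg(updates: Sequence[Vector]) -> Vector:
    """Coordinate-wise median: a single outlier cannot drag any coordinate far."""
    return [median(col) for col in zip(*updates)]


def trimmed_mean_agg(updates: Sequence[Vector], trim: int) -> Vector:
    """Drop the `trim` largest and smallest values per coordinate, then average."""
    n = len(updates)
    if 2 * trim >= n:
        raise ValueError("trim too large for the number of updates")
    out = []
    for col in zip(*updates):
        kept = sorted(col)[trim:n - trim]
        out.append(sum(kept) / len(kept))
    return out


honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.0]]
poisoned = [[100.0, -100.0]]  # one malicious participant scaling its update
updates = honest + poisoned

print(mean_agg(updates))             # roughly [20.8, -19.2]: hijacked
print(median_agg(updates))           # [1.0, 1.0]
print(trimmed_mean_agg(updates, 1))  # close to [1.0, 1.0]
```

Neither robust rule is a complete defense: more adversaries than the trim budget allows, or subtler poisoned updates, can still shift the result, which is consistent with the paper's conclusion that more work is needed to make federated approaches robust.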

Implications and Future Directions

From a theoretical perspective, the paper significantly contributes to understanding how FL can be effectively utilized for IoT device security in the face of evolving cybersecurity threats. Practically, the deployment of FL models in real-world scenarios could enhance IoT security frameworks while ensuring compliance with privacy regulations.

The authors propose several future directions to advance this research:

Extending the evaluation of FL models to different types of adversarial attacks, including more sophisticated and stealthier forms.
Exploring advanced aggregation functions and techniques that could offer stronger resilience against adversaries in federated setups.
Using larger, more diverse datasets that reflect the variety of IoT environments to assess the scalability and robustness of FL in broader contexts.
Refining the FL framework to operate asynchronously and autonomously using distributed ledger technologies such as blockchain, which may reduce reliance on a single point of failure.

In conclusion, this research demonstrates the utility of Federated Learning for malware detection across IoT networks while exposing the weakness of standard aggregation against malicious participants, outlining a path toward secure, scalable, and privacy-preserving cybersecurity solutions for interconnected devices.
