NetFlow Datasets for Machine Learning-based Network Intrusion Detection Systems (2011.09144v1)

Published 18 Nov 2020 in cs.NI

Abstract: Machine Learning (ML)-based Network Intrusion Detection Systems (NIDSs) have proven to be a reliable intelligence tool for protecting networks against cyberattacks. Network data features have a great impact on the performance of ML-based NIDSs. However, evaluations of ML models are often unreliable, as each ML-enabled NIDS is trained and validated using different data features that may not contain security events. Therefore, a common ground feature set from multiple datasets is required to evaluate an ML model's detection accuracy and its ability to generalise across datasets. This paper presents NetFlow features from four benchmark NIDS datasets known as UNSW-NB15, BoT-IoT, ToN-IoT, and CSE-CIC-IDS2018 using their publicly available packet capture files. In a real-world scenario, NetFlow features are relatively easier to extract from network traffic compared to the complex features used in the original datasets, as they are usually extracted from packet headers. The generated NetFlow datasets have been labelled for solving binary- and multiclass-based learning challenges. Preliminary results indicate that NetFlow features lead to similar binary-class results and lower multi-class classification results amongst the four datasets compared to their respective original features datasets. The NetFlow datasets, named NF-UNSW-NB15, NF-BoT-IoT, NF-ToN-IoT, NF-CSE-CIC-IDS2018 and NF-UQ-NIDS, are published at http://staff.itee.uq.edu.au/marius/NIDS_datasets/ for research purposes.

Authors (4)

Mohanad Sarhan
Siamak Layeghy
Nour Moustafa
Marius Portmann

Citations (172)


Summary

Evaluation of NetFlow Datasets for Machine Learning-based Network Intrusion Detection Systems

The paper by Mohanad Sarhan, Siamak Layeghy, Nour Moustafa, and Marius Portmann addresses a practical problem in network security: the lack of standardized NetFlow datasets for machine learning-based Network Intrusion Detection Systems (NIDS). Practitioners and researchers have long struggled to train and evaluate NIDS consistently because the available benchmark datasets use disparate feature sets. This disparity prevents direct comparison of models and makes it hard to assess how well a model generalises from one dataset to another.

Key Contributions

The authors convert four prominent NIDS datasets (UNSW-NB15, BoT-IoT, ToN-IoT, and CSE-CIC-IDS2018) into a standardized NetFlow format, using the publicly available packet capture files of each. The resulting datasets, NF-UNSW-NB15, NF-BoT-IoT, NF-ToN-IoT, and NF-CSE-CIC-IDS2018, together with the aggregated NF-UQ-NIDS dataset, share a consistent set of features for training machine learning algorithms.
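As a rough illustration of why a shared schema matters, the following minimal sketch concatenates records from several flow datasets into one aggregate and tags each record with its source. The field names in `COMMON_FIELDS`, the `Dataset` column, and the toy rows are assumptions made for this example; they are not taken from the published files, and this is not the authors' tooling.

```python
# Illustrative subset of NetFlow-style fields (assumed, not the published schema).
COMMON_FIELDS = ["L4_SRC_PORT", "L4_DST_PORT", "PROTOCOL",
                 "IN_BYTES", "OUT_BYTES", "Label", "Attack"]


def merge_flow_datasets(sources):
    """Concatenate flow records from datasets that share COMMON_FIELDS.

    `sources` maps a dataset name to an iterable of dict rows. Each output
    record keeps only the common fields and gains a `Dataset` column that
    records where the flow came from.
    """
    merged = []
    for name, rows in sources.items():
        for row in rows:
            missing = [f for f in COMMON_FIELDS if f not in row]
            if missing:
                # A dataset that lacks a common field cannot be merged as-is.
                raise ValueError(f"{name}: missing fields {missing}")
            record = {f: row[f] for f in COMMON_FIELDS}
            record["Dataset"] = name
            merged.append(record)
    return merged


# Toy rows with made-up values, only to exercise the function.
toy_sources = {
    "NF-UNSW-NB15": [
        {"L4_SRC_PORT": 51000, "L4_DST_PORT": 80, "PROTOCOL": 6,
         "IN_BYTES": 420, "OUT_BYTES": 1300, "Label": 0, "Attack": "Benign"},
    ],
    "NF-BoT-IoT": [
        {"L4_SRC_PORT": 40000, "L4_DST_PORT": 23, "PROTOCOL": 6,
         "IN_BYTES": 60, "OUT_BYTES": 0, "Label": 1, "Attack": "DoS"},
        {"L4_SRC_PORT": 40001, "L4_DST_PORT": 23, "PROTOCOL": 6,
         "IN_BYTES": 60, "OUT_BYTES": 0, "Label": 1, "Attack": "DoS"},
    ],
}

merged = merge_flow_datasets(toy_sources)
```

Because every record carries the same columns, a single model can be trained on the merged list or trained on one source and tested on another, which is the kind of cross-dataset evaluation a common feature set is meant to enable.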

In generating these NetFlow-based datasets, the paper examines both the feasibility of using a reduced feature set in place of the original features and the effect of the conversion on binary- and multi-class classification performance. The approach is pragmatic: it aims to retain enough features for reliable classification while avoiding the computational overhead typically incurred in handling the more elaborate features of the original datasets.

Numerical Results and Evaluation

The paper's preliminary results show that the NetFlow datasets achieve binary classification performance similar to that of their respective original datasets. For instance, the NF-UNSW-NB15 and NF-ToN-IoT datasets yield satisfactory detection rates and F1 scores, indicating that NetFlow features could be a viable alternative for efficient NIDS training across multiple scenarios.
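For readers unfamiliar with the metrics named here, the sketch below computes detection rate (recall on the attack class), precision, false alarm rate, and F1 score from binary labels using their standard definitions. The function name and the toy label vectors are illustrative assumptions; the numbers are not results from the paper.

```python
def binary_metrics(y_true, y_pred):
    """Compute standard binary NIDS metrics; 1 = attack, 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed

    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    denom = precision + detection_rate
    f1 = 2 * precision * detection_rate / denom if denom else 0.0
    return {
        "detection_rate": detection_rate,
        "precision": precision,
        "false_alarm_rate": false_alarm_rate,
        "f1": f1,
    }


# Made-up labels: 4 attacks and 4 benign flows.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

With these toy labels, three of the four attacks are detected and one of the four benign flows is flagged, so detection rate, precision, and F1 all come out to 0.75.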

However, multi-class classification performance is lower than with the original feature sets, and the results on NF-ToN-IoT and NF-CSE-CIC-IDS2018 reveal gaps where certain attack types are not effectively detected. These findings highlight the trade-off between the simplicity and the comprehensiveness of the standardized feature set, and they motivate further analysis of which elements of the original datasets could be incorporated to improve detection accuracy.
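One way to surface such gaps is to report recall per attack class instead of a single aggregate score. The sketch below is a minimal, assumption-laden example: the class names and predictions are invented for illustration and do not reproduce any result from the paper.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return the fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical labels: two classes are detected well, one is mostly missed.
y_true = ["Benign"] * 4 + ["DoS"] * 3 + ["Injection"] * 3
y_pred = ["Benign"] * 4 + ["DoS"] * 3 + ["Benign", "DoS", "Injection"]

recall = per_class_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy case overall accuracy is 0.8, yet only one in three flows of the hypothetical "Injection" class is recognised. A per-class breakdown makes that kind of weakness visible where a single headline number would hide it.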

Practical Implications and Future Directions

Practically, NetFlow features are a realistic choice because they are usually extracted from packet headers and are easy to obtain from existing network hardware. This could streamline deployment and lower operating costs, since less data needs to be collected and stored. NetFlow is especially relevant in environments where rapid feature extraction is essential to maintaining operational network security.
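To make the idea of header-only extraction concrete, here is a simplified sketch that groups packets by their 5-tuple and accumulates packet counts, byte counts, and duration per flow. The packet dictionary layout and field names are assumptions for this example. Real NetFlow exporters also handle flow timeouts, TCP flags, and direction, none of which is modelled here.

```python
def aggregate_flows(packets):
    """Group packet header records into unidirectional flows.

    Each packet is a dict with `src`, `dst`, `sport`, `dport`, `proto`,
    `length` (bytes) and `ts` (seconds). No payload is inspected.
    """
    flows = {}
    for pkt in packets:
        key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
        flow = flows.get(key)
        if flow is None:
            # First packet of this 5-tuple starts a new flow record.
            flow = flows[key] = {"packets": 0, "bytes": 0,
                                 "first_ts": pkt["ts"], "last_ts": pkt["ts"]}
        flow["packets"] += 1
        flow["bytes"] += pkt["length"]
        flow["first_ts"] = min(flow["first_ts"], pkt["ts"])
        flow["last_ts"] = max(flow["last_ts"], pkt["ts"])
    for flow in flows.values():
        flow["duration"] = flow["last_ts"] - flow["first_ts"]
    return flows


# Made-up header records: two packets of one TCP flow, one UDP packet.
toy_packets = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 51000, "dport": 80,
     "proto": 6, "length": 60, "ts": 0.0},
    {"src": "10.0.0.3", "dst": "10.0.0.2", "sport": 40000, "dport": 53,
     "proto": 17, "length": 40, "ts": 0.25},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 51000, "dport": 80,
     "proto": 6, "length": 1500, "ts": 0.5},
]

flows = aggregate_flows(toy_packets)
```

Everything the function needs comes from addresses, ports, protocol number, packet length, and timestamp, which is why flow features of this kind are cheap to collect compared with features that require payload inspection.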

The paper opens avenues for future research, notably in refining the NetFlow feature set to further enhance detection capabilities. A key step is identifying which features from the original datasets contribute significantly to accurate detection, particularly in complex multi-class classification scenarios. Such enhancements could improve both the accuracy and the efficiency of real-world NIDS implementations.

In conclusion, this paper takes a methodologically sound step towards harmonizing the feature sets of NIDS datasets, paving the way for more consistent evaluation of machine learning models. As network security threats evolve, the continued development of adaptable and efficient NIDSs remains critical, and a standardized feature set is a foundation for building more robust models. The insights gathered here set the stage for further refinements and indicate a promising approach to advancing the field of network security.
