Towards a Standard Feature Set for Network Intrusion Detection System Datasets (2101.11315v2)

Published 27 Jan 2021 in cs.NI

Abstract: Network Intrusion Detection Systems (NIDSs) are important tools for the protection of computer networks against increasingly frequent and sophisticated cyber attacks. Recently, a lot of research effort has been dedicated to the development of Machine Learning (ML) based NIDSs. As in any ML-based application, the availability of high-quality datasets is critical for the training and evaluation of ML-based NIDS. One of the key problems with the currently available datasets is the lack of a standard feature set. The use of a unique and proprietary set of features for each of the publicly available datasets makes it virtually impossible to compare the performance of ML-based traffic classifiers on different datasets, and hence to evaluate the ability of these systems to generalise across different network scenarios. To address that limitation, this paper proposes and evaluates standard NIDS feature sets based on the NetFlow network meta-data collection protocol and system. We evaluate and compare two NetFlow-based feature set variants, a version with 12 features, and another one with 43 features.


Overview of the Proposed Standard Feature Set for Network Intrusion Detection Systems

The paper "Towards a Standard Feature Set for Network Intrusion Detection System Datasets" addresses a critical challenge in the development and evaluation of Machine Learning (ML)-based Network Intrusion Detection Systems (NIDSs): existing datasets share no standard feature set. The authors, Sarhan, Layeghy, and Portmann, propose adopting a NetFlow-based standard feature set to improve the comparability, generalizability, and practical deployment of ML-based NIDS models.

Challenges with Current NIDS Datasets

The paper reviews existing NIDS datasets and underlines a shared limitation: each dataset ships with its own unique, proprietary feature set. Because the features differ even when the underlying network behaviors or attacks are similar, a model trained on one dataset cannot be evaluated directly on another. This makes meaningful performance comparisons across datasets virtually impossible and prevents researchers from measuring how well a model generalizes to different network scenarios.

Proposal of a NetFlow-based Feature Set

To address these limitations, the paper proposes standardized feature sets based on NetFlow, an industry standard for collecting network traffic meta-data. The authors argue that adopting NetFlow simplifies feature extraction because it is widely supported in network devices and represents traffic compactly as flow records. Two variants are proposed and compared: a version with 12 features and an extended version with 43 features, the latter intended to capture more of the security-relevant information needed for effective intrusion detection.
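To make the idea of a flow record concrete, the following minimal sketch groups packets by their 5-tuple and accumulates byte and packet counters. It is an illustration under stated assumptions, not the authors' extraction pipeline: the packet data is made up, the helper `aggregate_flows` is hypothetical, and the field names follow common NetFlow v9 naming and are not taken from the paper's feature list. Real exporters also handle flow timeouts and many more fields.

```python
from collections import defaultdict

# Made-up packet observations for illustration:
# (src_ip, dst_ip, src_port, dst_port, protocol, size_bytes)
packets = [
    ("10.0.0.1", "10.0.0.9", 51000, 80, 6, 60),
    ("10.0.0.1", "10.0.0.9", 51000, 80, 6, 1500),
    ("10.0.0.2", "10.0.0.9", 40000, 53, 17, 75),
    ("10.0.0.1", "10.0.0.9", 51000, 80, 6, 400),
]


def aggregate_flows(packets):
    """Group packets by 5-tuple and accumulate per-flow counters.

    Hypothetical helper: a simplified stand-in for what a NetFlow
    exporter does, ignoring timeouts and flow direction.
    """
    counters = defaultdict(lambda: {"IN_BYTES": 0, "IN_PKTS": 0})
    for src, dst, sport, dport, proto, size in packets:
        key = (src, dst, sport, dport, proto)
        counters[key]["IN_BYTES"] += size
        counters[key]["IN_PKTS"] += 1

    flows = []
    for (src, dst, sport, dport, proto), counts in counters.items():
        flows.append({
            "IPV4_SRC_ADDR": src,
            "IPV4_DST_ADDR": dst,
            "L4_SRC_PORT": sport,
            "L4_DST_PORT": dport,
            "PROTOCOL": proto,
            "IN_BYTES": counts["IN_BYTES"],
            "IN_PKTS": counts["IN_PKTS"],
        })
    return flows


flows = aggregate_flows(packets)
for flow in flows:
    print(flow)
```

Because every dataset converted this way yields records with the same keys, a classifier trained on one set of flow records can, in principle, be applied to another without re-engineering features.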

Evaluation and Comparative Analysis

The proposed feature set was evaluated on four widely recognized NIDS datasets: UNSW-NB15, BoT-IoT, ToN-IoT, and CSE-CIC-IDS2018. Each dataset was converted into a variant that uses the proposed feature set, so that ML models can be evaluated fairly across datasets. Using an Extra Tree classifier, the paper reports that the 43-feature variant consistently achieves better classification performance than the proprietary feature set of each original dataset.

Both binary and multi-class classification scenarios were examined. In binary classification, the extended feature set yielded notably higher Area Under the Curve (AUC) values, indicating better separation of attack traffic from benign traffic. The multi-class evaluations supported these findings, with higher F1 scores across diverse attack classes, which the summary attributes to the broader set of extracted features capturing more of the variance in network traffic flows.
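For readers unfamiliar with the two metrics, the sketch below computes AUC for a binary task and per-class F1 for a multi-class task using only the Python standard library. The labels, scores, and class names are made-up illustrative values, the function names are hypothetical, and the support-weighted average is one common way to summarize per-class F1; the summary above does not state which averaging scheme the paper uses.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (attack, benign) pairs in which the attack
    sample receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_per_class(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN), computed separately for each class."""
    result = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        result[c] = 2 * tp / denom if denom else 0.0
    return result


def weighted_f1(y_true, y_pred):
    """Average of per-class F1, weighted by each class's true sample count."""
    per_class = f1_per_class(y_true, y_pred)
    return sum(per_class[c] * y_true.count(c) for c in per_class) / len(y_true)


# Binary example: 1 = attack, 0 = benign (illustrative values only)
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Multi-class example with hypothetical class names
y_true = ["Benign", "Benign", "DoS", "DoS", "Scan"]
y_pred = ["Benign", "DoS", "DoS", "DoS", "Benign"]
per_class = f1_per_class(y_true, y_pred)
overall = weighted_f1(y_true, y_pred)

print(auc, per_class, overall)
```

The pairwise AUC loop is quadratic in the number of samples, which is fine for illustration but too slow for datasets with millions of flows; a rank-based computation would be the usual choice there.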

Implications and Future Research

The introduction of a standardized NetFlow feature set has significant implications. Practically, it allows ML-based NIDSs to be benchmarked with a consistent methodology on datasets that share the same features, making results more reliable and easier to compare. Theoretically, this standardization may help bridge the gap between academic research and real-world deployment by encouraging more robust and transferable ML models. Furthermore, datasets that share a common feature set can be merged, giving researchers access to larger training sources that cover a wide variety of network behaviors and attacks.
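The merging benefit can be sketched in a few lines. The example below is a hypothetical illustration, not the authors' tooling: the helper `merge_datasets`, the dataset names, the rows, and the added `Dataset` column are all assumptions. It only shows that once records share one schema, combining them reduces to concatenation, and that a schema check catches the mismatch that proprietary feature sets would cause.

```python
def merge_datasets(datasets):
    """Concatenate records from several datasets that share one schema.

    `datasets` maps a dataset name to a list of record dicts. A
    ValueError is raised if the column sets differ, which is exactly
    the situation a common feature set is meant to avoid.
    """
    schemas = {name: frozenset(rows[0]) for name, rows in datasets.items() if rows}
    if len(set(schemas.values())) > 1:
        raise ValueError("datasets do not share a common feature set")

    merged = []
    for name, rows in datasets.items():
        for row in rows:
            # Keep track of where each record came from (assumed extra column).
            merged.append({**row, "Dataset": name})
    return merged


# Made-up records using a tiny shared schema for illustration
dataset_a = [
    {"PROTOCOL": 6, "IN_BYTES": 1960, "IN_PKTS": 3, "Label": 0},
    {"PROTOCOL": 17, "IN_BYTES": 75, "IN_PKTS": 1, "Label": 1},
]
dataset_b = [
    {"PROTOCOL": 6, "IN_BYTES": 320, "IN_PKTS": 4, "Label": 1},
]

merged = merge_datasets({"dataset_a": dataset_a, "dataset_b": dataset_b})
print(len(merged), merged[0])
```

Keeping the source name on each record makes it possible to train on one dataset and test on another later, which is the kind of cross-dataset evaluation a shared feature set is meant to enable.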

Looking forward, future research could explore additional feature selection processes to refine the NetFlow set further or investigate its adaptation for emerging network environments, such as IoT-driven topologies. Additionally, while the proposed feature set has demonstrated improved detection capabilities, exploring its performance across novel attack scenarios will be crucial to maintaining its effectiveness.

Conclusion

This paper presents a compelling case for the establishment of a standardized NetFlow-based feature set for NIDS datasets, demonstrating its advantages over existing proprietary features in terms of both consistency and classification performance. By proposing a solution that addresses both the theoretical challenges and practical constraints facing ML-based NIDSs, the research contributes valuable insights toward enhancing the efficacy and deployment of intrusion detection systems in diverse network environments.


Authors (3)

Mohanad Sarhan
Siamak Layeghy
Marius Portmann

Citations (168)
