Building an Efficient Intrusion Detection System Based on Feature Selection and Ensemble Classifier (1904.01352v4)

Published 2 Apr 2019 in cs.CR and cs.LG

Abstract: Intrusion detection system (IDS) is one of the extensively used techniques in a network topology to safeguard the integrity and availability of sensitive assets in the protected systems. Although many supervised and unsupervised learning approaches from the field of machine learning have been used to increase the efficacy of IDSs, it is still a problem for existing intrusion detection algorithms to achieve good performance. First, lots of redundant and irrelevant data in high-dimensional datasets interfere with the classification process of an IDS. Second, an individual classifier may not perform well in the detection of each type of attack. Third, many models are built for stale datasets, making them less adaptable for novel attacks. Thus, we propose a new intrusion detection framework in this paper, and this framework is based on the feature selection and ensemble learning techniques. In the first step, a heuristic algorithm called CFS-BA is proposed for dimensionality reduction, which selects the optimal subset based on the correlation between features. Then, we introduce an ensemble approach that combines C4.5, Random Forest (RF), and Forest by Penalizing Attributes (Forest PA) algorithms. Finally, a voting technique is used to combine the probability distributions of the base learners for attack recognition. The experimental results, using NSL-KDD, AWID, and CIC-IDS2017 datasets, reveal that the proposed CFS-BA-Ensemble method is able to exhibit better performance than other related and state-of-the-art approaches under several metrics.

Citations (309)


Summary

The paper introduces a two-phase IDS framework that integrates a novel CFS-BA feature selection method with an ensemble classifier for enhanced detection.
On NSL-KDD, the ensemble classifier combining C4.5, Random Forest, and Forest PA reached 99.81% accuracy with a false alarm rate of 0.08%.
Experiments on the NSL-KDD, AWID, and CIC-IDS2017 datasets show the method outperforming related and state-of-the-art approaches under several metrics.

Overview of an Efficient Intrusion Detection System Based on Feature Selection and Ensemble Classifier

The paper "Building an Efficient Intrusion Detection System Based on Feature Selection and Ensemble Classifier" by Yuyang Zhou et al. proposes an intrusion detection framework that combines feature selection with ensemble learning to cope with high-dimensional data and unbalanced network traffic. The authors identify three weaknesses of existing intrusion detection algorithms: redundant and irrelevant features in large datasets interfere with classification, a single classifier rarely performs well on every attack type, and models built on stale datasets adapt poorly to new attack vectors.

Methodology

The framework has two phases: feature selection with a new heuristic method called CFS-BA, followed by ensemble classification. CFS-BA combines Correlation-based Feature Selection (CFS) with the Bat Algorithm (BA). CFS scores a candidate feature subset by how strongly its features correlate with the class label and how weakly they correlate with one another, while BA, a population-based metaheuristic, searches the space of subsets for the one with the best score. Discarding redundant and irrelevant features in this way reduces dimensionality, since such features can degrade the performance of an intrusion detection system.
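To make the two halves of CFS-BA concrete, here is a minimal Python sketch. CFS evaluates a subset S of k features with Hall's merit, Merit_S = k * mean(r_cf) / sqrt(k + k(k - 1) * mean(r_ff)), where r_cf are feature-class correlations and r_ff are feature-feature correlations; the search looks for the subset that maximizes this score. The sketch assumes absolute Pearson correlation (Hall's original CFS typically uses symmetrical uncertainty on discretized features) and a simplified binary bat algorithm with a sigmoid transfer function and a one-bit-flip local walk. All function names and parameter values (n_bats, n_iter, alpha, gamma, the frequency range) are illustrative assumptions, not the paper's implementation or settings.

```python
import math
import random
from statistics import mean


def abs_pearson(a, b):
    """Absolute Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return abs(cov) / math.sqrt(va * vb)


def cfs_merit(columns, labels, subset):
    """Hall's CFS merit: k * mean(r_cf) / sqrt(k + k*(k-1) * mean(r_ff))."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = mean(abs_pearson(columns[i], labels) for i in subset)
    pairs = [(i, j) for i in subset for j in subset if i < j]
    r_ff = mean(abs_pearson(columns[i], columns[j]) for i, j in pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def binary_bat_search(columns, labels, n_bats=10, n_iter=30,
                      f_min=0.0, f_max=2.0, alpha=0.9, gamma=0.9, seed=0):
    """Simplified binary bat algorithm maximizing CFS merit; returns (feature indices, merit)."""
    rng = random.Random(seed)
    d = len(columns)

    def fitness(bits):
        return cfs_merit(columns, labels, [j for j, bit in enumerate(bits) if bit])

    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_bats)]
    vel = [[0.0] * d for _ in range(n_bats)]
    loudness = [1.0] * n_bats
    pulse_rate = [0.5] * n_bats
    fit = [fitness(p) for p in pos]
    b = max(range(n_bats), key=lambda i: fit[i])
    best, best_fit = pos[b][:], fit[b]

    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            freq = f_min + (f_max - f_min) * rng.random()
            cand = []
            for j in range(d):
                # Pull velocity toward the global best; clamp so the sigmoid never saturates.
                vel[i][j] = max(-4.0, min(4.0, vel[i][j] + (best[j] - pos[i][j]) * freq))
                # Sigmoid transfer function: velocity -> probability that bit j is 1.
                cand.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-vel[i][j])) else 0)
            if rng.random() > pulse_rate[i]:
                # Local random walk around the best solution: flip one random bit.
                cand = best[:]
                flip = rng.randrange(d)
                cand[flip] = 1 - cand[flip]
            f_new = fitness(cand)
            # Accept an improvement with probability equal to the bat's loudness; then
            # loudness decays and pulse rate follows r0 * (1 - exp(-gamma * t)), r0 = 0.5.
            if f_new > fit[i] and rng.random() < loudness[i]:
                pos[i], fit[i] = cand, f_new
                loudness[i] *= alpha
                pulse_rate[i] = 0.5 * (1.0 - math.exp(-gamma * t))
            if fit[i] > best_fit:
                best, best_fit = pos[i][:], fit[i]
    return [j for j, bit in enumerate(best) if bit], best_fit


# Toy data: feature 0 copies the label, feature 1 is uncorrelated with it.
columns = [[0, 0, 1, 1], [0, 1, 0, 1]]
labels = [0, 0, 1, 1]
print(binary_bat_search(columns, labels))
```

On this toy data, feature 0 alone has merit 1.0, while adding the uncorrelated feature 1 lowers the merit to about 0.71, which is exactly the trade-off CFS is designed to capture.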

For the classification stage, the paper uses an ensemble that combines the C4.5, Random Forest (RF), and Forest by Penalizing Attributes (Forest PA) algorithms. Each base learner outputs a probability distribution over the classes, and the ensemble combines these by voting on the average of probabilities (AOP): the class with the highest averaged probability becomes the prediction. The model is evaluated on the NSL-KDD, AWID, and CIC-IDS2017 datasets, where it outperforms existing models on several performance metrics.
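Soft voting by average of probabilities takes only a few lines. This Python sketch assumes each base learner has already produced a class-probability vector over the same class order; the function name, class names, and the three probability vectors are invented for illustration and do not come from the paper.

```python
def average_of_probabilities(distributions):
    """Soft-voting combiner: average per-class probabilities across base learners.

    distributions: one probability vector per base learner, all indexed by the
    same ordered list of classes. Returns (winning class index, averaged vector).
    """
    if not distributions:
        raise ValueError("need at least one base learner")
    n_models = len(distributions)
    n_classes = len(distributions[0])
    averaged = [sum(d[c] for d in distributions) / n_models for c in range(n_classes)]
    winner = max(range(n_classes), key=lambda c: averaged[c])
    return winner, averaged


# Hypothetical outputs of the three base learners for one network flow.
classes = ["normal", "dos", "probe"]
outputs = {
    "C4.5": [0.6, 0.3, 0.1],
    "RF": [0.2, 0.7, 0.1],
    "Forest PA": [0.3, 0.5, 0.2],
}
winner, averaged = average_of_probabilities(list(outputs.values()))
print(classes[winner], [round(p, 3) for p in averaged])
```

With these made-up numbers the averaged distribution is roughly [0.37, 0.50, 0.13], so the flow is labeled "dos" even though C4.5 on its own favored "normal". Averaging probabilities, rather than counting hard votes, lets a learner's confidence influence the final decision.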

Evaluation and Results

The proposed system achieved higher classification accuracy and lower false alarm rates than standalone classifiers and other recent methods. On the NSL-KDD dataset, for instance, it reached an accuracy of 99.81%, an attack detection rate (ADR) of 99.7%, and a false alarm rate (FAR) of 0.08%. These results indicate that the ensemble separates benign from malicious traffic reliably on this benchmark.
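All three reported metrics follow from the binary confusion matrix. The sketch below assumes the usual definitions with "attack" as the positive class, ADR = TP / (TP + FN) and FAR = FP / (FP + TN); the paper's exact formulas, and how it aggregates them across multiple attack classes, may differ. The function names and toy labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive="attack"):
    """Count TP, TN, FP, FN for a binary IDS, treating `positive` as the attack class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def ids_metrics(tp, tn, fp, fn):
    """Return (accuracy, attack detection rate, false alarm rate)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    adr = tp / (tp + fn)  # fraction of attacks that were flagged
    far = fp / (fp + tn)  # fraction of normal traffic wrongly flagged
    return accuracy, adr, far


# Toy labels, illustrative only.
y_true = ["attack", "attack", "attack", "normal", "normal", "normal", "normal", "attack"]
y_pred = ["attack", "attack", "normal", "normal", "attack", "normal", "normal", "attack"]
print(ids_metrics(*confusion_counts(y_true, y_pred)))
```

Here TP = 3, TN = 3, FP = 1, and FN = 1, giving an accuracy of 0.75, an ADR of 0.75, and a FAR of 0.25. Because FAR is normalized by the amount of normal traffic, a FAR of 0.08% means roughly 8 false alarms per 10,000 benign records.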

The three datasets differ in age and network environment: NSL-KDD is a refined version of the older KDD Cup 99 data, AWID (the Aegean Wi-Fi Intrusion Dataset) contains IEEE 802.11 wireless traffic, and CIC-IDS2017 covers more recent attack scenarios. Strong results across all three support the versatility of the CFS-BA-Ensemble model, which suits settings that require both a high detection rate and a low false alarm rate.

Implications and Future Directions

The research has both practical and theoretical implications. Practically, the proposed IDS could be integrated into network security architectures to strengthen defenses against an evolving threat landscape with increasingly complex attack patterns. Theoretically, the paper offers a feature selection and ensemble learning strategy that could be adapted and refined for other data-intensive applications beyond cybersecurity.

The results point to future research on adaptive security solutions, in which the model learns and evolves in response to novel threats. Further study of the Bat Algorithm's efficiency in other settings, and extending the ensemble to more diverse classifiers, could improve performance further. Integrating such models with real-time processing would also make them more practical for deployed intrusion detection systems.

Overall, the paper makes a methodically sound and empirically validated contribution to intrusion detection, with clear potential for strengthening network security infrastructure.
