NF-CSE-CIC-IDS2018: NetFlow IDS Benchmark
- NF-CSE-CIC-IDS2018 is a standardized NetFlow dataset for machine learning-based intrusion detection, featuring binary and multiclass attack labeling.
- It comprises about 8.4 million flows with 12 NetFlow v9 features per flow; the compact, header-based schema favors real-world deployment and fast inference (about 21 μs per flow for a Random Forest model).
- The dataset underpins research in IDS benchmarking, federated adversarial learning, and explainable frameworks through rigorous feature extraction and normalization.
NF-CSE-CIC-IDS2018 is a standardized NetFlow-formatted dataset derived from the original CSE-CIC-IDS2018 intrusion detection benchmark. It provides per-flow features extracted with nProbe in NetFlow v9 format and is designed for machine learning-based network intrusion detection systems (NIDS). The dataset prioritizes real-world deployability by using header-based fields. It supports both binary and multiclass attack labeling and enables fast inference in resource-constrained settings. Since its release, NF-CSE-CIC-IDS2018 has become a foundation for research in IDS benchmarking, feature selection, federated adversarial learning, and explainable NIDS frameworks.
1. Dataset Foundation and Feature Schema
NF-CSE-CIC-IDS2018 is constructed by exporting flows from the CSE-CIC-IDS2018 PCAPs with nProbe in NetFlow v9 format (Sarhan et al., 2020). Each row is one flow record, identified by the classic five-tuple: source IP, destination IP, source port, destination port, and protocol. The record carries counters for both directions (IN_* for src→dst and OUT_* for dst→src), so each row summarizes a bidirectional conversation rather than a single direction. Labeling aligns each flow with its original ground truth (0 for benign, 1 for attack), and a multiclass label names the specific attack family.
Standard NetFlow Feature Set
The dataset defines exactly 12 NetFlow v9 fields per flow:
| Field | Type | Description |
|---|---|---|
| IPV4_SRC_ADDR | categorical | Source IP address (first packet) |
| IPV4_DST_ADDR | categorical | Destination IP address (first packet) |
| L4_SRC_PORT | integer | Transport-layer source port |
| L4_DST_PORT | integer | Transport-layer destination port |
| PROTOCOL | integer | IP protocol number (e.g., 6 for TCP) |
| TCP_FLAGS | integer | Bitwise OR of observed TCP flags |
| L7_PROTO | integer | Application-layer protocol code (nProbe DPI engine) |
| IN_PKTS | integer | Packets sent src→dst |
| OUT_PKTS | integer | Packets sent dst→src |
| IN_BYTES | integer | Bytes sent src→dst |
| OUT_BYTES | integer | Bytes sent dst→src |
| FLOW_DURATION_MS | integer | Duration (ms) from first to last packet |
Two additional columns per flow record hold the ground truth: a binary label and a multiclass attack-family label.
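The schema can be made concrete as a typed record. The following is a minimal sketch that also decodes the TCP_FLAGS bitmask into flag names using the standard TCP header bit values. The Python class name `FlowRecord`, its lowercase field names, and the `label`/`attack` fields are hypothetical stand-ins for the CSV columns and are not the dataset's official names.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Standard TCP header flag bits; TCP_FLAGS is the bitwise OR of these
# over all packets observed in the flow.
TCP_FLAG_BITS = {
    "FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08,
    "ACK": 0x10, "URG": 0x20, "ECE": 0x40, "CWR": 0x80,
}


@dataclass(frozen=True)
class FlowRecord:
    """One row of the 12-field schema plus two label fields (Python names are hypothetical)."""
    ipv4_src_addr: str
    ipv4_dst_addr: str
    l4_src_port: int
    l4_dst_port: int
    protocol: int
    tcp_flags: int
    l7_proto: int
    in_pkts: int
    out_pkts: int
    in_bytes: int
    out_bytes: int
    flow_duration_ms: int
    label: int   # binary ground truth: 0 = benign, 1 = attack
    attack: str  # multiclass family, e.g. "Benign" or "DDoS"

    def five_tuple(self) -> Tuple[str, str, int, int, int]:
        """Flow key: (src IP, dst IP, src port, dst port, protocol)."""
        return (self.ipv4_src_addr, self.ipv4_dst_addr,
                self.l4_src_port, self.l4_dst_port, self.protocol)

    def decoded_flags(self) -> List[str]:
        """Names of the TCP flags seen at least once in the flow."""
        return [name for name, bit in TCP_FLAG_BITS.items() if self.tcp_flags & bit]


# Illustrative record (made-up values): FIN | SYN | PSH | ACK = 0x1B.
rec = FlowRecord("192.0.2.10", "198.51.100.7", 51000, 80, 6, 0x1B, 7,
                 5, 4, 420, 3100, 12, 0, "Benign")
print(rec.five_tuple(), rec.decoded_flags())
```

Keeping the flag decoding separate from the raw integer preserves the dataset's compact representation while making the flags readable for inspection.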
2. Data Extraction, Processing, and Labeling Protocols
Export is performed with nProbe v7 in NetFlow v9 mode. Options are set for CSV output, original packet timestamps, and a direct mapping from each flow's five-tuple to the ground-truth label (Sarhan et al., 2020). No duplicate-flow suppression is performed beyond NetFlow's own flow timeouts. For downstream ML tasks, typical cleaning steps are dropping identifier columns (IP addresses and ports, plus any timestamp fields) so that models cannot simply memorize hosts, and applying min–max normalization to continuous variables.
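The two cleaning steps can be sketched in plain Python over rows represented as dicts keyed by the schema's column names. This is a minimal sketch under the following assumptions:

- The choice of which columns count as "continuous" (packet and byte counts plus duration) is ours.
- The helper names `drop_identifiers`, `fit_min_max`, and `apply_min_max` are hypothetical.
- The scaling bounds are fitted on training rows only, so test data does not leak into them.

```python
# Flow identifiers removed before training so models cannot memorize hosts/ports.
IDENTIFIER_COLUMNS = ("IPV4_SRC_ADDR", "IPV4_DST_ADDR", "L4_SRC_PORT", "L4_DST_PORT")
# Count/duration fields treated as continuous (an assumption; names follow the schema table).
CONTINUOUS_COLUMNS = ("IN_PKTS", "OUT_PKTS", "IN_BYTES", "OUT_BYTES", "FLOW_DURATION_MS")


def drop_identifiers(rows):
    """Return copies of the rows without the identifier columns."""
    return [{k: v for k, v in row.items() if k not in IDENTIFIER_COLUMNS} for row in rows]


def fit_min_max(rows, columns=CONTINUOUS_COLUMNS):
    """Learn per-column (min, max) bounds, ideally from the training split only."""
    return {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in columns}


def apply_min_max(rows, params):
    """Rescale columns to [0, 1] with fitted bounds; constant columns map to 0.0."""
    scaled = []
    for row in rows:
        new = dict(row)
        for col, (lo, hi) in params.items():
            span = hi - lo
            new[col] = (row[col] - lo) / span if span else 0.0
        scaled.append(new)
    return scaled
```

Test rows scaled with training bounds can fall outside [0, 1]. Whether to clip them is a design choice that this sketch leaves open.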
Labeling matches each flow's five-tuple against the attack and benign lists from the original scenario scripts. Multiclass labels identify attack families such as BruteForce, Botnet, DDoS, DoS, Infiltration, and various Web exploits. The published CSV files encode both the binary and the multiclass label for every flow.
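A simplified version of five-tuple labeling is a dictionary lookup. The sketch below assumes a hypothetical `attack_index` that maps a five-tuple to an attack-family name, built beforehand from the scenario's attack lists. It uses exact matching only. A real pipeline may need further matching criteria that this sketch does not model.

```python
def label_flows(flows, attack_index):
    """Attach binary and multiclass labels by exact five-tuple lookup.

    flows: iterable of dicts carrying the five identifier columns.
    attack_index: hypothetical dict {five_tuple: attack_family}.
    Flows absent from the index are labelled benign.
    """
    labelled = []
    for flow in flows:
        key = (flow["IPV4_SRC_ADDR"], flow["IPV4_DST_ADDR"],
               flow["L4_SRC_PORT"], flow["L4_DST_PORT"], flow["PROTOCOL"])
        family = attack_index.get(key)
        labelled.append({**flow,
                         "Label": int(family is not None),   # 0 benign, 1 attack
                         "Attack": family or "Benign"})       # multiclass family
    return labelled


# Illustrative index and flows (documentation-range addresses, made-up ports).
index = {("192.0.2.5", "198.51.100.20", 50123, 8080, 6): "Botnet"}
flows = [
    {"IPV4_SRC_ADDR": "192.0.2.5", "IPV4_DST_ADDR": "198.51.100.20",
     "L4_SRC_PORT": 50123, "L4_DST_PORT": 8080, "PROTOCOL": 6},
    {"IPV4_SRC_ADDR": "192.0.2.6", "IPV4_DST_ADDR": "198.51.100.20",
     "L4_SRC_PORT": 50123, "L4_DST_PORT": 8080, "PROTOCOL": 6},
]
print([(f["Label"], f["Attack"]) for f in label_flows(flows, index)])
```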
3. Statistical Properties and Attack Distribution
NF-CSE-CIC-IDS2018 comprises 8,392,401 flows, approximately 8.4 million (Sarhan et al., 2020). The binary class balance is approximately 88% benign and 12% attack. The multiclass split is markedly imbalanced. DDoS, BruteForce, and DoS account for most attack flows, while Infiltration, Botnet, and Web attacks each make up less than 1% of the data. Stratified sampling or explicit class weighting is typically recommended to offset this imbalance during classifier training. Feature distributions for packet and byte counts, duration, and flags are strongly right-skewed, with several attack classes occupying very narrow sub-ranges.
| Class | Flow Count | Proportion |
|---|---|---|
| Benign | 7,373,198 | 87.86% |
| BruteForce | 287,597 | 3.43% |
| Botnet | 15,683 | 0.19% |
| DoS | 269,361 | 3.21% |
| DDoS | 380,096 | 4.53% |
| Infiltration | 62,072 | 0.74% |
| Web Attacks | 4,394 | 0.05% |
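The table's proportions, and a common class-weighting scheme, can be computed directly from the flow counts. The sketch uses the inverse-frequency heuristic w_c = N / (K · n_c), the same formula behind scikit-learn's `class_weight="balanced"`. It is one option among several and is not necessarily what the cited studies used. The constant `CLASS_COUNTS` and the helper names are our own.

```python
# Per-class flow counts copied from the table above.
CLASS_COUNTS = {
    "Benign": 7_373_198,
    "BruteForce": 287_597,
    "Botnet": 15_683,
    "DoS": 269_361,
    "DDoS": 380_096,
    "Infiltration": 62_072,
    "Web Attacks": 4_394,
}


def class_proportions(counts):
    """Fraction of all flows belonging to each class."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def balanced_class_weights(counts):
    """Inverse-frequency weights w_c = N / (K * n_c); each class then sums to N / K."""
    total, k = sum(counts.values()), len(counts)
    return {name: total / (k * n) for name, n in counts.items()}


props = class_proportions(CLASS_COUNTS)
weights = balanced_class_weights(CLASS_COUNTS)
for name in CLASS_COUNTS:
    print(f"{name:<13} {100 * props[name]:6.2f}%  weight={weights[name]:8.2f}")
```

With these weights, each class contributes the same total weight to the loss. As a result, a Web Attacks flow counts roughly 1,700 times as much as a benign one.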
4. Machine Learning Methodologies and Performance Benchmarks
The NetFlow feature set yields competitive binary intrusion detection across a variety of classic and modern machine learning models. Studies consistently show near parity between NetFlow-only models and richer statistical feature sets for binary classification. Multiclass identification, notably of rare or payload-driven attacks, suffers a measurable drop in performance (Sarhan et al., 2020; Sarhan et al., 2021; Atefinia et al., 2022).
Binary Detection Results
- Extra Trees: Accuracy 95.3%, F1-score 0.83, Detection Rate 94.71%, False Alarm Rate 4.59%, AUC 0.9506 (Sarhan et al., 2020).
- Random Forest: Accuracy 99.47%, F1-score 0.98, Recall 96.8%, FAR 0.17%, AUC 0.9833, prediction time 20.98 μs/flow (Sarhan et al., 2021).
- Naive Bayes and single decision trees can reach F1 = 1.00 for Botnet detection in tuned Spark MLlib settings (Atefinia et al., 2022).
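The binary metrics above derive from the confusion matrix, with attack as the positive class:

- Detection rate (DR) is TP / (TP + FN).
- False alarm rate (FAR) is FP / (FP + TN).
- F1 equals 2·TP / (2·TP + FP + FN).

The sketch below is illustrative. It rebuilds counts from the class table and the Extra Trees rates, not from the paper's actual confusion matrix. It shows why F1 can sit well below accuracy at roughly 12% attack prevalence: a 4.59% FAR on the large benign majority produces many false positives.

```python
def binary_metrics(tp, fp, tn, fn):
    """Binary IDS metrics from confusion-matrix counts (attack = positive class)."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    detection_rate = tp / (tp + fn) if tp + fn else 0.0     # recall / true-positive rate
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0   # false-positive rate
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "detection_rate": detection_rate,
        "false_alarm_rate": false_alarm_rate,
        "f1": f1,
    }


# Illustrative counts: class totals from the table, DR = 94.71%, FAR = 4.59%.
attacks, benign = 1_019_203, 7_373_198
tp = round(0.9471 * attacks)
fp = round(0.0459 * benign)
m = binary_metrics(tp, fp, benign - fp, attacks - tp)
print(f"accuracy={m['accuracy']:.3f} f1={m['f1']:.2f}")  # → accuracy=0.953 f1=0.83
```

The reconstructed values land close to the reported Extra Trees accuracy (95.3%) and F1 (0.83), which suggests the reported figures are mutually consistent.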
Multiclass and Feature Selection
Multiclass F1 scores for NetFlow models reach 0.80, compared with 0.94 for the richer CICFlowMeter feature set (Sarhan et al., 2020). Feature selection studies show that binary accuracy plateaus quickly. RF reaches its maximal AUC of 0.9512 with only the top six NetFlow features, and deep feed-forward (FF) models reach near-peak accuracy with three features (Sarhan et al., 2021). For practical deployments, minimal NetFlow-style records favor throughput and memory savings at the cost of only an incremental loss in discrimination.
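A top-k subset can be selected by ranking features with a filter statistic. The sketch below uses a chi-square score modeled on the formulation in scikit-learn's `chi2`: observed values are per-class sums of a non-negative feature, and expected values are the class frequency times the feature total. It is one possible ranking method. The cited studies may have used a different ranking procedure. The function names are hypothetical.

```python
def chi2_scores(rows, labels, features):
    """Chi-square statistic per non-negative feature against the class labels."""
    n = len(labels)
    classes = sorted(set(labels))
    class_freq = {c: labels.count(c) / n for c in classes}
    scores = {}
    for f in features:
        feature_total = sum(r[f] for r in rows)
        stat = 0.0
        for c in classes:
            observed = sum(r[f] for r, y in zip(rows, labels) if y == c)
            expected = class_freq[c] * feature_total
            if expected > 0:  # skip empty cells instead of dividing by zero
                stat += (observed - expected) ** 2 / expected
        scores[f] = stat
    return scores


def top_k(scores, k):
    """Names of the k highest-scoring features."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: IN_PKTS separates the classes, OUT_PKTS is constant.
rows = [{"IN_PKTS": 0, "OUT_PKTS": 1}, {"IN_PKTS": 0, "OUT_PKTS": 1},
        {"IN_PKTS": 5, "OUT_PKTS": 1}, {"IN_PKTS": 5, "OUT_PKTS": 1}]
labels = [0, 0, 1, 1]
scores = chi2_scores(rows, labels, ["IN_PKTS", "OUT_PKTS"])
print(scores, top_k(scores, 1))
```

Chi-square requires non-negative inputs, which the count and duration fields satisfy. Categorical codes such as PROTOCOL are better one-hot encoded before scoring.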
5. Impact on Explainability, Efficiency, and Generalizability
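Shapley Additive Explanations (SHAP) applied to NF-CSE-CIC-IDS2018 and its source data highlight packetization irregularity, burstiness, and inter-arrival timing as dominant contributors. Forward segment size minimum, packet rate, and byte rate consistently rank highest (Sarhan et al., 2021). These top-ranked attributes are CICFlowMeter-style statistics rather than fields of the 12-field NetFlow schema above. The rankings therefore reflect the richer feature set of the original CSE-CIC-IDS2018 data. The findings support streamlined model design and operational transparency, and they agree with the feature-selection results: just 3–6 features suffice for accurate detection in IoT or high-throughput scenarios (Sarhan et al., 2021).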
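SHAP estimates Shapley values efficiently. For a handful of features, the values can instead be computed exactly by enumerating every coalition, which helps build intuition for what the attributions mean. The sketch below is brute force and is not the `shap` library. It assumes an interventional value function: features outside a coalition are replaced by a baseline input. The scorer `toy_score` and its three inputs are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attributions of model(x) relative to model(baseline).

    Coalition value: model evaluated on x with every feature outside the
    coalition set to its baseline value. Needs O(2^n) model calls.
    """
    n = len(x)

    def value(coalition):
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for coalitions S not containing i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


# Hypothetical scorer over three normalized features; the last two interact.
def toy_score(z):
    in_pkts, out_bytes, duration = z
    return 0.6 * in_pkts + 0.3 * out_bytes * duration


x, base = [1.0, 0.5, 0.8], [0.0, 0.0, 0.0]
print(shapley_values(toy_score, x, base))
```

The additive term is attributed entirely to its own feature, while the interaction term is split equally between the two features involved. The attributions always sum to model(x) minus model(baseline).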
NetFlow fields are derived almost entirely from packet headers, with only L7_PROTO relying on nProbe's DPI engine. The feature set is therefore deployable at scale and in resource-constrained environments. Reported prediction latency is about 21 μs per flow (20.98 μs for Random Forest; Sarhan et al., 2021), which suits real-time applications. However, fine-grained multiclass identification remains limited for some stealthy or content-based malicious flows. Augmenting the records with CICFlowMeter or content-derived features is sometimes warranted (Sarhan et al., 2020).
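Per-flow latency figures depend heavily on hardware, batching, and implementation, so it helps to measure them in the target environment. The sketch below times one prediction call per flow with `time.perf_counter` and reports the median of several runs in microseconds. `toy_rule` is a hypothetical placeholder, not a trained detector. The numbers it produces are not comparable to the published 20.98 μs figure.

```python
import statistics
import time


def per_flow_latency_us(predict, flows, repeats=5):
    """Median over `repeats` runs of the mean per-flow prediction time, in microseconds."""
    if not flows:
        raise ValueError("need at least one flow")
    runs = []
    for _ in range(repeats):
        start = time.perf_counter()
        for flow in flows:
            predict(flow)  # one call per flow, i.e. no batching
        runs.append((time.perf_counter() - start) / len(flows) * 1e6)
    return statistics.median(runs)


# Hypothetical placeholder rule, standing in for a real model's predict().
def toy_rule(flow):
    return int(flow["IN_PKTS"] > 1000 and flow["FLOW_DURATION_MS"] < 100)


flows = [{"IN_PKTS": i % 2000, "FLOW_DURATION_MS": i % 300} for i in range(10_000)]
print(f"{per_flow_latency_us(toy_rule, flows):.3f} us/flow")
```

Taking the median across runs dampens one-off interference from the operating system or garbage collection.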
6. Usage in Federated, Distributed, and Explainable NIDS Research
NF-CSE-CIC-IDS2018 is widely used to study federated learning (FL) robustness and to benchmark big-data pipelines. It supports adversarial FL experiments with non-IID partitioning, as in Hybrid Reputation Aggregation (HRA). Under severe simulated attacks, HRA achieved up to 96.6% global model accuracy and outperformed prior robust aggregation rules by 7.87 percentage points (Sheikhi et al., 2025). Distributed Spark MLlib architectures show that wrapper-based feature selection combined with parallelism yields perfect detection for memory-light classifiers using subsets of only 2–4 features. SVM and gradient boosting consume more resources for marginal gains (Atefinia et al., 2022).
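Non-IID client splits for FL experiments are often simulated with label skew. The sketch below uses a common Dirichlet-based scheme: each class's flows are divided among clients according to proportions drawn from Dirichlet(α). It does not implement HRA, and nothing here claims the cited work used this exact partitioning. The function name and the α parameter choices are illustrative.

```python
import random


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Assign each sample index to one client with Dirichlet(alpha) label skew.

    Smaller alpha gives more skewed (more non-IID) class mixes per client.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    clients = [[] for _ in range(num_clients)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Dirichlet sample via normalized Gamma(alpha, 1) draws.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws) or 1.0  # guard against an all-zero underflow
        cuts, acc = [], 0.0
        for d in draws[:-1]:
            acc += d / total
            cuts.append(int(acc * len(indices)))
        start = 0
        for client, end in zip(clients, cuts + [len(indices)]):
            client.extend(indices[start:end])  # contiguous, non-overlapping slices
            start = end
    return clients


labels = ["Benign"] * 50 + ["DDoS"] * 30 + ["Botnet"] * 20
parts = dirichlet_partition(labels, num_clients=4, alpha=0.5, seed=1)
print([len(p) for p in parts])
```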
7. Limitations, Recommendations, and Evolving Best Practices
NetFlow records in NF-CSE-CIC-IDS2018 lack packet payload signatures and detailed interarrival histograms, constraining attack class separation when malicious traffic exhibits subtle protocol manipulations (Sarhan et al., 2020). Ensemble or hybrid models, stratified cross-validation, and periodic feature re-analysis are recommended for production IDS pipelines. For multiclass discrimination, fusing a select subset of original CICFlowMeter features or content descriptors can improve granularity (Sarhan et al., 2021).
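Stratified cross-validation keeps each fold's class mix close to the full dataset's. This matters here because classes such as Web Attacks make up only 0.05% of flows. The sketch below is a minimal stratified k-fold splitter. It shuffles each class and then deals its indices round-robin across folds, continuing from where the previous class stopped so that remainders are spread evenly. The function name is hypothetical. Established libraries provide equivalent splitters.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_indices, test_indices) pairs with per-class balanced folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal this class round-robin so every fold receives ~1/k of it.
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % k].append(idx)
        offset += len(indices)

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j in range(k) if j != i for idx in folds[j])
        yield train, test


labels = [0] * 80 + [1] * 20
for train, test in stratified_k_fold(labels, k=5):
    print(len(train), len(test), sum(labels[i] for i in test))
```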
Researchers are advised to:
- Adopt standard NetFlow fields for production-grade NIDS.
- Select minimal feature subsets with chi-square tests or SHAP analysis to save compute and memory.
- Apply class weighting or downsampling to handle skewed distributions.
- Document and periodically revisit model generalization performance, taking dataset heterogeneity into account (Cantone et al., 2024).
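Random downsampling is the simplest of these imbalance remedies. The sketch below caps every class at `max_per_class` samples and draws the kept samples uniformly at random. It assumes that only the training split is downsampled, so evaluation still reflects the real class mix. The function name and cap value are illustrative.

```python
import random


def downsample(labels, max_per_class, seed=0):
    """Return sorted indices keeping at most max_per_class samples of each class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    kept = []
    for indices in by_class.values():
        if len(indices) > max_per_class:
            indices = rng.sample(indices, max_per_class)  # uniform, without replacement
        kept.extend(indices)
    return sorted(kept)


labels = ["Benign"] * 1000 + ["DDoS"] * 50 + ["Botnet"] * 5
kept = downsample(labels, max_per_class=100)
print(len(kept))
```

Minority classes below the cap are kept whole. Combining downsampling with class weighting is therefore a common way to handle the rarest families.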
NF-CSE-CIC-IDS2018 thus serves as a rigorous, well-characterized, and practically deployable foundation for advanced machine learning research, IDS architecture benchmarking, feature selection, and federated learning resilience studies.