NF-CSE-CIC-IDS2018: NetFlow IDS Benchmark

Updated 5 January 2026

NF-CSE-CIC-IDS2018 is a standardized NetFlow dataset for machine learning-based intrusion detection, featuring binary and multiclass attack labeling.
It comprises 8.4 million flows with 12 NetFlow v9 features per flow, supporting real-world deployability and fast inference (about 21 μs per flow reported for Random Forest).
The dataset underpins research in IDS benchmarking, federated adversarial learning, and explainable frameworks through rigorous feature extraction and normalization.

NF-CSE-CIC-IDS2018 is a standardized NetFlow-formatted dataset derived from the original CSE-CIC-IDS2018 intrusion detection benchmark. It provides per-flow features extracted via nProbe NetFlow v9 and is designed for machine learning-based network intrusion detection systems (NIDS). The dataset prioritizes real-world deployability by using header-based fields, supports both binary and multiclass attack labeling, and enables rapid inference in resource-constrained settings. Since its release, NF-CSE-CIC-IDS2018 has become a foundation for research in IDS benchmarking, feature selection, federated adversarial learning, and explainable NIDS frameworks.

1. Dataset Foundation and Feature Schema

NF-CSE-CIC-IDS2018 is constructed by exporting flows from the CSE-CIC-IDS2018 PCAPs via nProbe in NetFlow v9 format (Sarhan et al., 2020). Each row in the dataset represents one network flow, keyed by the classic five-tuple: source IP, destination IP, source port, destination port, and protocol. Labeling aligns each flow to its original ground truth (0 for benign, 1 for attack), and a separate multiclass label identifies the specific attack family.

Standard NetFlow Feature Set

The dataset defines exactly 12 NetFlow v9 fields per flow:

Field	Type	Description
IPV4_SRC_ADDR	categorical	Source IP address (first packet)
IPV4_DST_ADDR	categorical	Destination IP address (first packet)
L4_SRC_PORT	integer	Transport-layer source port
L4_DST_PORT	integer	Transport-layer destination port
PROTOCOL	integer	IP protocol number (e.g., 6 for TCP)
TCP_FLAGS	integer	Bitwise OR of observed TCP flags
L7_PROTO	integer	Application-layer protocol code (nProbe DPI engine)
IN_PKTS	integer	Packets sent src→dst
OUT_PKTS	integer	Packets sent dst→src
IN_BYTES	integer	Bytes sent src→dst
OUT_BYTES	integer	Bytes sent dst→src
FLOW_DURATION_MS	integer	Duration (ms) from first to last packet

The ground truth for binary and multiclass attack labels is reflected in two extra columns per flow record.
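As a minimal sketch of how the schema above might be read into typed records, the following uses only the Python standard library. The label column names `Label` and `Attack`, and the sample row with its documentation-range IP addresses, are assumptions made for illustration; check the header of the CSV you actually have.

```python
import csv
import io

# Field order follows the table above.
NETFLOW_FIELDS = [
    "IPV4_SRC_ADDR", "IPV4_DST_ADDR", "L4_SRC_PORT", "L4_DST_PORT",
    "PROTOCOL", "TCP_FLAGS", "L7_PROTO", "IN_PKTS", "OUT_PKTS",
    "IN_BYTES", "OUT_BYTES", "FLOW_DURATION_MS",
]
CATEGORICAL_FIELDS = {"IPV4_SRC_ADDR", "IPV4_DST_ADDR"}

# Assumed names for the two ground-truth columns (binary, multiclass).
BINARY_LABEL = "Label"
MULTICLASS_LABEL = "Attack"


def parse_flow(row):
    """Convert one CSV row (all strings) into typed values."""
    flow = {}
    for name in NETFLOW_FIELDS:
        value = row[name]
        if name in CATEGORICAL_FIELDS:
            flow[name] = value
        else:
            # int(float(...)) tolerates integers written as "6.0".
            flow[name] = int(float(value))
    flow[BINARY_LABEL] = int(row[BINARY_LABEL])
    flow[MULTICLASS_LABEL] = row[MULTICLASS_LABEL]
    return flow


# Made-up sample row, for illustration only.
header = ",".join(NETFLOW_FIELDS + [BINARY_LABEL, MULTICLASS_LABEL])
sample_csv = header + "\n" + "192.0.2.10,198.51.100.5,49152,22,6,24,92,12,10,1540,2210,350,1,BruteForce\n"

flows = [parse_flow(row) for row in csv.DictReader(io.StringIO(sample_csv))]
print(flows[0]["L4_DST_PORT"], flows[0][MULTICLASS_LABEL])
```

Keeping the field list in one place makes it easy to verify that a file really carries the 12 expected columns before any model sees it.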

2. Data Extraction, Processing, and Labeling Protocols

Export is performed with nProbe v7 in NetFlow v9 mode using explicit options ensuring CSV output, original packet timestamps, and direct mapping of each flow’s five-tuple to the ground-truth target (Sarhan et al., 2020). No duplicate-flow suppression is performed beyond NetFlow’s own timeouts. For downstream ML tasks, typical cleaning steps include dropping all identifier columns (IPs, ports, timestamps) to prevent trivial memorization and performing min–max normalization of continuous variables.
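The cleaning steps described above can be sketched as follows. This is an illustrative standard-library version, not the cited authors' code: the helper names and toy flows are hypothetical, and the min–max rule used is x' = (x - min) / (max - min). The 12-field schema listed earlier contains no timestamp column, so only IPs and ports are dropped here.

```python
# Identifier columns from the 12-field schema.
IDENTIFIER_COLUMNS = {
    "IPV4_SRC_ADDR", "IPV4_DST_ADDR", "L4_SRC_PORT", "L4_DST_PORT",
}


def drop_identifiers(flows):
    """Remove identifier columns so a model cannot memorize hosts or ports."""
    return [
        {k: v for k, v in flow.items() if k not in IDENTIFIER_COLUMNS}
        for flow in flows
    ]


def fit_min_max(flows, columns):
    """Record (min, max) per column; fit on the training split only."""
    return {
        c: (min(f[c] for f in flows), max(f[c] for f in flows))
        for c in columns
    }


def apply_min_max(flows, bounds):
    """Scale each column to [0, 1] using previously fitted bounds."""
    scaled = []
    for flow in flows:
        out = dict(flow)
        for column, (low, high) in bounds.items():
            span = high - low
            # A constant column carries no information; map it to 0.0.
            out[column] = 0.0 if span == 0 else (flow[column] - low) / span
        scaled.append(out)
    return scaled


# Toy flows, for illustration only.
toy_flows = [
    {"IPV4_SRC_ADDR": "192.0.2.1", "IPV4_DST_ADDR": "192.0.2.2",
     "L4_SRC_PORT": 40000, "L4_DST_PORT": 80,
     "IN_BYTES": 100, "FLOW_DURATION_MS": 0},
    {"IPV4_SRC_ADDR": "192.0.2.3", "IPV4_DST_ADDR": "192.0.2.2",
     "L4_SRC_PORT": 40001, "L4_DST_PORT": 80,
     "IN_BYTES": 300, "FLOW_DURATION_MS": 50},
    {"IPV4_SRC_ADDR": "192.0.2.4", "IPV4_DST_ADDR": "192.0.2.2",
     "L4_SRC_PORT": 40002, "L4_DST_PORT": 443,
     "IN_BYTES": 500, "FLOW_DURATION_MS": 200},
]

features = drop_identifiers(toy_flows)
bounds = fit_min_max(features, ["IN_BYTES", "FLOW_DURATION_MS"])
scaled = apply_min_max(features, bounds)
print(scaled[1])
```

Fitting the bounds on the training split and reusing them on held-out data avoids leaking test statistics; held-out values may then fall slightly outside [0, 1].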

Labeling matches each flow’s five-tuple against the attack and benign lists from the original scenario scripts; multiclass labels identify attacks such as BruteForce, Botnet, DDoS, DoS, Infiltration, and various Web exploits. Published CSV files consistently encode both binary and multiclass labels.
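The matching idea can be illustrated with a plain dictionary lookup. This is a simplified sketch: the ground-truth table, its entries, and the function name are hypothetical, and a real labeling pipeline may also need to consider attack time windows, which are not modeled here.

```python
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


# Hypothetical ground-truth table: five-tuple -> attack family.
# Addresses come from documentation ranges, not from the dataset.
ATTACK_FLOWS = {
    FiveTuple("192.0.2.10", "198.51.100.5", 49152, 22, 6): "BruteForce",
    FiveTuple("192.0.2.77", "198.51.100.9", 51515, 80, 6): "DDoS",
}


def label_flow(key, attack_flows):
    """Return (binary_label, multiclass_label) for one flow key."""
    family = attack_flows.get(key)
    if family is None:
        return 0, "Benign"
    return 1, family


attack = label_flow(FiveTuple("192.0.2.10", "198.51.100.5", 49152, 22, 6), ATTACK_FLOWS)
benign = label_flow(FiveTuple("192.0.2.20", "198.51.100.5", 50000, 443, 6), ATTACK_FLOWS)
print(attack, benign)
```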

3. Statistical Properties and Attack Distribution

NF-CSE-CIC-IDS2018 comprises approximately 8.4 million flows (Sarhan et al., 2020). The binary class balance is approximately 88% benign and 12% attack. Multiclass splits reveal marked imbalance: DDoS, BruteForce, and DoS account for most attack flows, while Infiltration, Botnet, and Web attacks each constitute less than 1% of the dataset. Strategies such as stratified sampling or explicit class weighting are typically recommended to offset this imbalance in classifier training. Feature distributions for packet and byte counts, duration, and flags are strongly right-skewed, with several attack classes occupying very narrow sub-ranges.

Class	Flow Count	Proportion
Benign	7,373,198	87.86%
BruteForce	287,597	3.43%
Botnet	15,683	0.19%
DoS	269,361	3.21%
DDoS	380,096	4.53%
Infiltration	62,072	0.74%
Web Attacks	4,394	0.05%
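The proportions in the table, and one possible set of class weights, can be recomputed directly from the flow counts. The weighting rule below, total / (number of classes × class count), is a common inverse-frequency heuristic chosen here for illustration; the text does not prescribe a specific weighting formula.

```python
# Flow counts copied from the table above.
CLASS_COUNTS = {
    "Benign": 7_373_198,
    "BruteForce": 287_597,
    "Botnet": 15_683,
    "DoS": 269_361,
    "DDoS": 380_096,
    "Infiltration": 62_072,
    "Web Attacks": 4_394,
}

total_flows = sum(CLASS_COUNTS.values())
proportions = {name: count / total_flows for name, count in CLASS_COUNTS.items()}
attack_share = 1.0 - proportions["Benign"]

# Inverse-frequency weights: rare classes receive proportionally larger weights.
n_classes = len(CLASS_COUNTS)
class_weights = {
    name: total_flows / (n_classes * count)
    for name, count in CLASS_COUNTS.items()
}

for name in CLASS_COUNTS:
    print(f"{name:13s} {proportions[name]:7.2%}  weight={class_weights[name]:.3f}")
```

With these counts the rarest class (Web Attacks) receives a weight more than a thousand times larger than Benign, which is worth keeping in mind because very large weights can destabilize some optimizers.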

4. Machine Learning Methodologies and Performance Benchmarks

The NetFlow feature set yields competitive performance for binary intrusion detection across a variety of classic and modern machine learning models. Studies consistently report near parity between NetFlow-only models and richer statistical feature sets for binary classification, though multiclass identification, notably of rare or payload-driven attacks, suffers a measurable performance drop (Sarhan et al., 2020; Sarhan et al., 2021; Atefinia et al., 2022).

Binary Detection Results

Extra Trees: Accuracy 95.3%, F1-score 0.83, Detection Rate 94.71%, False Alarm Rate 4.59%, AUC 0.9506 (Sarhan et al., 2020).
Random Forest: Accuracy 99.47%, F1-score 0.98, Recall 96.8%, FAR 0.17%, AUC 0.9833, prediction time 20.98 μs/flow (Sarhan et al., 2021).
Naive Bayes and single decision trees can achieve F1 = 1.00 in tuned Spark-MLlib settings for Botnet (Atefinia et al., 2022).
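For reference, the metrics quoted above follow from a binary confusion matrix as sketched below. The function name and the toy counts are made up for illustration; they are not taken from any of the cited studies.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary IDS metrics, with attack as the positive class."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    detection_rate = tp / (tp + fn)        # also called recall
    false_alarm_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * detection_rate / (precision + detection_rate)
    return {
        "accuracy": accuracy,
        "detection_rate": detection_rate,
        "false_alarm_rate": false_alarm_rate,
        "precision": precision,
        "f1": f1,
    }


# Toy imbalanced example: 100 attack flows, 900 benign flows.
metrics = binary_metrics(tp=90, fp=50, tn=850, fn=10)
for name, value in metrics.items():
    print(f"{name:17s} {value:.4f}")
```

In this toy case accuracy is 0.94 while F1 is only 0.75, which shows how, on a benign-heavy dataset, a moderate false alarm rate can pull F1 well below accuracy.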

Multiclass and Feature Selection

Multiclass F1 scores for NetFlow models reach 0.80 (vs 0.94 for richer CICFlowMeter sets) (Sarhan et al., 2020). Feature selection studies show that binary accuracy plateaus quickly: RF achieves a maximal AUC of 0.9512 with only the top six NetFlow features, and deep feed-forward (FF) models reach near-peak accuracy with three features (Sarhan et al., 2021). For practical deployments, minimal NetFlow-style records favor throughput and memory savings, with only an incremental loss in discrimination.
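One simple way to rank features before truncating to a top-k subset is a Pearson chi-square score between each discrete feature and the label. The sketch below is an assumption about how such a ranking could be done, not the selection procedure of the cited studies; the function names and toy columns are hypothetical, and continuous fields such as byte counts would need to be binned first.

```python
from collections import Counter


def chi_square_score(feature_values, labels):
    """Pearson chi-square statistic between a discrete feature and the labels."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    feature_totals = Counter(feature_values)
    label_totals = Counter(labels)
    score = 0.0
    for value, value_count in feature_totals.items():
        for label, label_count in label_totals.items():
            expected = value_count * label_count / n
            score += (observed[(value, label)] - expected) ** 2 / expected
    return score


def rank_features(columns, labels, top_k):
    """Return the names of the top_k highest-scoring features."""
    scores = {name: chi_square_score(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy data: PROTOCOL separates the classes perfectly, TCP_FLAGS not at all.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_columns = {
    "PROTOCOL": [6, 6, 6, 6, 17, 17, 17, 17],
    "TCP_FLAGS": [2, 16, 2, 16, 2, 16, 2, 16],
}

protocol_score = chi_square_score(toy_columns["PROTOCOL"], toy_labels)
flags_score = chi_square_score(toy_columns["TCP_FLAGS"], toy_labels)
top_features = rank_features(toy_columns, toy_labels, top_k=1)
print(protocol_score, flags_score, top_features)
```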

5. Impact on Explainability, Efficiency, and Generalizability

Shapley Additive Explanations (SHAP) applied to NF-CSE-CIC-IDS2018 highlight packetization irregularity, burstiness, and inter-arrival timing as dominant feature contributors; forward segment size minimum, packet rate, and byte rate consistently rank highest (Sarhan et al., 2021). Of these, packet and byte rates can be derived from the count and duration fields, whereas forward segment size minimum is a CICFlowMeter-style statistic rather than one of the 12 NetFlow fields listed above. These findings support streamlined model design and operational transparency, and they align with field insights: just 3–6 features suffice for accurate detection in IoT or high-throughput scenarios (Sarhan et al., 2021).

NetFlow fields are almost entirely header-derived (L7_PROTO, which relies on nProbe's DPI engine, is the exception), making the dataset deployable at scale and in resource-constrained environments. The reported prediction latency is about 21 μs per flow (20.98 μs for Random Forest; Sarhan et al., 2021), well suited for real-time applications. However, fine-grained multiclass attack identification is limited for certain stealth or content-based malicious flows; augmenting with specialized CICFlowMeter or content-derived features is sometimes warranted (Sarhan et al., 2020).
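To put the per-flow latency in perspective, the arithmetic below converts the Random Forest figure of 20.98 μs/flow from Section 4 into an approximate throughput. It assumes a single worker and counts inference time only (no export, parsing, or I/O), so it is a rough upper bound rather than a measured result.

```python
PREDICTION_TIME_US = 20.98   # Random Forest, per flow (Section 4)
TOTAL_FLOWS = 8_392_401      # sum of the class counts in Section 3

# Flows a single worker could score per second at this latency.
flows_per_second = 1_000_000 / PREDICTION_TIME_US

# Time to score the whole dataset once, inference only.
seconds_for_dataset = TOTAL_FLOWS * PREDICTION_TIME_US / 1_000_000

print(f"{flows_per_second:,.0f} flows/s, {seconds_for_dataset:.1f} s for the full dataset")
```

At this rate a single worker handles roughly 47,000 flows per second, and the full dataset would take about three minutes to score.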

6. Usage in Federated, Distributed, and Explainable NIDS Research

NF-CSE-CIC-IDS2018 is widely adopted for federated learning robustness studies and big data benchmarking. The dataset supports adversarial FL experiments with non-IID partitioning, as in Hybrid Reputation Aggregation (HRA), where up to 96.6% global model accuracy was achieved under severe attack simulation, outperforming prior robust aggregation rules by 7.87 percentage points (Sheikhi et al., 2025). Distributed architectures using Spark MLlib show that wrapper-driven feature selection and parallelism yield perfect detection in memory-light classifiers with 2–4 feature subsets, while SVM and gradient boosting require more resources for marginal returns (Atefinia et al., 2022).
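Non-IID client splits of this kind are often simulated with Dirichlet label skew. The sketch below shows that general technique using only the standard library; it is an assumption for illustration and not the partitioning scheme of the HRA study, whose details are not given here. The function name, `alpha` value, and toy labels are hypothetical.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices across clients with per-class Dirichlet proportions.

    Smaller alpha gives more skewed (more non-IID) client label distributions.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    clients = [[] for _ in range(n_clients)]
    for label in sorted(by_class):  # sorted for reproducibility
        indices = by_class[label]
        rng.shuffle(indices)
        # A Dirichlet sample is a vector of Gamma draws normalized to sum to 1.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_clients)]
        draw_sum = sum(draws)
        start = 0
        cumulative = 0.0
        for client in range(n_clients):
            cumulative += draws[client] / draw_sum
            if client == n_clients - 1:
                end = len(indices)  # last client takes the remainder
            else:
                end = round(cumulative * len(indices))
            clients[client].extend(indices[start:end])
            start = end
    return clients


# Toy label vector loosely mimicking the benign-heavy imbalance.
toy_labels = ["Benign"] * 80 + ["DDoS"] * 15 + ["Botnet"] * 5
partitions = dirichlet_partition(toy_labels, n_clients=4, alpha=0.5, seed=42)
print([len(p) for p in partitions])
```

Every index is assigned to exactly one client, so the union of the client shards reproduces the original dataset.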

7. Limitations, Recommendations, and Evolving Best Practices

NetFlow records in NF-CSE-CIC-IDS2018 lack packet payload signatures and detailed interarrival histograms, constraining attack class separation when malicious traffic exhibits subtle protocol manipulations (Sarhan et al., 2020). Ensemble or hybrid models, stratified cross-validation, and periodic feature re-analysis are recommended for production IDS pipelines. For multiclass discrimination, fusing a select subset of original CICFlowMeter features or content descriptors can improve granularity (Sarhan et al., 2021).

Researchers are advised to:

Adopt standard NetFlow fields for production-grade NIDS.
Leverage minimal feature subsets via chi-square or SHAP analysis for resource saving.
Apply class weighting or downsampling to handle skewed distributions.
Document and periodically revisit model generalization performance, taking dataset heterogeneity into account (Cantone et al., 2024).
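As one illustration of the downsampling recommendation, the sketch below keeps every minority-class flow and randomly samples the majority class down to a chosen ratio. The function name, the `Attack` label key, and the 2:1 ratio are assumptions for the example, not values prescribed by the cited work.

```python
import random
from collections import Counter


def downsample_majority(flows, label_key, majority, ratio, seed=0):
    """Keep all minority flows and at most ratio * (#minority) majority flows."""
    rng = random.Random(seed)
    majority_rows = [f for f in flows if f[label_key] == majority]
    minority_rows = [f for f in flows if f[label_key] != majority]
    keep = min(len(majority_rows), int(ratio * len(minority_rows)))
    sampled = rng.sample(majority_rows, keep)
    result = minority_rows + sampled
    rng.shuffle(result)  # avoid class-ordered training data
    return result


# Toy data at roughly the dataset's 88/12 benign-to-attack balance.
toy_flows = [{"Attack": "Benign"} for _ in range(88)] + [{"Attack": "DDoS"} for _ in range(12)]
balanced = downsample_majority(toy_flows, "Attack", majority="Benign", ratio=2.0)
balanced_counts = Counter(f["Attack"] for f in balanced)
print(balanced_counts)
```

Downsampling should be applied to the training split only, so that evaluation still reflects the original class distribution.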

NF-CSE-CIC-IDS2018 thus serves as a rigorous, well-characterized, and practically deployable foundation for advanced machine learning research, IDS architecture benchmarking, feature selection, and federated learning resilience studies.