Edge-IIoTset Dataset Overview
- Edge-IIoTset is a comprehensive labeled network security dataset for benchmarking intrusion detection in industrial IoT and edge computing environments.
- It is collected from a realistic seven-layer IIoT smart factory testbed, capturing diverse protocols, benign traffic, and a wide range of cyberattacks.
- The dataset supports evaluation of both traditional and deep learning methods, addressing challenges like class imbalance and detailed feature preprocessing.
Edge-IIoTset is a comprehensive labeled network security dataset designed for benchmarking intrusion detection methods in Industrial Internet of Things (IIoT) and edge-computing environments. Generated from a realistic, multi-layer industrial testbed, it is constructed to reflect the diversity, heterogeneity, and operational complexity of contemporary IIoT deployments. Edge-IIoTset captures not only benign supervisory and telemetry traffic but also a wide spectrum of network attacks, enabling the development, validation, and comparison of both traditional and deep learning-based cyber-defense systems.
1. Testbed Architecture and Data Collection
Edge-IIoTset is derived from a physical seven-layer IIoT smart factory testbed with a stratified architecture emulating the OSI stack: Physical, Data Link, Network, Transport, Session, Presentation, and Application layers. Devices span programmable logic controllers (PLCs), industrial sensors, actuators, edge gateways, Supervisory Control and Data Acquisition (SCADA) servers, and Human–Machine Interfaces (HMIs). Protocol coverage spans both IT-centric protocols (TCP/IP, UDP, ICMP, HTTP) and OT/IIoT protocols (MQTT, Modbus/TCP). Network traffic is recorded by mirroring all packets at the edge gateway, subsequently aggregating packets into bidirectional flows for processing (Ishtiaq et al., 3 Oct 2025).
Benign traffic encompasses routine industrial operations—supervisory commands, sensory updates, actuator signals, and periodic bulk transfers (e.g., firmware updates). Attack traffic is generated by orchestrating a broad range of cyberattacks (such as DDoS, MITM, injection attacks, scanning, ransomware, and malware uploads) using publicly available and custom-built offensive scripts tailored to exploit protocol- and device-specific vulnerabilities (Dobler et al., 8 May 2024).
2. Attack Taxonomy and Labeling Scheme
Edge-IIoTset captures extensive adversarial activity mapped to the full Cyber Kill Chain (CKC):
- Reconnaissance: Port scanning (TCP/UDP), OS fingerprinting, protocol enumeration, vulnerability scanning.
- Exploitation: DoS/DDoS (across multiple protocols), application-level injection (SQL, XSS), Modbus or PLC-specific exploits.
- Installation: Backdoors, malware uploads, brute-force attacks, malicious firmware uploads.
- Command & Control: Data exfiltration, session tampering.
- Actions on Objectives: Ransomware encryption.
Labeling is performed at the flow or packet level with granularity varying by paper. Multiclass schemes distinguish between 14–15 individual attack types and benign traffic (Ishtiaq et al., 3 Oct 2025, Hasan et al., 25 Jan 2025), or broader classes such as DDoS, MITM, Information Gathering, Injection, Malware, and Normal traffic (Gueriani et al., 21 Jan 2025). Binary versions of the dataset collapse all attacks to a single “malicious” class, as employed in broad benchmarking (Dobler et al., 8 May 2024).
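For illustration, the binary variant can be derived from the multiclass labels in a few lines of pandas. This is a minimal sketch; the file name and the `Attack_type`/`Normal` label conventions are assumed from the public CSV release and may differ between dataset versions.

```python
import pandas as pd

# Binary variant: collapse every attack type into a single "malicious" class.
# File name and label conventions ("Attack_type", "Normal") are assumed from
# the public CSV release and may differ between dataset versions.
df = pd.read_csv("DNN-EdgeIIoT-dataset.csv", low_memory=False)
df["is_malicious"] = (df["Attack_type"] != "Normal").astype(int)
print(df["is_malicious"].value_counts(normalize=True))
```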
3. Dataset Composition, Feature Set, and Statistical Profiles
Reported dataset sizes range from 1.9 million to 3.44 million flow-level instances, depending on the release version and preprocessing steps applied (Hasan et al., 25 Jan 2025, Dobler et al., 8 May 2024). Individual studies specify instance counts and attack class breakdowns:
| Version/Study | Instances | Benign % | # Attack Types | Label Schema | Reference |
|---|---|---|---|---|---|
| CST-AFNet | 2,219,201 | ≈50–60 | 15 + benign | Multiclass | (Ishtiaq et al., 3 Oct 2025) |
| Autoencoder DT | 1,927,304 | 71.65 | 14 + benign | Multiclass | (Hasan et al., 25 Jan 2025) |
| Dobler survey | ≈3,440,000 | 78 | 20 (CKC-based) | Binary | (Dobler et al., 8 May 2024) |
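Because instance counts and benign shares differ across studies and release versions, it is worth recomputing these headline figures from the release actually used. A minimal sketch, assuming the public CSV distribution and an `Attack_type` label column:

```python
import pandas as pd

# Recompute the figures in the table above from the release actually used.
# File name and the "Attack_type" label column are assumed from the public distribution.
df = pd.read_csv("DNN-EdgeIIoT-dataset.csv", low_memory=False)

per_class = df["Attack_type"].value_counts()
n_total = len(df)
benign_pct = 100.0 * per_class.get("Normal", 0) / n_total

print(f"instances: {n_total:,}")
print(f"benign share: {benign_pct:.2f} %")
print(f"attack classes: {per_class.drop('Normal', errors='ignore').size}")
print(per_class)
```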
Feature vectors per flow originally contain 61–63 attributes:
- Temporal: Flow start/end times, durations.
- Packet-level: Counts of packets/bytes (bidirectional), protocol header lengths, fragmentation flags.
- Transport-layer: Source/destination ports, TCP flags, window size, sequence/acknowledgment numbers.
- Application-layer/protocol-specific: HTTP verbs/status codes, MQTT PUBLISH types, Modbus function codes.
- Derived/statistical: Mean/min/max/standard deviation of packet sizes, inter-arrival times (IATs).
- Device and sensor: Sensor reading types (e.g., temperature, humidity, soil moisture) in advanced versions (Gueriani et al., 21 Jan 2025).
Feature selection and engineering steps in some works reduce the feature set, e.g., dropping constant/correlated columns or retaining only detection-enhancing features (down to 24 for some autoencoder experiments) (Hasan et al., 25 Jan 2025).
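A generic sketch of such pruning is shown below; the 0.95 correlation threshold is illustrative and does not reproduce the exact 24-feature subset of (Hasan et al., 25 Jan 2025).

```python
import numpy as np
import pandas as pd

def prune_features(X: pd.DataFrame, corr_threshold: float = 0.95) -> pd.DataFrame:
    """Drop constant columns, then one column from each highly correlated pair."""
    # 1. Remove columns with a single unique value (no discriminative power).
    X = X.loc[:, X.nunique(dropna=False) > 1]

    # 2. Among numeric columns, drop one member of every pair whose absolute
    #    Pearson correlation exceeds the threshold.
    corr = X.select_dtypes(include="number").corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > corr_threshold).any()]
    return X.drop(columns=to_drop)

# Example: X_reduced = prune_features(df.drop(columns=["Attack_type", "Attack_label"], errors="ignore"))
```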
4. Preprocessing and Class Imbalance Mitigation
Preprocessing protocols vary by paper (a combined sketch follows this list):
- Missing values in numerical fields are imputed (commonly with the column mean or median); categorical variables use mode imputation.
- Irrelevant or redundant string/meta fields (e.g., “Attack_label”) are dropped.
- Numerical features are standardized ($z = (x - \mu)/\sigma$) or min–max scaled ($x' = (x - x_{\min})/(x_{\max} - x_{\min})$) as appropriate (Ishtiaq et al., 3 Oct 2025, Hasan et al., 25 Jan 2025).
- Categorical fields are label-encoded to integer indices; scikit-learn’s LabelEncoder is commonly used (Gueriani et al., 21 Jan 2025).
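The steps above can be combined into a single routine. The sketch below assumes the `Attack_type`/`Attack_label` column names of the public release and uses mean imputation for numeric fields (papers differ on the exact statistic); in practice the imputer and scaler should be fitted on the training partition only.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler

def preprocess(df: pd.DataFrame, label_col: str = "Attack_type"):
    """Combined sketch of the preprocessing steps listed above; column names assumed."""
    y = df[label_col]
    # Drop the redundant meta label and keep the remaining features.
    X = df.drop(columns=[label_col, "Attack_label"], errors="ignore")

    num_cols = X.select_dtypes(include="number").columns
    cat_cols = X.columns.difference(num_cols)

    # Numeric fields: impute missing values (mean shown here), then standardize.
    # In practice, fit the imputer/scaler on the training partition only.
    X[num_cols] = SimpleImputer(strategy="mean").fit_transform(X[num_cols])
    X[num_cols] = StandardScaler().fit_transform(X[num_cols])

    # Categorical fields: mode imputation, then per-column integer label encoding.
    for col in cat_cols:
        X[col] = X[col].fillna(X[col].mode().iloc[0])
        X[col] = LabelEncoder().fit_transform(X[col].astype(str))

    return X, y
```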
Significant class imbalance is typical, manifesting most acutely in the minority attack classes: e.g., 358 MITM samples (0.02 %) and 853 Fingerprinting samples (0.04 %) versus 1,380,858 benign flows (71.65 %) (Hasan et al., 25 Jan 2025). Approaches to mitigate this include (see the sketch after this list):
- Cost-sensitive learning with class-weighted loss functions in autoencoders.
- Synthetic Minority Over-sampling Technique (SMOTE) to produce balanced class distributions for model training (Gueriani et al., 21 Jan 2025).
- Class weights computed and supplied to the loss function during neural model training.
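A minimal sketch of the two mitigation routes on a synthetic stand-in for the training partition (SMOTE requires the `imbalanced-learn` package; the toy data merely mimics a skewed class distribution):

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.utils.class_weight import compute_class_weight

# Toy imbalanced stand-in for the Edge-IIoTset training partition.
X_train, y_train = make_classification(
    n_samples=5000, n_classes=3, n_informative=6,
    weights=[0.90, 0.08, 0.02], random_state=0)

# Option 1: SMOTE -- synthesize minority-class samples (training partition only).
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)
print(np.bincount(y_train), "->", np.bincount(y_bal))

# Option 2: keep the original distribution and weight the loss instead.
classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
class_weight = dict(zip(classes, weights))   # e.g. model.fit(..., class_weight=class_weight)
print(class_weight)
```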
Dimensionality reduction is employed in some pipelines via deep autoencoders, reducing the 24 selected features to a latent bottleneck of six dimensions with weighted MSE loss (Hasan et al., 25 Jan 2025). No payload data is included, constraining models to flow-level analysis.
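A sketch of such a bottleneck in Keras is given below; layer widths other than the 24-dimensional input and 6-dimensional latent code, as well as the random stand-in data and sample weights, are illustrative rather than taken from the cited paper.

```python
import numpy as np
import tensorflow as tf

# Sketch of a 24 -> 6 autoencoder bottleneck (hidden widths are illustrative).
n_features, latent_dim = 24, 6

inputs = tf.keras.Input(shape=(n_features,))
encoded = tf.keras.layers.Dense(16, activation="relu")(inputs)
latent = tf.keras.layers.Dense(latent_dim, activation="relu", name="bottleneck")(encoded)
decoded = tf.keras.layers.Dense(16, activation="relu")(latent)
outputs = tf.keras.layers.Dense(n_features, activation="linear")(decoded)

autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

# Weighted MSE: give reconstruction errors on minority-class flows more influence
# by passing per-sample weights derived from class frequencies.
X = np.random.rand(1024, n_features).astype("float32")    # stand-in for scaled features
sample_weight = np.random.uniform(0.5, 5.0, size=len(X))  # stand-in for class-based weights
autoencoder.fit(X, X, sample_weight=sample_weight, epochs=2, batch_size=256, verbose=0)

# Downstream classifiers (e.g., a decision tree) consume the 6-d latent codes.
encoder = tf.keras.Model(inputs, latent)
Z = encoder.predict(X, verbose=0)
print(Z.shape)  # (1024, 6)
```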
5. Evaluation Protocols and Baseline Performance
Canonical evaluation uses stratified 80/20 train-test splits, with 20 % of the training set reserved for validation; none of the cited studies reports k-fold cross-validation. Minority oversampling is applied strictly to the training partition to prevent leakage.
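A sketch of this protocol with scikit-learn, using synthetic stand-in data in place of the preprocessed Edge-IIoTset features:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in; in practice X, y are the preprocessed Edge-IIoTset features/labels.
X, y = make_classification(n_samples=10_000, n_classes=3, n_informative=6,
                           weights=[0.7, 0.2, 0.1], random_state=0)

# Stratified 80/20 train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

# A further 20 % of the training partition is held out for validation.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.20, stratify=y_train, random_state=42)

# Oversampling (e.g., SMOTE) is fitted on X_tr / y_tr only, never on the
# validation or test partitions, so synthetic samples cannot leak across splits.
print(len(X_tr), len(X_val), len(X_test))   # 6400 1600 2000
```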
Downstream evaluation metrics adhere to standard definitions:
- Accuracy: $\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
- Precision: $\mathrm{Precision} = \frac{TP}{TP + FP}$
- Recall: $\mathrm{Recall} = \frac{TP}{TP + FN}$
- F1: $F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
- FPR: $\mathrm{FPR} = \frac{FP}{FP + TN}$
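These quantities follow directly from the confusion matrix; a small helper for the binary protocol (with a macro-averaged call for the multiclass case) might look as follows:

```python
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

def binary_ids_metrics(y_true, y_pred):
    """Metrics defined above for a binary (benign = 0, malicious = 1) run."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0,
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
    }

y_true, y_pred = [0, 0, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0]
print(binary_ids_metrics(y_true, y_pred))

# Macro-averaged precision/recall/F1 for the multiclass protocol:
print(precision_recall_fscore_support(y_true, y_pred, average="macro", zero_division=0))
```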
Reported baseline model performances on the dataset, reflecting both multi-class and binary protocols:
| Model/Ref | Multiclass Acc. | F1 (macro) | Inference Time | Details |
|---|---|---|---|---|
| CST-AFNet (Ishtiaq et al., 3 Oct 2025) | 99.97 % | >99.3 % | — | 63 features, dual-attention CNN+BiGRU |
| LSTM-CNN-Attention (Gueriani et al., 21 Jan 2025) | 99.04 % | — | — | Final model, SMOTE-balanced |
| Autoenc. + DT (Hasan et al., 25 Jan 2025) | 99.94 % | 99.94 % | 0.185/0.187 ms | 24 features, Jetson Nano runtime |
Minority-class F1 scores, particularly for XSS and Fingerprinting, remain lower (0.92–0.93), reflecting the persistent difficulty of extreme imbalance even after augmentation (Ishtiaq et al., 3 Oct 2025).
6. Dataset Complexity, Limitations, and Comparative Survey
Systematic reviews assess Edge-IIoTset as offering a moderate challenge in terms of average complexity score, with an imbalance ratio lower than that of many IIoT datasets (Dobler et al., 8 May 2024). Within this complexity taxonomy, it is positioned for federated and centralized ML, feature-selection studies, and benchmarking of both classical and DNN-based IDS models.
Documented limitations include:
- Extreme class imbalance for the smallest minority attack types
- Absence of payload data, precluding deep-packet-inspection features
- Insufficient documentation of feature names, units, and provenance in some releases
- Reliance on synthetic traffic generation for attacks, which may not fully capture adversarial behaviors observed in field deployments
- Real-time applicability on low-power embedded hardware is impacted by high feature dimensionality and model complexity, though lightweight models (Decision Trees, autoencoders) have demonstrated Jetson Nano deployment (Hasan et al., 25 Jan 2025)
Recommended research uses include supervised learning benchmarking, federated learning, feature selection technique evaluation, and broad-spectrum anomaly detection under moderate complexity assumptions (Dobler et al., 8 May 2024).
7. Recommended Best Practices and Future Directions
For optimal exploitation of Edge-IIoTset:
- Researchers are encouraged to verify and, if necessary, recompute summary statistics, prevalence, and feature distributions directly from the dataset (e.g., via the IEEE DataPort DOI:10.21227/MBC1-1H68), as published studies often omit such details
- Effective handling of class imbalance—combining oversampling, cost-sensitive architectures, and robust validation—is essential for realistic multi-class and minority-attack detection
- Publication of expanded metadata, including precise feature documentation and real-world attack traces, would further enrich the dataset’s value for transfer-learning and cross-domain generalization studies
- Testing algorithms on original imbalanced as well as rebalanced versions is advised to quantify practical robustness in operational settings
Edge-IIoTset has become a reference benchmark in IIoT network security research, supporting both experimental reproducibility and comparative evaluation of attack detection methodologies (Ishtiaq et al., 3 Oct 2025, Gueriani et al., 21 Jan 2025, Hasan et al., 25 Jan 2025, Dobler et al., 8 May 2024).