Edge-IIoTset Dataset Overview
- Edge-IIoTset is a comprehensive labeled network security dataset for benchmarking intrusion detection in industrial IoT and edge computing environments.
- It is collected from a realistic seven-layer IIoT smart factory testbed, capturing diverse protocols, benign traffic, and a wide range of cyberattacks.
- The dataset supports evaluation of both traditional and deep learning methods, addressing challenges like class imbalance and detailed feature preprocessing.
Edge-IIoTset is a comprehensive labeled network security dataset designed for benchmarking intrusion detection methods in Industrial Internet of Things (IIoT) and edge-computing environments. Generated from a realistic, multi-layer industrial testbed, it is constructed to reflect the diversity, heterogeneity, and operational complexity of contemporary IIoT deployments. Edge-IIoTset captures not only benign supervisory and telemetry traffic but also a wide spectrum of network attacks, enabling the development, validation, and comparison of both traditional and deep learning-based cyber-defense systems.
1. Testbed Architecture and Data Collection
Edge-IIoTset is derived from a physical seven-layer IIoT smart factory testbed with a stratified architecture emulating the OSI stack: Physical, Data Link, Network, Transport, Session, Presentation, and Application layers. Devices span programmable logic controllers (PLCs), industrial sensors, actuators, edge gateways, Supervisory Control and Data Acquisition (SCADA) servers, and Human–Machine Interfaces (HMIs). Protocol coverage spans both IT-centric protocols (TCP/IP, UDP, ICMP, HTTP) and OT/IIoT protocols (MQTT, Modbus/TCP). Network traffic is recorded by mirroring all packets at the edge gateway, subsequently aggregating packets into bidirectional flows for processing (Ishtiaq et al., 3 Oct 2025).
Benign traffic encompasses routine industrial operations—supervisory commands, sensory updates, actuator signals, and periodic bulk transfers (e.g., firmware updates). Attack traffic is generated by orchestrating a broad range of cyberattacks (such as DDoS, MITM, injection attacks, scanning, ransomware, and malware uploads) using publicly available and custom-built offensive scripts tailored to exploit protocol- and device-specific vulnerabilities (Dobler et al., 8 May 2024).
2. Attack Taxonomy and Labeling Scheme
Edge-IIoTset captures extensive adversarial activity mapped to the full Cyber Kill Chain (CKC):
- Reconnaissance: Port scanning (TCP/UDP), OS fingerprinting, protocol enumeration, vulnerability scanning.
- Exploitation: DoS/DDoS (across multiple protocols), application-level injection (SQL, XSS), Modbus or PLC-specific exploits.
- Installation: Backdoors, malware uploads, brute-force attacks, malicious firmware uploads.
- Command & Control: Data exfiltration, session tampering.
- Actions on Objectives: Ransomware encryption.
Labeling is performed at the flow or packet level with granularity varying by paper. Multiclass schemes distinguish between 14–15 individual attack types and benign traffic (Ishtiaq et al., 3 Oct 2025, Hasan et al., 25 Jan 2025), or broader classes such as DDoS, MITM, Information Gathering, Injection, Malware, and Normal traffic (Gueriani et al., 21 Jan 2025). Binary versions of the dataset collapse all attacks to a single “malicious” class, as employed in broad benchmarking (Dobler et al., 8 May 2024).
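For illustration, the binary variant can be derived from the multiclass labels in a few lines of pandas. This is a minimal sketch; the file name and the `Attack_type`/`Normal` label conventions are assumed from the public CSV release and may differ between dataset versions.

```python
import pandas as pd

# Binary variant: collapse every attack type into a single "malicious" class.
# File name and label conventions ("Attack_type", "Normal") are assumed from
# the public CSV release and may differ between dataset versions.
df = pd.read_csv("DNN-EdgeIIoT-dataset.csv", low_memory=False)
df["is_malicious"] = (df["Attack_type"] != "Normal").astype(int)
print(df["is_malicious"].value_counts(normalize=True))
```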
3. Dataset Composition, Feature Set, and Statistical Profiles
Reported dataset sizes range from 1.9 million to 3.44 million flow-level instances, depending on the release version and preprocessing steps applied (Hasan et al., 25 Jan 2025, Dobler et al., 8 May 2024). Individual studies specify instance counts and attack class breakdowns:
| Version/Study | Instances | Benign % | # Attack Types | Label Schema | Reference |
|---|---|---|---|---|---|
| CST-AFNet | 2,219,201 | ≈50–60 | 15 + benign | Multiclass | (Ishtiaq et al., 3 Oct 2025) |
| Autoencoder DT | 1,927,304 | 71.65 | 14 + benign | Multiclass | (Hasan et al., 25 Jan 2025) |
| Dobler survey | ≈3,440,000 | 78 | 20 (CKC-based) | Binary | (Dobler et al., 8 May 2024) |
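Because instance counts and benign shares differ across studies and release versions, it is worth recomputing these headline figures from the release actually used. A minimal sketch, assuming the public CSV distribution and an `Attack_type` label column:

```python
import pandas as pd

# Recompute the figures in the table above from the release actually used.
# File name and the "Attack_type" label column are assumed from the public distribution.
df = pd.read_csv("DNN-EdgeIIoT-dataset.csv", low_memory=False)

per_class = df["Attack_type"].value_counts()
n_total = len(df)
benign_pct = 100.0 * per_class.get("Normal", 0) / n_total

print(f"instances: {n_total:,}")
print(f"benign share: {benign_pct:.2f} %")
print(f"attack classes: {per_class.drop('Normal', errors='ignore').size}")
print(per_class)
```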
Feature vectors per flow originally contain 61–63 attributes:
- Temporal: Flow start/end times, durations.
- Packet-level: Counts of packets/bytes (bidirectional), protocol header lengths, fragmentation flags.
- Transport-layer: Source/destination ports, TCP flags, window size, sequence/acknowledgment numbers.
- Application-layer/protocol-specific: HTTP verbs/status codes, MQTT PUBLISH types, Modbus function codes.
- Derived/statistical: Mean/min/max/standard deviation of packet sizes, inter-arrival times (IATs).
- Device and sensor: Sensor reading types (e.g., temperature, humidity, soil moisture) in advanced versions (Gueriani et al., 21 Jan 2025).
Feature selection and engineering steps in some works reduce the feature set, e.g., dropping constant/correlated columns or retaining only detection-enhancing features (down to 24 for some autoencoder experiments) (Hasan et al., 25 Jan 2025).
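A generic sketch of such pruning is shown below; the 0.95 correlation threshold is illustrative and does not reproduce the exact 24-feature subset of (Hasan et al., 25 Jan 2025).

```python
import numpy as np
import pandas as pd

def prune_features(X: pd.DataFrame, corr_threshold: float = 0.95) -> pd.DataFrame:
    """Drop constant columns, then one column from each highly correlated pair."""
    # 1. Remove columns with a single unique value (no discriminative power).
    X = X.loc[:, X.nunique(dropna=False) > 1]

    # 2. Among numeric columns, drop one member of every pair whose absolute
    #    Pearson correlation exceeds the threshold.
    corr = X.select_dtypes(include="number").corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > corr_threshold).any()]
    return X.drop(columns=to_drop)

# Example: X_reduced = prune_features(df.drop(columns=["Attack_type", "Attack_label"], errors="ignore"))
```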
4. Preprocessing and Class Imbalance Mitigation
Preprocessing protocols vary by paper (a combined sketch follows this list):
- Missing values in numerical fields are imputed (commonly with the column mean or median); categorical variables use mode imputation.
- Irrelevant or redundant string/meta fields (e.g., “Attack_label”) are dropped.
- Numerical features are standardized ($z = (x - \mu)/\sigma$) or min–max scaled ($x' = (x - x_{\min})/(x_{\max} - x_{\min})$) as appropriate (Ishtiaq et al., 3 Oct 2025, Hasan et al., 25 Jan 2025).
- Categorical fields are label-encoded to integer indices; scikit-learn’s LabelEncoder is commonly used (Gueriani et al., 21 Jan 2025).
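The steps above can be combined into a single routine. The sketch below assumes the `Attack_type`/`Attack_label` column names of the public release and uses mean imputation for numeric fields (papers differ on the exact statistic); in practice the imputer and scaler should be fitted on the training partition only.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler

def preprocess(df: pd.DataFrame, label_col: str = "Attack_type"):
    """Combined sketch of the preprocessing steps listed above; column names assumed."""
    y = df[label_col]
    # Drop the redundant meta label and keep the remaining features.
    X = df.drop(columns=[label_col, "Attack_label"], errors="ignore")

    num_cols = X.select_dtypes(include="number").columns
    cat_cols = X.columns.difference(num_cols)

    # Numeric fields: impute missing values (mean shown here), then standardize.
    # In practice, fit the imputer/scaler on the training partition only.
    X[num_cols] = SimpleImputer(strategy="mean").fit_transform(X[num_cols])
    X[num_cols] = StandardScaler().fit_transform(X[num_cols])

    # Categorical fields: mode imputation, then per-column integer label encoding.
    for col in cat_cols:
        X[col] = X[col].fillna(X[col].mode().iloc[0])
        X[col] = LabelEncoder().fit_transform(X[col].astype(str))

    return X, y
```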
Significant class imbalance is typical, manifesting most acutely in the minority attack classes: e.g., 358 MITM samples (0.02 %) and 853 Fingerprinting samples (0.04 %) versus 1,380,858 benign flows (71.65 %) (Hasan et al., 25 Jan 2025). Approaches to mitigate this include (see the sketch after this list):
- Cost-sensitive learning with class-weighted loss functions in autoencoders.
- Synthetic Minority Over-sampling Technique (SMOTE) to produce balanced class distributions for model training (Gueriani et al., 21 Jan 2025).
- Class weights computed and supplied to the loss function during neural model training.
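A minimal sketch of the two mitigation routes on a synthetic stand-in for the training partition (SMOTE requires the `imbalanced-learn` package; the toy data merely mimics a skewed class distribution):

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.utils.class_weight import compute_class_weight

# Toy imbalanced stand-in for the Edge-IIoTset training partition.
X_train, y_train = make_classification(
    n_samples=5000, n_classes=3, n_informative=6,
    weights=[0.90, 0.08, 0.02], random_state=0)

# Option 1: SMOTE -- synthesize minority-class samples (training partition only).
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)
print(np.bincount(y_train), "->", np.bincount(y_bal))

# Option 2: keep the original distribution and weight the loss instead.
classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
class_weight = dict(zip(classes, weights))   # e.g. model.fit(..., class_weight=class_weight)
print(class_weight)
```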
Dimensionality reduction is employed in some pipelines via deep autoencoders, reducing the 24 selected features to a latent bottleneck of six dimensions with weighted MSE loss (Hasan et al., 25 Jan 2025). No payload data is included, constraining models to flow-level analysis.
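A sketch of such a bottleneck in Keras is given below; layer widths other than the 24-dimensional input and 6-dimensional latent code, as well as the random stand-in data and sample weights, are illustrative rather than taken from the cited paper.

```python
import numpy as np
import tensorflow as tf

# Sketch of a 24 -> 6 autoencoder bottleneck (hidden widths are illustrative).
n_features, latent_dim = 24, 6

inputs = tf.keras.Input(shape=(n_features,))
encoded = tf.keras.layers.Dense(16, activation="relu")(inputs)
latent = tf.keras.layers.Dense(latent_dim, activation="relu", name="bottleneck")(encoded)
decoded = tf.keras.layers.Dense(16, activation="relu")(latent)
outputs = tf.keras.layers.Dense(n_features, activation="linear")(decoded)

autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

# Weighted MSE: give reconstruction errors on minority-class flows more influence
# by passing per-sample weights derived from class frequencies.
X = np.random.rand(1024, n_features).astype("float32")    # stand-in for scaled features
sample_weight = np.random.uniform(0.5, 5.0, size=len(X))  # stand-in for class-based weights
autoencoder.fit(X, X, sample_weight=sample_weight, epochs=2, batch_size=256, verbose=0)

# Downstream classifiers (e.g., a decision tree) consume the 6-d latent codes.
encoder = tf.keras.Model(inputs, latent)
Z = encoder.predict(X, verbose=0)
print(Z.shape)  # (1024, 6)
```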
5. Evaluation Protocols and Baseline Performance
Canonical evaluation uses stratified 80/20 train-test splits, with 20 % of the training set reserved for validation; none of the cited studies reports k-fold cross-validation. Minority oversampling is applied strictly to the training partition to prevent leakage.
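A sketch of this protocol with scikit-learn, using synthetic stand-in data in place of the preprocessed Edge-IIoTset features:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in; in practice X, y are the preprocessed Edge-IIoTset features/labels.
X, y = make_classification(n_samples=10_000, n_classes=3, n_informative=6,
                           weights=[0.7, 0.2, 0.1], random_state=0)

# Stratified 80/20 train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

# A further 20 % of the training partition is held out for validation.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.20, stratify=y_train, random_state=42)

# Oversampling (e.g., SMOTE) is fitted on X_tr / y_tr only, never on the
# validation or test partitions, so synthetic samples cannot leak across splits.
print(len(X_tr), len(X_val), len(X_test))   # 6400 1600 2000
```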
Downstream evaluation metrics adhere to standard definitions:
- Accuracy: $\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
- Precision: $\mathrm{Precision} = \frac{TP}{TP + FP}$
- Recall: $\mathrm{Recall} = \frac{TP}{TP + FN}$
- F1: $F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
- FPR: $\mathrm{FPR} = \frac{FP}{FP + TN}$
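These quantities follow directly from the confusion matrix; a small helper for the binary protocol (with a macro-averaged call for the multiclass case) might look as follows:

```python
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

def binary_ids_metrics(y_true, y_pred):
    """Metrics defined above for a binary (benign = 0, malicious = 1) run."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0,
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
    }

y_true, y_pred = [0, 0, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0]
print(binary_ids_metrics(y_true, y_pred))

# Macro-averaged precision/recall/F1 for the multiclass protocol:
print(precision_recall_fscore_support(y_true, y_pred, average="macro", zero_division=0))
```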
Reported baseline model performances on the dataset, reflecting both multi-class and binary protocols:
| Model/Ref | Multiclass Acc. | F1 (macro) | Inference Time | Details |
|---|---|---|---|---|
| CST-AFNet (Ishtiaq et al., 3 Oct 2025) | 99.97 % | >99.3 % | — | 63 features, dual-attention CNN+BiGRU |
| LSTM-CNN-Attention (Gueriani et al., 21 Jan 2025) | 99.04 % | — | — | Final model, SMOTE-balanced |
| Autoenc. + DT (Hasan et al., 25 Jan 2025) | 99.94 % | 99.94 % | 0.185/0.187 ms | 24 features, Jetson Nano runtime |
Minority-class F1 scores, particularly for XSS and Fingerprinting, remain lower (0.92–0.93), reflecting the persistent difficulty of extreme imbalance even after augmentation (Ishtiaq et al., 3 Oct 2025).
6. Dataset Complexity, Limitations, and Comparative Survey
Systematic reviews assess Edge-IIoTset as offering a moderate challenge in terms of average complexity score, with an imbalance ratio lower than that of many IIoT datasets (Dobler et al., 8 May 2024). Within this complexity taxonomy, it is positioned for federated and centralized ML, feature-selection studies, and benchmarking of both classical and DNN-based IDS models.
Documented limitations include:
- Extreme class imbalance for the smallest minority attack types
- Absence of payload data, precluding deep-packet-inspection features
- Insufficient documentation of feature names, units, and provenance in some releases
- Reliance on synthetic traffic generation for attacks, which may not fully capture adversarial behaviors observed in field deployments
- Real-time applicability on low-power embedded hardware is impacted by high feature dimensionality and model complexity, though lightweight models (Decision Trees, autoencoders) have demonstrated Jetson Nano deployment (Hasan et al., 25 Jan 2025)
Recommended research uses include supervised learning benchmarking, federated learning, feature selection technique evaluation, and broad-spectrum anomaly detection under moderate complexity assumptions (Dobler et al., 8 May 2024).
7. Recommended Best Practices and Future Directions
For optimal exploitation of Edge-IIoTset:
- Researchers are encouraged to verify and, if necessary, recompute summary statistics, prevalence, and feature distributions directly from the dataset (e.g., via the IEEE DataPort DOI:10.21227/MBC1-1H68), as published studies often omit such details
- Effective handling of class imbalance—combining oversampling, cost-sensitive architectures, and robust validation—is essential for realistic multi-class and minority-attack detection
- Publication of expanded metadata, including precise feature documentation and real-world attack traces, would further enrich the dataset’s value for transfer-learning and cross-domain generalization studies
- Testing algorithms on original imbalanced as well as rebalanced versions is advised to quantify practical robustness in operational settings
Edge-IIoTset has become a reference benchmark in IIoT network security research, supporting both experimental reproducibility and comparative evaluation of attack detection methodologies (Ishtiaq et al., 3 Oct 2025, Gueriani et al., 21 Jan 2025, Hasan et al., 25 Jan 2025, Dobler et al., 8 May 2024).