BRIDGE and TCH-Net: Heterogeneous Benchmark and Multi-Branch Baseline for Cross-Domain IoT Botnet Detection

Published 13 Apr 2026 in cs.CR, cs.LG, and cs.NI (arXiv:2604.11324v1)

Abstract: IoT botnet detection has advanced, yet most published systems are validated on a single dataset and rarely generalise across environments. Heterogeneous feature spaces make multi-dataset training practically impossible without discarding semantic interpretability or introducing data integrity violations. No prior work has addressed both problems with a formally specified, reproducible methodology. This paper does. We introduce BRIDGE (Benchmark Reference for IoT Domain Generalisation Evaluation), the first formally specified heterogeneous multi-dataset benchmark for IoT intrusion detection, unifying CICIDS-2017, CIC-IoT-2023, Bot-IoT, Edge-IIoTset, and N-BaIoT through a 46-feature semantic canonical vocabulary grounded in CICFlowMeter nomenclature, with genuine-equivalence-only feature mapping, explicit zero-filling, and per-dataset coverage from 15% to 93%. A leave-one-dataset-out (LODO) protocol makes the generalisation gap precisely measurable: all five evaluated architectures achieve mean LODO F1 between 0.39 and 0.47, and we establish the first community generalisation baseline at mean LODO F1 = 0.5577, a result that shifts the agenda from single-benchmark optimisation toward cross-environment generalisation. We propose TCH-Net, a multi-branch network fusing a three-path Temporal branch (residual convolutional-BiGRU, stride-downsampled BiGRU, pre-LayerNorm Transformer), a provenance-conditioned Contextual branch, and a Statistical branch via Cross-Branch Gated Attention Fusion (CB-GAF) with learnable sigmoid gates for dynamic feature-wise mixing. Across five random seeds, TCH-Net achieves F1 = 0.8296 ± 0.0028, AUC = 0.9380 ± 0.0025, and MCC = 0.6972 ± 0.0056, outperforming all twelve baselines (p < 0.05, Wilcoxon) and recording the highest LODO F1 overall. BRIDGE and the full pipeline are available at https://github.com/Ammar-ss/TCH-Net.


Authors (7)

Summary

The paper introduces BRIDGE, a unified multi-dataset benchmark for IoT intrusion detection under domain shift, which reveals generalisation weaknesses in prior methods.
It presents TCH-Net, a multi-branch architecture that integrates temporal, statistical, and contextual modalities via a novel Cross-Branch Gated Attention Fusion (CB-GAF) mechanism.
Evaluations show statistically significant F1 and ROC-AUC gains over twelve baselines, while exposing persistent cross-domain generalisation gaps that motivate future domain-adaptive work.


Motivation and Background

A critical and long-standing flaw in the IoT botnet detection literature is its near-exclusive reliance on single-dataset evaluation. This practice systematically overestimates how well models generalise to realistic, heterogeneous deployments. Prior approaches are hampered by incompatible feature spaces, capture-tool idiosyncrasies, and a lack of standardised, auditable preprocessing. Without a reproducible, principled cross-dataset evaluation regime, meaningful progress in domain generalisation and adaptation cannot be measured. The present work addresses both the evaluation-protocol gap and the feature-heterogeneity problem through two principal contributions: the BRIDGE benchmark and the TCH-Net architecture.

The BRIDGE Benchmark: Canonical Feature Alignment for Cross-Domain Comparison

BRIDGE (Benchmark Reference for IoT Domain Generalisation Evaluation) establishes the first formally specified multi-dataset evaluation benchmark for IoT intrusion detection under domain shift. It unifies five structurally distinct, publicly available datasets (CICIDS-2017, CIC-IoT-2023, Bot-IoT, Edge-IIoTset, and N-BaIoT) through a 46-feature canonical vocabulary built on CICFlowMeter nomenclature. Under this vocabulary, a native feature is mapped to a canonical one only when the two are genuinely semantically equivalent.

Canonical features with no genuine equivalent in a given dataset are explicitly zero-filled rather than approximated, which keeps every value interpretable and makes the mapping auditable for integrity. As a result, per-dataset coverage varies sharply, from 15% to 93% of the vocabulary. This spread reflects the datasets' divergent data-generating regimes and stress-tests model robustness to heterogeneous input representations.
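To make genuine-equivalence-only mapping with explicit zero-filling concrete, here is a minimal sketch. The four-feature vocabulary, the native column names (`dur`, `spkts`), and the returned validity mask are hypothetical illustrations. They are not BRIDGE's actual 46-feature schema or released code.

```python
# Hypothetical 4-feature vocabulary; BRIDGE's real vocabulary has 46 entries
# named after CICFlowMeter features.
CANONICAL = ["flow_duration", "fwd_pkts", "bwd_pkts", "pkt_len_mean"]


def to_canonical(record, mapping):
    """Project one native-schema record onto the canonical vocabulary.

    `mapping` maps canonical name -> native column name and should list only
    genuine semantic equivalences. Unmapped canonical features are
    zero-filled explicitly; `mask` records which slots hold real values.
    """
    vector, mask = [], []
    for name in CANONICAL:
        native = mapping.get(name)
        if native is not None and native in record:
            vector.append(float(record[native]))
            mask.append(1)
        else:
            vector.append(0.0)  # explicit zero-fill, never an approximation
            mask.append(0)
    return vector, mask


def coverage(mapping):
    """Fraction of canonical features that have a genuine native equivalent."""
    return sum(name in mapping for name in CANONICAL) / len(CANONICAL)


# Example with hypothetical native column names; "proto" has no canonical slot.
example_mapping = {"flow_duration": "dur", "fwd_pkts": "spkts"}
vec, mask = to_canonical({"dur": 1.5, "spkts": 10, "proto": "tcp"}, example_mapping)
print(vec, mask, coverage(example_mapping))  # prints [1.5, 10.0, 0.0, 0.0] [1, 1, 0, 0] 0.5
```

Keeping the mask alongside the zero-filled vector is a design choice in this sketch. It lets downstream code tell a genuine zero apart from a missing feature.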

Figure 1: Feature coverage of the 46-feature canonical vocabulary across five BRIDGE datasets. Blue indicates a mapped feature, grey denotes explicit zero-fill.

BRIDGE is underpinned by a reproducible preprocessing pipeline (class balancing, robust scaling, windowed sequence construction, and a leakage-free split) that ensures transparency and consistency. Most importantly, BRIDGE operationalises a leave-one-dataset-out (LODO) evaluation protocol, in which each model is trained on four datasets and tested on the held-out fifth. This protocol yields, for the first time, formally measured generalisation gaps that expose how far in-distribution scores overstate cross-environment performance.
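Below is a minimal sketch of the LODO protocol combined with leakage-free scaling. It makes two assumptions: that "robust scaling" means median/IQR standardisation, and that the scaler is fitted only on the four training datasets of each fold. The function names and toy feature values are hypothetical, and the released pipeline may differ.

```python
from statistics import median, quantiles

DATASETS = ["CICIDS-2017", "CIC-IoT-2023", "Bot-IoT", "Edge-IIoTset", "N-BaIoT"]


def lodo_folds(datasets=DATASETS):
    """Yield (train_datasets, held_out): train on four sources, test on the fifth."""
    for held_out in datasets:
        yield [d for d in datasets if d != held_out], held_out


def fit_robust_scaler(train_column):
    """Median and interquartile range of one feature, from training rows only."""
    q1, _, q3 = quantiles(train_column, n=4)  # needs >= 2 values on Python < 3.13
    iqr = q3 - q1
    return median(train_column), (iqr if iqr > 0 else 1.0)  # guard constant columns


def robust_scale(column, stats):
    """Centre by the training median and divide by the training IQR."""
    centre, scale = stats
    return [(x - centre) / scale for x in column]


# Hypothetical single-feature values per dataset, for illustration only.
features = {
    "CICIDS-2017": [1.0, 2.0], "CIC-IoT-2023": [3.0], "Bot-IoT": [4.0],
    "Edge-IIoTset": [5.0], "N-BaIoT": [100.0],
}

# Per fold: pool the training sources, fit the scaler there, then apply it
# unchanged to the held-out source so no test statistics leak into training.
scaled_held_out = {}
for train_sets, held_out in lodo_folds():
    train_col = [x for d in train_sets for x in features[d]]
    stats = fit_robust_scaler(train_col)
    scaled_held_out[held_out] = robust_scale(features[held_out], stats)
```

Fitting the scaler per fold, instead of once on all five datasets, is what keeps the held-out domain truly unseen. A global fit would leak its median and IQR into training.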

TCH-Net: Multi-Branch Neural Baseline for Heterogeneous IoT IDS

TCH-Net is introduced as a strong, reproducible baseline tailored to BRIDGE’s structural heterogeneity. It is a multi-branch architecture that explicitly fuses three distinct modalities:

Temporal branch (T): Three parallel paths share an 8-step temporal grid and are followed by a final multi-head self-attention layer:
(i) a residual depthwise-separable conv-SE-BiGRU stack for local and medium-range motifs;
(ii) a stride-downsampled conv-BiGRU for coarse-scale discrimination;
(iii) a full-resolution pre-LayerNorm Transformer encoder for global context.
Statistical branch (H): A mean-pooled MLP that captures aggregate distributional statistics. Because mean pooling ignores temporal order, this branch can represent global distribution shifts.
Contextual branch (C): Embeds dataset and device-class provenance, conditioning downstream fusion on input origin and canonical feature coverage.
Figure 3: T-branch three-path architecture detail capturing multi-scale temporal patterns via parallel convolutional, recurrent, and transformer-based encoders.
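The following sketch shows, under stated assumptions, how a single window could feed the three branches. Adaptive average pooling onto the shared 8-step grid is an assumed alignment mechanism, not the paper's confirmed one. The one-hot provenance vector stands in for the contextual branch's learned embedding, and the learned layers (convolutions, BiGRUs, Transformer, MLP) are omitted. All names, including `n_device_classes=3`, are hypothetical.

```python
def adaptive_avg_pool(seq, out_steps=8):
    """Average a (T x F) sequence into out_steps bins using the bin edges of
    adaptive average pooling: [floor(i*T/out), ceil((i+1)*T/out))."""
    T, F = len(seq), len(seq[0])
    pooled = []
    for i in range(out_steps):
        start = (i * T) // out_steps
        end = -((-(i + 1) * T) // out_steps)  # ceiling division
        chunk = seq[start:end]
        pooled.append([sum(row[f] for row in chunk) / len(chunk) for f in range(F)])
    return pooled


def mean_pool(seq):
    """Statistical-branch input: per-feature mean over time (order-invariant)."""
    T = len(seq)
    return [sum(col) / T for col in zip(*seq)]


def provenance_vector(dataset_id, device_class, coverage_mask=(),
                      n_datasets=5, n_device_classes=3):
    """Contextual-branch input: one-hot dataset and device class, plus an
    optional feature-coverage mask. Multiplying a one-hot vector by a weight
    matrix is equivalent to a learned embedding lookup."""
    ds = [1.0 if i == dataset_id else 0.0 for i in range(n_datasets)]
    dev = [1.0 if i == device_class else 0.0 for i in range(n_device_classes)]
    return ds + dev + [float(m) for m in coverage_mask]


# A toy 16-step window with two features.
window = [[float(t), 2.0 * t] for t in range(16)]
grid = adaptive_avg_pool(window)        # 8 steps x 2 features
stats = mean_pool(window)               # 2 values, independent of step order
context = provenance_vector(2, 0, coverage_mask=[1, 0])
print(len(grid), stats, context)
```

The order-invariance check `mean_pool(window) == mean_pool(window[::-1])` illustrates why the statistical branch complements, rather than duplicates, the temporal paths.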

After a shared nonlinear feature projection, the branch representations are integrated by a novel Cross-Branch Gated Attention Fusion (CB-GAF) mechanism. Each branch queries the other two via cross-attention. A learnable sigmoid gate vector for each branch then controls the fusion feature by feature, so information can flow asymmetrically between branches depending on the input context.
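A dependency-free sketch of the CB-GAF idea follows. The paper describes the gate only as learnable, per-branch, feature-wise, and sigmoid. The gate input (the branch's own vector and its attended context), the convex mix between the two, and the identity query/key/value projections are simplifying assumptions. All function names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, others):
    """Scaled dot-product attention of one branch vector over the other branches.
    Identity Q/K/V projections keep the sketch short; a trained module learns them."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, o)) / math.sqrt(d) for o in others]
    weights = softmax(scores)
    return [sum(w * o[j] for w, o in zip(weights, others)) for j in range(d)]


def cb_gaf(branches, gates):
    """branches: name -> vector (same width after the shared projection).
    gates: name -> (w_self, w_ctx, bias), each a vector of that width.
    Each branch queries the other two, then a per-branch, feature-wise sigmoid
    gate sets how much of its own vector vs. the attended context to keep.
    Returns the concatenation of the gated branch outputs."""
    fused = []
    for name, h in branches.items():
        others = [v for k, v in branches.items() if k != name]
        ctx = cross_attend(h, others)
        w_self, w_ctx, bias = gates[name]
        for hj, cj, ws, wc, b in zip(h, ctx, w_self, w_ctx, bias):
            g = sigmoid(ws * hj + wc * cj + b)
            fused.append(g * hj + (1.0 - g) * cj)
    return fused


branches = {"T": [1.0, 0.0], "H": [0.0, 2.0], "C": [0.0, 2.0]}  # toy vectors
zero = [0.0, 0.0]
untrained = {name: (zero, zero, zero) for name in branches}  # every gate = 0.5
print(cb_gaf(branches, untrained))
```

Because each branch owns its gate parameters, the T branch can, for example, keep most of its own features while the C branch absorbs more of the others. That is the asymmetry the description above refers to.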

Figure 2: CB-GAF mechanism enables dynamic cross-branch information flow, modulated by per-branch, feature-wise gates.

The classification head uses a residual skip connection to improve gradient flow. During training, an auxiliary feature-reconstruction regulariser stabilises learning in the presence of missing (zero-filled) features.
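As a sketch of how these two pieces could fit together, the code below pairs a residual output pathway with a training loss made of binary cross-entropy plus a weighted reconstruction term. Several details are assumptions, not the paper's confirmed design: restricting reconstruction to mapped (non-zero-filled) features, the weight `lam=0.1`, and the helper names. The paper states only that an auxiliary reconstruction regulariser is used.

```python
import math


def residual_head(h, layer):
    """Residual output pathway: layer(h) + h. The identity term gives
    gradients a path that bypasses `layer`."""
    return [a + b for a, b in zip(layer(h), h)]


def bce(prob, label, eps=1e-7):
    """Binary cross-entropy for one sample, with probability clipping."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def masked_reconstruction(x, x_hat, mask):
    """Mean squared error on mapped features only (mask == 1), so the model
    is not rewarded for reproducing artificial zero-fills (an assumption)."""
    errs = [(a - b) ** 2 for a, b, m in zip(x, x_hat, mask) if m]
    return sum(errs) / len(errs) if errs else 0.0


def total_loss(prob, label, x, x_hat, mask, lam=0.1):
    """Classification loss plus a lam-weighted auxiliary reconstruction term."""
    return bce(prob, label) + lam * masked_reconstruction(x, x_hat, mask)


# Toy sample: three canonical features, the last one zero-filled.
x, x_hat, mask = [1.0, 2.0, 0.0], [1.0, 4.0, 5.0], [1, 1, 0]
print(total_loss(0.5, 1, x, x_hat, mask))
```

Here the zero-filled third slot contributes nothing to the loss, even though the reconstruction `5.0` is far from `0.0`.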

Figure 4: Classification head with a residual connection ensuring robust gradient propagation.

Experimental Evaluation

Benchmarking Against Prior Art

TCH-Net is evaluated against twelve established baselines spanning recurrent (BiLSTM, BiGRU), convolutional (1D-CNN, CNN-LSTM), transformer-based IDS, classical (RF, XGBoost), autoencoder-based, DNN, and GNN models. All models share identical data processing, class balancing, and evaluation regimes. Across five random seeds, TCH-Net achieves the highest score on every key metric (mean F1 $= 0.8296$, ROC-AUC $= 0.9380$, MCC $= 0.6972$). Its improvements over all baselines are statistically significant (p < 0.05, Wilcoxon test).
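For reference, here is a dependency-free sketch of the three reported metrics for binary detection. It is not the paper's evaluation code. The pairwise ROC-AUC is O(P·N), so at dataset scale a library implementation such as scikit-learn's `roc_auc_score`, `f1_score`, and `matthews_corrcoef` would be the practical choice. A per-seed paired significance test could likewise come from `scipy.stats.wilcoxon`.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 when any marginal is empty."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative
    (ties count half). O(P*N): fine for illustration only."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0]
scores = [0.9, 0.2, 0.3, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # threshold at 0.5
print(f1(y_true, y_pred), mcc(y_true, y_pred), roc_auc(y_true, scores))
```

MCC uses all four confusion-matrix cells, which is why it is often reported alongside F1 when class balance can vary across domains.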

Figure 5: Radar plot showing TCH-Net’s metric superiority over top baseline models.

Ablation and Component Analysis

Ablation experiments show that all three branches make complementary contributions. Removing any single branch, or the CB-GAF module, consistently degrades performance, with F1 losses greater than 0.054. The contextual branch is especially important, because it calibrates information flow according to dataset provenance and feature coverage.

Figure 6: Branch ablation study confirms the necessity of all three branches and CB-GAF for optimal performance.

Cross-Dataset Generalisation and Domain Shift Measurement

The principal value of BRIDGE lies in its reproducible quantification of the generalisation gap. LODO results show that even the strongest architecture is not immune. TCH-Net's mean F1 falls from 0.83 under a random split to 0.56 under LODO, and the established deep baselines fare no better. Domain shift is therefore a structural barrier that feature alignment alone cannot remove.
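Writing the LODO score out explicitly helps in reading these figures. The derivation below rests on two assumptions: that the reported mean LODO F1 averages over the five held-out datasets $D_1, \dots, D_5$, and that the 0.5577 community baseline is TCH-Net's own mean LODO F1, as the rounded 0.56 suggests. Here $f_{\mathcal{D}\setminus D_k}$ denotes the model trained on every BRIDGE dataset except $D_k$:

$$F_1^{\text{LODO}} = \frac{1}{5}\sum_{k=1}^{5} F_1\big(f_{\mathcal{D}\setminus D_k};\, D_k\big), \qquad \Delta F_1 = F_1^{\text{random}} - F_1^{\text{LODO}} = 0.8296 - 0.5577 = 0.2719 \approx 0.27$$

Each term in the sum is scored on a dataset the model never saw during training. That is what separates this average from the random-split F1.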

Figure 7: Leave-One-Dataset-Out (LODO) F1 scores reveal the severity of the generalisation gap; in-distribution F1 is not predictive of cross-domain performance.

The largest generalisation shortfall, $-0.638$ F1, occurs in the single-dataset-dominant CICIDS-2017 setting. It underscores the need for methods that go beyond naive feature alignment and motivates future work on domain-adaptive architectures.

Practical and Theoretical Implications

BRIDGE provides a community benchmark for cross-domain IoT IDS generalisation. By establishing a rigorous, reproducible protocol and a canonical feature vocabulary, it makes systematic progress in domain adaptation measurable, a critical advance for the field. The findings undercut the optimism fostered by single-dataset results and recentre the research agenda on approaches that are robust to feature-set heterogeneity and distributional shift.

Through multi-branch fusion and CB-GAF, TCH-Net shows that architectural heterogeneity and provenance conditioning are effective but not sufficient: a LODO F1 gap of about 0.27 remains an open problem. Future work should explore domain-adversarial training, dataset-conditional normalisation, and extension to diverse packet-level input representations. Extending evaluation from binary detection to multi-class attack classification, a setting in which generalisation is critical, is also of clear theoretical and operational importance.

On the practical front, TCH-Net is computationally light (about 2.7M parameters, 6.4 ms inference latency, and under 11 MB of RAM). This makes it compatible with current-generation edge inference platforms and makes real-world deployment on IoT gateways feasible. Deployment on ultra-constrained microcontrollers, however, will require further model compression, for example via knowledge distillation and quantisation.
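A quick back-of-the-envelope check connects the parameter count to the memory figure. Assuming 32-bit floating-point weights (the numeric precision is not stated here), 2.7M parameters need about 2.7M × 4 B ≈ 10.8 MB, which is consistent with the reported bound of under 11 MB. The sketch below, with a hypothetical helper name, also shows what half precision and 8-bit quantisation would imply for the weights alone.

```python
# Bytes per stored parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_footprint_mb(n_params, precision="fp32"):
    """Storage for the weights alone, in MB (10**6 bytes); excludes
    activations, buffers, and runtime overhead."""
    return n_params * BYTES_PER_PARAM[precision] / 1e6


for precision in BYTES_PER_PARAM:
    print(precision, round(weight_footprint_mb(2_700_000, precision), 2))
# prints: fp32 10.8, then fp16 5.4, then int8 2.7 (one per line)
```

These are weight-only lower bounds. Actual on-device memory also depends on activation buffers and the inference runtime, which is one reason microcontroller targets need compression beyond quantisation alone.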

Conclusion

This work exposes and quantifies the generalisation problem in IoT botnet detection via the BRIDGE benchmark and establishes TCH-Net as a strong new multi-modal baseline. The persistent domain shift evidenced by the LODO F1 gap points to a compelling direction for future algorithmic work on robust adaptation under severe feature and distribution mismatch. The canonical vocabulary, codebase, and evaluation protocol provided by BRIDGE enable the research community to measure progress on these foundational generalisation challenges and to move beyond overfitting to specific, isolated datasets.
