- The paper introduces a hybrid approach using pre-trained TabPFNv2.5 for rapid triage, achieving near-ensemble classification accuracy (97%+) with 40x faster inference.
- The methodology integrates transformer-based tabular models with ensemble methods to balance speed and high-fidelity threat detection across diverse IoT devices.
- Empirical results highlight an accuracy-latency trade-off and expose persistent challenges in detecting low-prevalence threats and ensuring cross-device generalization.
Optimizing IoT Intrusion Detection with Tabular Foundation Models for Smart City Forensics
Introduction
This work presents a comprehensive empirical comparison between a transformer-based tabular foundation model (TabPFNv2.5) and traditional ensemble methods (Random Forest and Gradient Boosting) for IoT intrusion detection in the context of smart city forensics. The motivation comes from the demands of urban security operations, which must process massive volumes of telemetry quickly while still discriminating threats accurately. The authors address two previously unexamined trade-offs: the high inference cost of established ensemble methods, and the untested potential of foundation models, pre-trained once and used without task-specific retraining, as rapid screening mechanisms for heterogeneous, large-scale IoT telemetry.
Methodology
The evaluation uses TON_IoT, a large-scale telemetry dataset covering seven distinct IoT device categories (over 3.6M records in total). The authors document significant class imbalance: backdoor and password attacks account for the majority of attack-labeled records, while rare, underrepresented threats (notably scanning) are difficult to detect.
A hybrid pipeline is proposed in which TabPFNv2.5, run with frozen pre-trained weights, performs rapid (millisecond-level) attack triage: records it judges benign are routed directly to audit logs, while suspicious records are escalated for fine-grained analysis by Random Forest and Gradient Boosting ensembles. This design aligns with SIEM operational workflows, supporting both the throughput requirements and the attribution rigor demanded by modern forensic investigations.
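The two-stage routing described above can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' implementation: `hybrid_triage`, `TriageResult`, the 0.5 threshold, and the toy stand-in models are all hypothetical. In practice `screen_proba` might wrap the `predict_proba` method of a scikit-learn-style classifier (the `tabpfn` package provides a scikit-learn-compatible `TabPFNClassifier`), and `confirm` might wrap a fitted Random Forest's `predict`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

Record = Sequence[float]


@dataclass
class TriageResult:
    audit_log: List[int] = field(default_factory=list)      # indices the screener judged benign
    escalated: List[int] = field(default_factory=list)      # indices forwarded to the ensemble stage
    verdicts: Dict[int, str] = field(default_factory=dict)  # ensemble label per escalated index


def hybrid_triage(
    records: Sequence[Record],
    screen_proba: Callable[[Sequence[Record]], List[float]],
    confirm: Callable[[Sequence[Record]], List[str]],
    threshold: float = 0.5,  # hypothetical cut-off on the screener's attack probability
) -> TriageResult:
    """Stage 1 scores every record; only suspicious records reach stage 2."""
    result = TriageResult()
    scores = screen_proba(records)  # one batched call to the fast screener
    for i, p_attack in enumerate(scores):
        if p_attack < threshold:
            result.audit_log.append(i)
        else:
            result.escalated.append(i)
    if result.escalated:
        # One batched call to the slower, higher-fidelity ensemble on the suspicious subset only.
        labels = confirm([records[i] for i in result.escalated])
        result.verdicts = dict(zip(result.escalated, labels))
    return result


# Toy stand-ins (hypothetical): feature 0 acts as the attack probability,
# feature 1 decides which attack type the "ensemble" reports.
def toy_screen(batch):
    return [r[0] for r in batch]


def toy_confirm(batch):
    return ["scanning" if r[1] > 0.5 else "ddos" for r in batch]


records = [(0.1, 0.0), (0.9, 0.8), (0.3, 0.9), (0.7, 0.2)]
out = hybrid_triage(records, toy_screen, toy_confirm)
print(out.audit_log, out.escalated, out.verdicts)  # → [0, 2] [1, 3] {1: 'scanning', 3: 'ddos'}
```

The design choice worth noting is that the expensive model only ever sees the escalated subset, so end-to-end cost depends on how much traffic the screener clears as benign.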
The experimental configuration uses 5-fold cross-validation, reports standard classification metrics (accuracy, precision, recall, F1-score), and explicitly measures training and inference latency.
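A protocol of this kind can be sketched as below: stratified 5-fold splitting plus wall-clock timing of each fold's fit and predict calls. This is an illustrative, dependency-free sketch under assumptions; scikit-learn's `StratifiedKFold` would be the usual choice in practice, and `MajorityClass` is a hypothetical placeholder for any model exposing `fit`/`predict`. The paper's exact timing methodology is not described here.

```python
import random
import time
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def stratified_kfold(labels: Sequence[str], k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_idx, test_idx) pairs; each class is dealt round-robin across the k folds."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # every fold gets a near-equal share of this class
    splits = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        splits.append((train, sorted(test)))
    return splits


def timed(fn: Callable, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - start


def cross_validate(X, y, make_model, k: int = 5):
    """Per-fold accuracy plus training and inference latency."""
    rows = []
    for train, test in stratified_kfold(y, k):
        model = make_model()
        _, fit_s = timed(model.fit, [X[i] for i in train], [y[i] for i in train])
        preds, pred_s = timed(model.predict, [X[i] for i in test])
        acc = sum(p == y[i] for p, i in zip(preds, test)) / len(test)
        rows.append({"accuracy": acc, "fit_s": fit_s, "predict_s": pred_s})
    return rows


class MajorityClass:
    """Hypothetical baseline: predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label] * len(X)


X = [[float(i)] for i in range(10)]
y = ["normal"] * 8 + ["ddos"] * 2
for row in cross_validate(X, y, MajorityClass):
    print(row)
```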
Empirical Results
Classification Accuracy
Across all seven device types, Random Forest achieves the highest mean binary accuracy (99.48%) and the highest mean multi-class accuracy (99.44%). Gradient Boosting performs comparably, if slightly lower (mean binary accuracy: 98.85%). TabPFNv2.5, evaluated on stratified subsamples so that the large datasets fit within the foundation model's input limits, reaches 97.2% binary and 96.8% multi-class accuracy. TabPFNv2.5 therefore trails Random Forest by a modest 2.3 percentage points in binary accuracy (99.48% − 97.2% = 2.28 points).
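Stratified subsampling of the kind mentioned above can be sketched as follows: draw roughly `budget` rows while keeping each class's share of the data. This is an assumed, illustrative procedure, not necessarily the authors'; the function name `stratified_subsample` and the `budget` value are hypothetical, and the actual context-size limit for TabPFNv2.5 and the subsample size used in the paper are not stated here.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def stratified_subsample(labels: Sequence[str], budget: int, seed: int = 0) -> List[int]:
    """Pick about `budget` indices while preserving class proportions.

    Every class present keeps at least one row, so rare attacks (e.g. scanning)
    are not dropped entirely from the foundation model's context.
    Per-class rounding means the total can differ slightly from `budget`.
    """
    by_class: Dict[str, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    n = len(labels)
    chosen: List[int] = []
    for idx in by_class.values():
        quota = max(1, round(budget * len(idx) / n))
        chosen.extend(rng.sample(idx, min(quota, len(idx))))
    return sorted(chosen)


# Hypothetical label distribution, not TON_IoT's actual class counts.
labels = ["normal"] * 9000 + ["backdoor"] * 900 + ["scanning"] * 100
idx = stratified_subsample(labels, budget=1000)
print(Counter(labels[i] for i in idx))  # → Counter({'normal': 900, 'backdoor': 90, 'scanning': 10})
```

The minimum of one row per class is a deliberate choice for this setting: pure proportional sampling can silently erase exactly the low-prevalence classes the study identifies as hardest.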
Computational Efficiency
The principal advantage of TabPFNv2.5 is a roughly 40-fold reduction in inference latency relative to Random Forest (0.02 s vs. 0.82 s per 10,000 samples), with no task-specific training cost. This makes the model suitable for real-time triage in SIEM-driven, latency-constrained environments.
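The reported figures can be checked directly. The per-record values below are simple divisions of the stated per-10,000-sample latencies and assume latency scales linearly with batch size, which is an assumption since batching effects are not reported:

```latex
\text{speedup} = \frac{t_{\mathrm{RF}}}{t_{\mathrm{TabPFN}}}
             = \frac{0.82\,\mathrm{s}}{0.02\,\mathrm{s}} = 41 \approx 40\times,
\qquad
\frac{0.02\,\mathrm{s}}{10^{4}} = 2\,\mu\mathrm{s}\ \text{per record},
\qquad
\frac{0.82\,\mathrm{s}}{10^{4}} = 82\,\mu\mathrm{s}\ \text{per record}.
```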
Failure Analysis
Per-attack analysis identifies scanning as the hardest class to detect (Random Forest F1: 69.8%), consistent with its behavioral overlap with benign reconnaissance. Other major attack types (DDoS, ransomware, injection) yield F1-scores above 95%. These findings underscore the need for context-aware post-processing and the persistent difficulty of detecting minority-class, low-prevalence threats.
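Per-attack F1 of this kind is computed one-vs-rest: each class's precision and recall are taken against all other classes, and F1 is their harmonic mean. The sketch below is a dependency-free illustration (scikit-learn's `classification_report` reports the same quantities); the labels are toy values, not the paper's data. It shows how a minority class whose records are partly absorbed into the benign class ends up with a low F1.

```python
from typing import Dict, Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """One-vs-rest precision, recall, and F1 for every class seen in y_true or y_pred."""
    report = {}
    pairs = list(zip(y_true, y_pred))
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)  # class c correctly predicted
        fp = sum(t != c and p == c for t, p in pairs)  # other classes mislabeled as c
        fn = sum(t == c and p != c for t, p in pairs)  # class c missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Toy example: half of the scanning records are predicted as normal,
# and one normal record is flagged as scanning.
y_true = ["normal"] * 6 + ["scanning"] * 4
y_pred = ["normal"] * 5 + ["scanning"] * 3 + ["normal"] * 2
report = per_class_f1(y_true, y_pred)
print(round(report["scanning"]["f1"], 3))  # → 0.571
```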
Generalization Assessment
Cross-device generalization experiments show that transferability depends on feature-space similarity. Models transferred between feature-similar device pairs (e.g., Fridge → Thermostat) generalize robustly (<2% accuracy loss), whereas dissimilar pairs (e.g., Garage Door → Fridge) suffer a pronounced collapse in recall, so production deployments require device-aware modeling or more advanced feature normalization.
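One simple form of device-aware normalization is to standardize each device's features with statistics computed on that device's own telemetry, so that offset and scale differences (for example, fridge versus thermostat temperature ranges) do not look like distribution shift. The sketch below is hypothetical and not necessarily what the authors propose: `DeviceAwareNormalizer` is an illustrative name, it assumes the device type is known at inference time, and it cannot reconcile devices whose feature sets differ entirely.

```python
import statistics
from typing import Dict, List, Sequence, Tuple

Stats = List[Tuple[float, float]]  # (mean, std) per feature column


def fit_device_stats(rows: Sequence[Sequence[float]]) -> Stats:
    """Per-column mean and population std computed on one device's own telemetry."""
    cols = list(zip(*rows))
    # A constant column has std 0; fall back to 1.0 to avoid division by zero.
    return [(statistics.fmean(c), statistics.pstdev(c) or 1.0) for c in cols]


def normalize(rows: Sequence[Sequence[float]], stats: Stats) -> List[List[float]]:
    """Z-score each value with its column's (mean, std)."""
    return [[(x - m) / s for x, (m, s) in zip(r, stats)] for r in rows]


class DeviceAwareNormalizer:
    """Hypothetical helper: standardizes each device's features with that device's statistics."""

    def __init__(self) -> None:
        self.stats: Dict[str, Stats] = {}

    def fit(self, device: str, rows: Sequence[Sequence[float]]) -> "DeviceAwareNormalizer":
        self.stats[device] = fit_device_stats(rows)
        return self

    def transform(self, device: str, rows: Sequence[Sequence[float]]) -> List[List[float]]:
        return normalize(rows, self.stats[device])


# Toy single-feature readings on very different scales.
norm = DeviceAwareNormalizer()
norm.fit("fridge", [[4.0], [5.0], [6.0]]).fit("thermostat", [[21.0], [22.0], [23.0]])
print(norm.transform("fridge", [[4.0], [5.0], [6.0]]))
print(norm.transform("thermostat", [[21.0], [22.0], [23.0]]))  # same centered values as the fridge
```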
Discussion
Implications for Smart City Security Operations
The results support TabPFNv2.5 as a viable candidate for the rapid triage phase of forensic workflows. The hybrid approach gives SOCs high-throughput, low-latency prioritization while retaining the high-confidence attribution of ensemble models for actionable alerts and reporting. This yields substantial operational efficiency: at the reported latencies, screening 10,000 records takes about 20 ms instead of about 820 ms, which matters for incident response in dense urban infrastructures.
Theoretical Impact and AI Developments
This study makes an empirical case for tabular foundation models in cybersecurity, a domain traditionally dominated by tree ensembles, and quantifies their practical speed-accuracy trade-off on realistic telemetry distributions. The hybrid architecture suggests a way to operationalize foundation models as pre-screening modules in event-driven systems. Future research should focus on pre-training tabular transformers on cybersecurity-native telemetry distributions, examining zero-day and generalization behavior, and integrating contextual and temporal signals to address the persistent limitations in low-prevalence threat detection.
Limitations
Generalizability is constrained by how representative TON_IoT is and by the need to subsample larger datasets to accommodate TabPFNv2.5. True zero-day attacks and advanced persistent threats are not evaluated. Moreover, real-world deployments may exhibit different compute and latency characteristics than the cloud-based testbed used in the study.
Recommendations for Deployment
For SOCs and practitioners, a hybrid workflow is warranted: utilize TabPFNv2.5 for initial, real-time screening; escalate flagged records to ensemble-based classifiers (with analyst override for conflicting verdicts); employ device- or context-sensitive preprocessing for environments with highly heterogeneous telemetry; and supplement scanning detection with auxiliary behavioral or temporal indicators.
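As one example of an auxiliary temporal indicator for scanning, a classic port-sweep heuristic counts how many distinct destination ports each source touches within a sliding time window. This is a hypothetical sketch, not the paper's method: it assumes network flow records with timestamp, source address, and destination port fields (the per-device TON_IoT telemetry records may not carry these), and the `window_s` and `max_ports` thresholds are illustrative placeholders that would need tuning.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Sequence, Tuple

Flow = Tuple[float, str, int]  # (timestamp_s, src_ip, dst_port): hypothetical schema


def flag_port_sweeps(flows: Sequence[Flow], window_s: float = 60.0, max_ports: int = 20) -> List[int]:
    """Return indices of flows at which a source has touched more than `max_ports`
    distinct destination ports within the trailing `window_s` seconds.

    Assumes `flows` is sorted by timestamp. The distinct-port set is rebuilt per
    flow, which is O(window size); fine for a sketch, not for line-rate use.
    """
    recent: Dict[str, Deque[Tuple[float, int]]] = defaultdict(deque)
    flagged: List[int] = []
    for i, (ts, src, port) in enumerate(flows):
        q = recent[src]
        q.append((ts, port))
        while q and ts - q[0][0] > window_s:  # drop events older than the window
            q.popleft()
        if len({p for _, p in q}) > max_ports:
            flagged.append(i)
    return flagged


# Toy traffic: one source sweeps 30 ports at 1 s intervals; another repeatedly hits port 443.
flows = sorted(
    [(float(t), "10.0.0.5", 1000 + t) for t in range(30)]
    + [(t + 0.5, "10.0.0.9", 443) for t in range(30)]
)
hits = flag_port_sweeps(flows)
print(len(hits), {flows[i][1] for i in hits})  # → 10 {'10.0.0.5'}
```

A signal like this could be fed to the ensemble stage as an extra feature or used to re-rank records the screener scored as borderline; either use is an assumption beyond what the study evaluates.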
Conclusion
The paper establishes that tabular foundation models such as TabPFNv2.5 are a compelling choice for rapid front-end triage in IoT forensic workflows, achieving high classification accuracy (97%+) with roughly 40-fold faster inference than standard ensemble methods. Integrating foundation and ensemble models in a structured pipeline enables high-throughput, high-fidelity attack detection suited to the operational realities of smart cities. The major open issues remain robust detection of low-prevalence attacks and reliable cross-device generalization, which motivate further research into cybersecurity-native foundation models and adaptive, context-aware pipelines.