- The paper introduces a dual-agent, RAG-based system for IoT intrusion detection, achieving macro F1-scores of up to 89.75% on benchmark datasets.
- It employs a persistent Experience Library to convert error analyses into interpretable rules, addressing catastrophic forgetting and enabling continual learning.
- The approach outperforms zero-shot LLM baselines, providing real-time adaptation and explainability crucial for resource-constrained IoT environments.
A Multi-Agent RAG System for IoT Intrusion Detection with Continual, Interpretable Learning
Introduction
The proliferation of IoT devices has fundamentally altered the attack landscape in networked systems, increasing both the diversity of device profiles and the complexity of network traffic. Traditional signature-based and anomaly-based NIDS frameworks remain limited in their capacity to detect zero-day threats and to justify their classifications interpretably, especially within resource-constrained and protocol-diverse IoT environments. Classical ML/DL methods achieve high predictive accuracy but remain opaque, and they must be retrained whenever novel attack vectors arise. Deploying pre-trained LLMs directly exposes a domain gap: zero-shot LLMs (e.g., GPT-4o) deliver poor macro F1-scores (between 4.96% and roughly 17%) on the intrusion detection benchmarks considered, underscoring their instability and unsuitability without domain-grounded augmentation.
MA-IDS Architecture and Core Innovations
MA-IDS proposes a multi-agent, reasoning-augmented pipeline that circumvents the “black box” limitations and retraining inefficiencies of classical approaches. The system is composed of two agentic LLM components, orchestrated over a persistent FAISS-backed Experience Library and employing Retrieval-Augmented Generation (RAG) as its central inductive bias. The architecture separates online inference from error analysis, yielding robust continual, explainable learning without LLM parameter updates.
The workflow bifurcates into two distinct operational phases. During Phase 1 (offline), an Error Analysis Agent converts classification errors into structured, human-interpretable rules, which are committed to the Experience Library. Phase 2 (online) sees the Traffic Classification Agent leveraging RAG to augment LLM input with stored experience, enabling context-grounded, adaptive intrusion classification.
Figure 1: The dual-phase MA-IDS workflow: error-driven rule induction and retrieval-augmented real-time inference.
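The two phases can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not the authors' implementation: both agents are replaced by deterministic stubs instead of LLM calls, the Experience Library is a plain list without similarity search, and the names `Rule`, `RuleMemory`, `stub_classifier`, `stub_error_analyst`, and the `pkts_per_s` feature are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Flow = Dict[str, float]  # hypothetical flow record: feature name -> value


@dataclass
class Rule:
    label: str        # class the rule argues for
    feature: str      # flow feature the rule inspects
    threshold: float  # the rule fires when flow[feature] >= threshold

    def text(self) -> str:
        """Human-readable form of the rule."""
        return f"If {self.feature} >= {self.threshold:g}, classify as {self.label}."


@dataclass
class RuleMemory:
    """Plain-list stand-in for the Experience Library (no similarity search)."""
    rules: List[Rule] = field(default_factory=list)

    def retrieve(self, flow: Flow) -> List[Rule]:
        return list(self.rules)


def stub_classifier(flow: Flow, rules: List[Rule]) -> str:
    """Stand-in for the Traffic Classification Agent: obey the first matching rule."""
    for rule in rules:
        if flow[rule.feature] >= rule.threshold:
            return rule.label
    return "Benign"  # default guess when no retrieved rule applies


def stub_error_analyst(flow: Flow, predicted: str, actual: str) -> Rule:
    """Stand-in for the Error Analysis Agent: derive one rule from one mistake."""
    return Rule(label=actual, feature="pkts_per_s", threshold=flow["pkts_per_s"])


def phase1_offline(labeled: List[Tuple[Flow, str]], memory: RuleMemory) -> int:
    """Classify labeled flows, store a rule for every error, return the error count."""
    errors = 0
    for flow, actual in labeled:
        predicted = stub_classifier(flow, memory.retrieve(flow))
        if predicted != actual:
            errors += 1
            memory.rules.append(stub_error_analyst(flow, predicted, actual))
    return errors


def phase2_online(flow: Flow, memory: RuleMemory) -> str:
    """Classify a new flow using only the stored rules; nothing is retrained."""
    return stub_classifier(flow, memory.retrieve(flow))


labeled = [({"pkts_per_s": 5000.0}, "DDoS"),
           ({"pkts_per_s": 10.0}, "Benign"),
           ({"pkts_per_s": 6000.0}, "DDoS")]
memory = RuleMemory()
first_pass_errors = phase1_offline(labeled, memory)   # one mistake, one rule stored
second_pass_errors = phase1_offline(labeled, memory)  # the stored rule prevents it
verdict = phase2_online({"pkts_per_s": 7000.0}, memory)
```

In this toy run the first offline pass makes one error and stores one rule, the second pass makes none, and the unseen flow is classified using the stored rule. The classifier itself never changes; only the memory grows.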
Fundamentally, this approach externalizes knowledge learning and correction into a scalable, queryable memory, directly addressing catastrophic forgetting and enabling non-destructive model evolution. All reasoning is carried out on semantically enriched NetFlow data, processed into robust embeddings, with each flow–rule pair serving as a high-precision anchor for future similar events.
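A minimal sketch of such a queryable memory follows, assuming each flow is first rendered as a short text description. The hashed bag-of-words `embed` function is a toy stand-in for a real embedding model, and the brute-force cosine search stands in for the FAISS index; the class layout, the `DIM` value, and the example rules are all hypothetical.

```python
import math
import zlib
from typing import List, Tuple

DIM = 256  # hypothetical embedding size for the toy embedding


def embed(text: str) -> List[float]:
    """Toy hashed bag-of-words embedding, L2-normalised (not a real model)."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class ExperienceLibrary:
    """Stores (flow embedding, rule) pairs and retrieves rules by similarity."""

    def __init__(self) -> None:
        self._entries: List[Tuple[List[float], str]] = []

    def add(self, flow_text: str, rule: str) -> None:
        # The flow description is the search key; the rule is the payload.
        self._entries.append((embed(flow_text), rule))

    def search(self, flow_text: str, k: int = 2) -> List[Tuple[float, str]]:
        """Return up to k (cosine similarity, rule) pairs, best match first."""
        query = embed(flow_text)
        scored = [
            (sum(a * b for a, b in zip(query, key)), rule)
            for key, rule in self._entries
        ]
        return sorted(scored, key=lambda item: item[0], reverse=True)[:k]


library = ExperienceLibrary()
library.add(
    "udp dst_port 53 high packet rate short duration",
    "Rule A (hypothetical): bursts of short UDP flows to port 53 suggest DDoS.",
)
library.add(
    "tcp dst_port 80 low packet rate long duration",
    "Rule B (hypothetical): slow, long-lived TCP flows to port 80 suggest DoS.",
)
hits = library.search("udp dst_port 53 high packet rate short duration", k=1)
```

Because vectors are normalised, the dot product equals cosine similarity, so a query identical to a stored flow description scores 1.0 and returns the rule attached to that flow. New rules are appended without touching existing entries, which is what makes the memory non-destructive.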
Experimental Validation and Results
The efficacy of MA-IDS is evaluated on two NetFlow-based IoT benchmark datasets: NF-BoT-IoT and NF-ToN-IoT. MA-IDS is compared against naive zero-shot LLMs and legacy ML baselines (SVM, AdaBoost, Naïve Bayes) using macro-averaged Precision, Recall, and F1-Score, which weight every class equally regardless of how often it occurs.
MA-IDS achieves a macro F1-Score of 89.75% on NF-BoT-IoT and 85.22% on NF-ToN-IoT, gains of roughly 72 and 80 percentage points over the zero-shot GPT-4o baseline, respectively. These results surpass the classical ML baselines (except SVM on NF-ToN-IoT) and, crucially, come with a semantic explanation for each decision, an advantage absent in SVM and DL systems.
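To make the metric concrete, the sketch below computes macro-averaged Precision, Recall, and F1 as the unweighted mean of per-class scores. This is the common convention; the paper's exact averaging procedure is an assumption here, and the function name `macro_prf` and the toy labels are illustrative only.

```python
from typing import List, Tuple


def macro_prf(y_true: List[str], y_pred: List[str]) -> Tuple[float, float, float]:
    """Return (macro precision, macro recall, macro F1) over all observed classes."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Undefined ratios (no predictions or no samples for a class) count as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# A classifier that always answers "Benign" on an imbalanced toy sample.
y_true = ["Benign"] * 8 + ["DDoS"] * 2
y_pred = ["Benign"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_p, macro_r, macro_f1 = macro_prf(y_true, y_pred)
```

On this toy sample accuracy is 0.8, yet macro F1 is only 4/9 (about 0.44) because the missed DDoS class contributes an F1 of zero with the same weight as the majority class. This is why macro averaging suits imbalanced intrusion datasets.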

Figure 2: Macro-averaged performance metrics, revealing the dominance of MA-IDS over the zero-shot baseline due to Experience Library retrieval.
The impact of the Experience Library is further supported by ablation: disabling rule retrieval causes a sharp performance collapse to the level of the zero-shot LLM configuration. Conversely, enabling experience retrieval with a fixed rule base yields near-peak performance, indicating that retrieval is both necessary and, with a fixed rule base, largely sufficient.
Performance profiling as the Experience Library grows supports the continual learning hypothesis, with error rates decreasing as rules accumulate and classification boundaries are refined. On classes with high overlap (e.g., DDoS/DoS in NF-BoT-IoT, Injection/XSS in NF-ToN-IoT), the Error Analysis Agent adapts by inducing more rules, as shown by higher per-class rule counts and improved per-class F1.

Figure 3: Benchmark dataset (NF-BoT-IoT) utilized for systematic evaluation.
Practical and Theoretical Implications
MA-IDS shows that an external, rule-driven memory combined with RAG-augmented LLMs can deliver both high detection efficacy and forensic traceability. This closed-loop agentic method marks a theoretical transition from static, weight-encoded models to lifelong data–reasoning–retrieval systems that are robust to concept drift, easily extensible, and naturally supportive of post-hoc rationalization and root-cause analysis.
From a practical systems standpoint, MA-IDS lends itself to low-overhead deployment: it requires minimal retraining and remains compatible with encrypted traffic and privacy constraints because it avoids payload-based rules. These properties are especially pertinent to heterogeneous, resource-constrained IoT deployments where update cycles and real-time constraints are significant.
Speculation on Future Directions
The approach taken by MA-IDS suggests several immediate extensions in research and deployment. First, the architecture is compatible with edge-deployable LLM variants and can benefit from further embedding model optimization. Second, augmenting the Experience Library with richer multi-modal telemetry (e.g., encrypted traffic side-channels, topological context) could improve its capacity to adapt to zero-day threats. Third, adopting open-set detection objectives and unsupervised rule clustering could enhance resilience to adversarial concept drift. Finally, integrating model checking and continuous verification may help meet explainability requirements in increasingly regulated operational environments.
Conclusion
MA-IDS demonstrates that a hybrid agentic approach leveraging RAG with an externalized, continually expanding Experience Library offers not only competitive intrusion detection performance but also algorithmic transparency and continual adaptation. These findings support an agent-memory paradigm as a foundation for interpretable, self-improving NIDS, directly applicable to rapidly growing IoT deployments and the corresponding escalation in network security demands.