MA-IDS: Multi-Agent RAG Framework for IoT Network Intrusion Detection with an Experience Library

Published 7 Apr 2026 in cs.CR and cs.AI | (2604.05458v1)

Abstract: Network Intrusion Detection Systems (NIDS) face important limitations. Signature-based methods are effective for known attack patterns, but they struggle to detect zero-day attacks and often miss modified variants of previously known attacks, while many machine learning approaches offer limited interpretability. These challenges become even more severe in IoT environments because of resource constraints and heterogeneous protocols. To address these issues, we propose MA-IDS, a Multi-Agent Intrusion Detection System that combines LLMs with Retrieval Augmented Generation (RAG) for reasoning-driven intrusion detection. The proposed framework grounds LLM reasoning through a persistent, self-building Experience Library. Two specialized agents collaborate through a FAISS-based vector database: a Traffic Classification Agent that retrieves past error rules before each inference, and an Error Analysis Agent that converts misclassifications into human-readable detection rules stored for future retrieval, enabling continual learning through external knowledge accumulation, without modifying the underlying LLM. Evaluated on NF-BoT-IoT and NF-ToN-IoT benchmark datasets, MA-IDS achieves Macro F1-Scores of 89.75% and 85.22%, improving over zero-shot baselines of 17% and 4.96% by more than 72 and 80 percentage points. These results are competitive with SVM while providing rule-level explanations for every classification decision, demonstrating that retrieval-augmented reasoning offers a principled path toward explainable, self-improving intrusion detection for IoT networks.


Authors (3)

Summary

The paper introduces a dual-agent, RAG-based system for IoT intrusion detection, achieving macro F1-scores of up to 89.75% on benchmark datasets.
It employs a persistent Experience Library to convert error analyses into interpretable rules, addressing catastrophic forgetting and enabling continual learning.
The approach outperforms zero-shot LLM baselines, providing real-time adaptation and explainability crucial for resource-constrained IoT environments.

A Multi-Agent RAG System for IoT Intrusion Detection with Continual, Interpretable Learning

Introduction

The proliferation of IoT devices has fundamentally altered the attack landscape in networked systems, increasing both the diversity of device profiles and the complexity of network traffic. Traditional signature-based and anomaly-based NIDS frameworks remain limited in their capacity to detect zero-day threats and to justify their classifications interpretably, especially within resource-constrained and protocol-diverse IoT environments. Classical ML/DL methods achieve high predictive accuracy but remain hard to interpret and must be retrained when novel attack vectors arise. Deploying pre-trained LLMs directly exposes a domain gap: zero-shot LLMs (e.g., GPT-4o) deliver poor macro F1-scores (17% on NF-BoT-IoT and 4.96% on NF-ToN-IoT), underscoring their instability and unsuitability without domain-grounded augmentation.

MA-IDS Architecture and Core Innovations

MA-IDS proposes a multi-agent, reasoning-augmented pipeline that circumvents the “black box” limitations and retraining inefficiencies of classical approaches. The system is composed of two agentic LLM components, orchestrated over a persistent FAISS-backed Experience Library, with Retrieval-Augmented Generation (RAG) as its central grounding mechanism. The architecture separates online inference from error analysis, yielding continual, explainable learning without LLM parameter updates.

The workflow bifurcates into two distinct operational phases. During Phase 1 (offline), an Error Analysis Agent converts classification errors into structured, human-interpretable rules, which are committed to the Experience Library. Phase 2 (online) sees the Traffic Classification Agent leveraging RAG to augment LLM input with stored experience, enabling context-grounded, adaptive intrusion classification.
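The two phases described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the bag-of-words `embed` function and the in-memory `ExperienceLibrary` class are stand-ins for the real embedding model and the FAISS index, and `explain` is a hypothetical callable standing in for the Error Analysis Agent's LLM call. All names here are illustrative.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (assumption: stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class ExperienceLibrary:
    """In-memory stand-in for the FAISS-backed rule store (hypothetical class)."""
    entries: list = field(default_factory=list)  # (flow embedding, rule text) pairs

    def add(self, flow_text: str, rule: str) -> None:
        self.entries.append((embed(flow_text), rule))

    def retrieve(self, flow_text: str, k: int = 3) -> list:
        """Return the rules attached to the k most similar stored flows."""
        query = embed(flow_text)
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[0]), reverse=True)
        return [rule for _, rule in ranked[:k]]


def record_error(library, flow_text, predicted, actual, explain):
    """Phase 1 (offline): turn a misclassification into a stored rule."""
    if predicted != actual:
        library.add(flow_text, explain(flow_text, predicted, actual))


def build_prompt(flow_text, rules):
    """Phase 2 (online): augment the classifier input with retrieved rules."""
    rule_block = "\n".join(f"- {r}" for r in rules) or "- (no stored rules)"
    return f"Rules from past errors:\n{rule_block}\n\nClassify this flow:\n{flow_text}"


library = ExperienceLibrary()
record_error(
    library,
    "proto=tcp dst_port=80 pkts=9000 duration_ms=40",
    predicted="DoS",
    actual="DDoS",
    explain=lambda flow, pred, act: f"Flows like '{flow}' were labeled {pred} but are {act}.",
)
new_flow = "proto=tcp dst_port=80 pkts=8700 duration_ms=35"
prompt = build_prompt(new_flow, library.retrieve(new_flow))
```

The point of the sketch is the separation of concerns: the classifier itself is never modified, and everything learned from errors lives in the library that the prompt builder queries.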

Figure 1: The dual-phase MA-IDS workflow: error-driven rule induction and retrieval-augmented real-time inference.

Fundamentally, this approach externalizes learning and correction into a scalable, queryable memory, which addresses catastrophic forgetting and lets the system evolve without overwriting what the model already knows. All reasoning is carried out on semantically enriched NetFlow data that is converted into embeddings, with each flow–rule pair serving as a high-precision anchor for similar future events.
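One plausible reading of "semantically enriched NetFlow data" is that raw numeric fields are rendered as descriptive text before embedding. The sketch below illustrates that idea only; the summary does not specify the actual enrichment scheme. The column names follow the NetFlow v9-style fields typically seen in NF-* datasets and should be treated as assumptions, as should the `describe_flow` helper itself.

```python
# IANA-assigned protocol numbers for the most common transport protocols.
PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}


def describe_flow(flow: dict) -> str:
    """Render a NetFlow record as a sentence suitable for text embedding.

    Hypothetical helper: the field names are assumed, not taken from the paper.
    """
    proto = PROTOCOL_NAMES.get(flow["PROTOCOL"], f"protocol {flow['PROTOCOL']}")
    duration_s = flow["FLOW_DURATION_MILLISECONDS"] / 1000
    return (
        f"{proto} flow to destination port {flow['L4_DST_PORT']}: "
        f"{flow['IN_PKTS']} packets / {flow['IN_BYTES']} bytes in, "
        f"{flow['OUT_PKTS']} packets / {flow['OUT_BYTES']} bytes out, "
        f"lasting {duration_s:.3f} s"
    )


example_flow = {
    "PROTOCOL": 6,
    "L4_DST_PORT": 80,
    "IN_PKTS": 9000,
    "IN_BYTES": 540000,
    "OUT_PKTS": 12,
    "OUT_BYTES": 720,
    "FLOW_DURATION_MILLISECONDS": 40,
}
flow_text = describe_flow(example_flow)
```

Text of this kind can be embedded and stored alongside the rule it triggered, giving the flow–rule pair that later retrieval matches against.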

Experimental Validation and Results

The efficacy of MA-IDS is demonstrated via rigorous evaluation on two canonical NetFlow-based IoT datasets: NF-BoT-IoT and NF-ToN-IoT. MA-IDS is assessed against naive zero-shot LLMs and legacy ML baselines (SVM, AdaBoost, Naïve Bayes), employing macro-averaged Precision, Recall, and F1-Score for class-agnostic robustness.
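For readers unfamiliar with macro averaging, the sketch below computes macro Precision, Recall, and F1 from label lists. It assumes the common convention in which macro F1 is the unweighted mean of per-class F1 scores; the summary does not state which variant the paper uses. The `macro_scores` function and the toy labels are illustrative only.

```python
def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1) over all observed classes.

    Each class contributes equally, regardless of how many samples it has.
    """
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy labels, not data from the paper.
y_true = ["Benign", "Benign", "DDoS", "DoS"]
y_pred = ["Benign", "DDoS", "DDoS", "DoS"]
macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
```

Because every class carries equal weight, a model that ignores rare attack classes is penalized heavily, which is why macro averaging suits imbalanced intrusion detection datasets.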

MA-IDS achieves a macro F1-Score of 89.75% on NF-BoT-IoT and 85.22% on NF-ToN-IoT, gains of more than 72 and 80 percentage points, respectively, over the zero-shot GPT-4o baseline. These results surpass the classical ML alternatives (except SVM on NF-ToN-IoT) and, crucially, come with a semantic explanation for every decision, an advantage absent in SVM and DL systems.

Figure 2: Macro-averaged performance metrics, revealing the dominance of MA-IDS over the zero-shot baseline due to Experience Library retrieval.

The impact of the Experience Library is further supported by ablation, where disabling rule retrieval causes a sharp performance collapse, matching the zero-shot LLM configuration. Conversely, enabling experience retrieval with a fixed rule base yields near-peak performance, demonstrating the sufficiency and necessity of this design.

Performance profiling as the Experience Library grows supports the continual learning hypothesis: error rates decrease as rules accumulate and classification boundaries are refined. For classes with high overlap (e.g., DDoS/DoS in NF-BoT-IoT, Injection/XSS in NF-ToN-IoT), the Error Analysis Agent adapts by generating more rules, as shown by higher per-class rule counts and improved per-class F1.

Figure 3: Benchmark dataset (NF-BoT-IoT) utilized for systematic evaluation.

Practical and Theoretical Implications

MA-IDS substantiates that external, rule-driven memory architectures combined with RAG-augmented LLMs yield a multi-objective performance balance, blending high detection efficacy and forensic traceability. This closed-loop agentic method marks a theoretical transition from static, weight-encoded models to lifelong, data–reasoning–retrieval systems that are robust to concept drift, easily extensible, and naturally support post-hoc rationalization and root-cause analysis.

From a practical systems standpoint, MA-IDS is conducive to low-overhead deployment, requires minimal retraining interventions, and remains compatible with privacy-preserving and encrypted traffic because it avoids payload-based rules. These properties are especially pertinent to heterogeneous, resource-constrained IoT deployments where update cycles and real-time constraints are significant.

Speculation on Future Directions

The approach taken by MA-IDS suggests several immediate extensions in research and deployment. First, the architecture is compatible with edge-deployable LLM variants and can benefit from further embedding model optimization. Second, augmenting the Experience Library with richer multi-modal telemetry (e.g., encrypted traffic side-channels, topological context) could broaden its capacity for zero-day threat adaptation. Third, adopting open-set detection objectives and unsupervised rule clustering could improve resilience to adversarial concept drift. Finally, integrating model checking and continuous verification may help satisfy explainability requirements in increasingly regulated operational environments.

Conclusion

MA-IDS demonstrates that a hybrid agentic approach leveraging RAG with an externalized, continually expanding Experience Library offers not only competitive intrusion detection performance but also algorithmic transparency and continual adaptation. These findings support an agent-memory paradigm as a foundation for explainable, self-improving NIDS, directly applicable to rapidly growing IoT deployments and the escalating network security demands that accompany them.
