Cy-Agent: Modular Multi-Agent Cyber Defense
- Cy-Agent is a multi-agent system that uses generative AI to fuse logs, video, and audio data, ensuring unified orchestration and real-time threat detection.
- It employs a four-layer architecture—perception, analysis, orchestration, and response—with attention-based fusion and reinforcement learning for adaptive remediation.
- Experimental results show a 96.2% F1-score, 420ms latency, and 65% MTTR reduction, demonstrating robust performance in enterprise and IoT cybersecurity.
Cy-Agent refers to the core architectural and algorithmic components underpinning AgenticCyber, a generative AI-powered multi-agent system for multimodal threat detection and adaptive response in cybersecurity. It is a scalable, modular framework designed to overcome the limitations of siloed security tools by providing unified orchestration, reasoning, and remediation across disparate telemetry sources (logs, video, and audio), enabling real-time situational awareness in enterprise and IoT environments (Roy, 6 Dec 2025).
1. Multi-Agent System Architecture
The Cy-Agent architecture is implemented in four conceptual layers, each supporting specialized agents that handle distinct cybersecurity functions:
- Perception Layer: Comprises Log, Vision, and Audio Agents, each ingesting and processing a modality-specific data stream. The Log Agent parses AWS CloudTrail–style JSON logs, extracts ⟨eventTime, eventName, sourceIP⟩ tuples, and invokes Google's Gemini LLM for security risk assessment. The Vision Agent samples every 10th frame of UCF-Crime surveillance videos, applies sharpness filtering, and performs few-shot vision prompting via Gemini's vision API. The Audio Agent classifies UrbanSound8K clips with YAMNet to detect key audible threats, then issues risk assessment queries.
- Analysis Layer: Each agent emits a threat score and an accompanying natural-language explanation. Scoring mechanisms include Isolation Forests (logs), autoencoder reconstruction error (vision), and Gaussian Mixture Model likelihood (audio). All scores are min-max normalized for cross-modal comparability.
- Orchestration Layer: An Orchestrator Agent (Gemini 1.5 Pro via LangChain) fuses the agent outputs using scaled dot-product attention with learnable projection matrices $W_Q$, $W_K$, $W_V$, following the standard form:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,$$
where $Q$, $K$, and $V$ are the projected queries, keys, and values, and $d_k$ is the key dimension.
The orchestration logic is formalized as a partially observable Markov decision process (POMDP), enabling dynamic exploration vs. exploitation as threat hypotheses are evaluated.
- Response Layer: If the fused threat score exceeds the decision threshold, a Responder Agent crafts a Gemini prompt combining the weighted explanations and attention weights to induce a threat hypothesis, then selects a remediation action via a Q-learning policy. Remediation actions include invoking Snort API blocks, AWS IAM suspensions, and OPA policy updates, with all state transitions logged for auditability and rollback.
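The Log Agent's tuple extraction and the Analysis Layer's min-max normalization can be illustrated with a minimal sketch. It is not the authors' implementation: the function names and the sample record are hypothetical, and it assumes the standard CloudTrail layout in which records sit under a top-level `Records` key and the source address is stored as `sourceIPAddress` (mapped here to the sourceIP slot of the tuple).

```python
import json
from typing import List, Tuple

def extract_tuples(cloudtrail_json: str) -> List[Tuple[str, str, str]]:
    """Return (eventTime, eventName, sourceIP) for every record in the payload."""
    records = json.loads(cloudtrail_json).get("Records", [])
    return [
        (r.get("eventTime", ""), r.get("eventName", ""), r.get("sourceIPAddress", ""))
        for r in records
    ]

def min_max_normalize(scores: List[float]) -> List[float]:
    """Rescale raw anomaly scores to [0, 1] so modalities are comparable."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: no spread to normalize, so map everything to 0.
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

# Hypothetical sample payload using documentation-reserved IP addresses.
SAMPLE = json.dumps({
    "Records": [
        {"eventTime": "2025-01-01T00:00:00Z", "eventName": "ConsoleLogin",
         "sourceIPAddress": "198.51.100.7"},
        {"eventTime": "2025-01-01T00:00:05Z", "eventName": "DeleteTrail",
         "sourceIPAddress": "203.0.113.9"},
    ]
})

tuples = extract_tuples(SAMPLE)
normalized = min_max_normalize([2.0, 4.0, 6.0])
```

In a full pipeline the extracted tuples would feed the anomaly scorer and the LLM prompt; the normalization step would be applied to each modality's raw scores before fusion.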
2. Algorithmic Framework and Mathematical Models
Cy-Agent's workflow is governed by distributed perception, attention-based fusion, GenAI reasoning, and adaptive response selection. This is encapsulated in Algorithm 1 and the associated mathematical models:
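- Distributed Perception: Each AnalyzeAgent in the LangChain pipeline invokes an LLM or a local ML model to produce a threat score and explanation per modality.
- Attention-based Fusion: Threat scores are projected through learnable matrices and aggregated using scaled dot-product attention to yield a fused threat score.
- Decision Policy: A response is triggered if the fused threat score exceeds the decision threshold.
- Q-Learning: The Responder Agent maintains an action-value function $Q(s, a)$, updated with the standard Q-learning rule
$$Q(s, a) \leftarrow Q(s, a) + \alpha\left[r + \gamma \max_{a'} Q(s', a') - Q(s, a)\right],$$
optimizing over the state space, with actions drawn from the available remediation actions.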
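The fusion and decision steps above can be sketched in plain Python. This is a toy instance of scaled dot-product attention under stated assumptions, not the paper's model: the per-modality feature layout (normalized score plus a constant bias term), the fixed projection matrices (which would be learned in practice), the query vector, and the threshold values are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]

def matvec(W: Matrix, x: Sequence[float]) -> Vector:
    """Multiply matrix W (rows x cols) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(features: Sequence[Sequence[float]], query: Sequence[float],
         W_q: Matrix, W_k: Matrix, W_v: Matrix) -> Tuple[Vector, Vector]:
    """Scaled dot-product attention of one query over per-modality features.

    Returns (fused value vector, attention weight per modality).
    """
    q = matvec(W_q, query)
    keys = [matvec(W_k, f) for f in features]
    values = [matvec(W_v, f) for f in features]
    d_k = len(q)
    logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
    weights = softmax(logits)
    fused = [sum(w * v[j] for w, v in zip(weights, values))
             for j in range(len(values[0]))]
    return fused, weights

def should_respond(fused_score: float, threshold: float) -> bool:
    """Decision policy: trigger a response when the fused score exceeds the threshold."""
    return fused_score > threshold

# Hypothetical inputs: [normalized threat score, bias] for log, vision, audio.
features = [[0.9, 1.0], [0.2, 1.0], [0.5, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
W_v = [[1.0, 0.0]]            # project each feature down to its scalar score
fused, weights = fuse(features, [1.0, 0.0], identity, identity, W_v)
fused_score = fused[0]
```

With these inputs the highest-scoring modality receives the largest attention weight, so the fused score lands above the plain average of the three scores. The attention weights can also be surfaced alongside each agent's explanation when building the responder prompt.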
3. Evaluation Metrics, Datasets, and Experimental Results
The Cy-Agent system is evaluated using standard classification and operational cybersecurity metrics:
| Metric | AgenticCyber | Baselines |
|---|---|---|
| F1-score | 96.2% | 78–85% |
| Latency (ms) | 420 | 800–1,200 |
| Situational Awareness Score (SAS) | 0.92 | Not reported |
| MTTR Reduction | 65% | — |
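- Precision: $\mathrm{TP} / (\mathrm{TP} + \mathrm{FP})$
- Recall: $\mathrm{TP} / (\mathrm{TP} + \mathrm{FN})$
- F1-score: $2 \cdot \mathrm{Precision} \cdot \mathrm{Recall} / (\mathrm{Precision} + \mathrm{Recall})$
- End-to-end Latency: averages ~420 ms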
- Mean Time To Respond (MTTR): Reduced by 65% compared to static baselines
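For reference, the classification metrics and the MTTR reduction listed above can be computed as in the following sketch. The function names are illustrative, and the counts and durations in the usage lines are made-up numbers chosen only to exercise the formulas; they are not results from the evaluation.

```python
from typing import Tuple

def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def mttr_reduction(baseline_mttr: float, new_mttr: float) -> float:
    """Fractional reduction in mean time to respond relative to a baseline."""
    if baseline_mttr <= 0:
        raise ValueError("baseline_mttr must be positive")
    return (baseline_mttr - new_mttr) / baseline_mttr

# Illustrative values only.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=10)
reduction = mttr_reduction(baseline_mttr=100.0, new_mttr=35.0)
```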
Datasets include 1.9M AWS CloudTrail events (2,000 labeled), UCF-Crime video (1,100 frames), UrbanSound8K audio (300 high-risk clips), and 15,000 synthetic, temporally-aligned attack scenarios. Deployments utilize AWS EC2, A100 GPU, Kafka for stream synchronization (2,000 logs/s, 1,100 frames/s, 300 clips/s), Docker and Kubernetes for agent containerization, and LangChain orchestration (Roy, 6 Dec 2025).
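The source does not describe how the three streams are synchronized beyond naming Kafka. One simple way to think about temporal alignment is to bucket events from all modalities into fixed time windows, sketched below without any Kafka client code. The event tuple layout, the function name, and the one-second window are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Event = Tuple[float, str, Any]  # (timestamp in seconds, modality, payload)

def align_by_window(events: Iterable[Event],
                    window_s: float = 1.0) -> Dict[int, Dict[str, List[Any]]]:
    """Group events from all modalities into fixed-width time windows."""
    buckets: Dict[int, Dict[str, List[Any]]] = defaultdict(lambda: defaultdict(list))
    for ts, modality, payload in events:
        buckets[int(ts // window_s)][modality].append(payload)
    # Convert nested defaultdicts to plain dicts, ordered by window index.
    return {w: dict(per_modality) for w, per_modality in sorted(buckets.items())}

# Hypothetical interleaved events from the log, video, and audio streams.
events = [
    (0.2, "log", "ConsoleLogin"),
    (0.7, "video", "frame_10"),
    (1.1, "audio", "clip_3"),
    (1.9, "log", "DeleteTrail"),
]
aligned = align_by_window(events)
```

Each window then holds whatever evidence arrived from each modality in that interval, which is the unit a fusion step would consume.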
4. Adaptive Response and Security Posture Management
Adaptive response in Cy-Agent is driven by continuous state updates and reinforcement learning–based policy optimization:
- State Representation:
- Reward Function: a positive reward for correct mitigation, a negative reward for slow or erroneous responses
- Policy Optimization: Q-learning
- Posture Updates: Automated via Open Policy Agent (OPA) for policy enforcement
- State Update Rule: the state is updated after each action, with Bellman updates to the Q-network
A plausible implication is robust, explainable adaptation to evolving threat dynamics, with formal state tracking and audit logs supporting compliance and rollback.
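A minimal tabular sketch of the reinforcement-learning loop described in this section follows. It uses the textbook Q-learning update with epsilon-greedy action selection rather than the Q-network mentioned above, and every concrete value in it is an assumption: the action labels, the state labels, the reward magnitudes, the latency budget, and the hyperparameters `alpha`, `gamma`, and `epsilon` are hypothetical, since the source's values are not available.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple

# Hypothetical labels for the remediation actions named in the text.
ACTIONS = ("snort_block", "iam_suspend", "opa_policy_update", "no_op")

def reward(correct: bool, latency_ms: float, budget_ms: float = 500.0) -> float:
    """Positive reward for a correct, timely mitigation; penalty otherwise.

    The magnitudes and the latency budget are placeholders, not source values.
    """
    return 1.0 if correct and latency_ms <= budget_ms else -1.0

class QLearner:
    """Tabular Q-learning with epsilon-greedy exploration."""

    def __init__(self, actions: Sequence[str], alpha: float = 0.1,
                 gamma: float = 0.9, epsilon: float = 0.1, seed: int = 0) -> None:
        self.actions = tuple(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q: Dict[Tuple[Hashable, str], float] = defaultdict(float)
        self.rng = random.Random(seed)

    def best_action(self, state: Hashable) -> str:
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def select(self, state: Hashable) -> str:
        # Explore with probability epsilon, otherwise exploit the current estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best_action(state)

    def update(self, state: Hashable, action: str, r: float,
               next_state: Hashable) -> None:
        # Bellman-style temporal-difference update toward r + gamma * max_a' Q(s', a').
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = r + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])

learner = QLearner(ACTIONS, alpha=0.5, gamma=0.9, epsilon=0.0)
learner.update("high_risk", "snort_block", reward(True, 420.0), "low_risk")
chosen = learner.select("high_risk")
```

After the single rewarded update, the learner prefers `snort_block` in the `high_risk` state. Replacing the table with a neural approximator would give the Q-network variant the text refers to.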
5. Orchestration Infrastructure and Extensibility
The orchestration capability of Cy-Agent is built on LangChain v0.1.0, which sequences LLM invocations and manages agent context. PyTorch implements the attention mechanism, and the Gemini SDK provides the GenAI interface. Scalability and reliability are achieved via:
- Kafka: High-throughput data stream ingestion
- Kubernetes: Autonomous scaling of individual agent containers
- Docker: Modular, dependency-isolated agent packaging
- Extensibility: New agents (e.g., Network Agent) can be integrated into LangChain with minimal code changes, facilitating rapid adaptation to new modalities or threat contexts (Roy, 6 Dec 2025).
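The extensibility claim can be pictured with a small plug-in pattern: if every agent exposes the same interface, adding a modality amounts to registering one more class. The sketch below is a plain-Python stand-in and deliberately does not use LangChain's API; the `Agent` base class, the `Orchestrator` registry, and the toy `NetworkAgent` heuristic are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Tuple

class Agent(ABC):
    """Common interface: every agent returns (threat score, explanation)."""
    modality: str

    @abstractmethod
    def analyze(self, payload: Any) -> Tuple[float, str]:
        ...

class Orchestrator:
    """Keeps a registry of agents keyed by modality and dispatches inputs to them."""

    def __init__(self) -> None:
        self._agents: Dict[str, Agent] = {}

    def register(self, agent: Agent) -> None:
        self._agents[agent.modality] = agent

    def run(self, inputs: Dict[str, Any]) -> Dict[str, Tuple[float, str]]:
        # Inputs for modalities without a registered agent are skipped.
        return {m: self._agents[m].analyze(p)
                for m, p in inputs.items() if m in self._agents}

class NetworkAgent(Agent):
    """Toy agent scoring outbound volume; the heuristic is purely illustrative."""
    modality = "network"

    def analyze(self, payload: Dict[str, int]) -> Tuple[float, str]:
        score = min(payload.get("bytes_out", 0) / 1_000_000, 1.0)
        return score, f"outbound volume score {score:.2f}"

orchestrator = Orchestrator()
orchestrator.register(NetworkAgent())
results = orchestrator.run({"network": {"bytes_out": 500_000}, "video": {}})
```

The orchestrator needs no changes when a new agent is registered, which is the property the bullet above describes.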
For edge environments, quantized Gemini variants support deployments in bandwidth-constrained IoT contexts. Federated learning is proposed to address privacy concerns for distributed video and audio telemetry.
6. Integration with Broader Agentic Cyber Defense Paradigms
Cy-Agent extends the agentic cyber defense paradigm by integrating multimodal perception, cross-modal inference, and a machine-learned remediation policy, all orchestrated without a human in the loop to achieve sub-second threat reaction. In the context of the AICA reference architecture (Kott et al., 2018), Cy-Agent exemplifies modularity in perception, planning, learning, and execution under the POMDP framework, and augments traditional approaches with GenAI and LLM-powered reasoning. The attention fusion mechanism enables dynamic weighting of multi-source signals, overcoming limitations of unimodal or static MAS configurations.
Notably, the natural-language explanations produced for each modality, together with the fused summary, support explainability, enhancing analyst confidence and SOC handover. The system's SAS metric quantifies end-to-end perception, comprehension, and projection, operationalizing Endsley's situational awareness model.
7. Impact and Future Research Directions
By fusing cloud, video, and audio telemetry via an orchestrated multi-agent architecture, Cy-Agent delivers unified, real-time SOC situational awareness with high detection fidelity and low MTTR. The modular architecture enables scaling across enterprise and IoT deployments, mitigating legacy silos that impede cross-domain observability and policy enforcement. A plausible implication is a shift toward fully autonomous SOCs with demonstrable safety, adaptivity, and auditability.
Future research priorities include integrating federated learning for privacy preservation, expanding coverage to additional telemetry (e.g., network flows, endpoint forensics), enhancing the fusion model’s robustness to adversarial manipulation, and formalizing assurance methods for GenAI-driven response systems (Roy, 6 Dec 2025, Kott et al., 2018).