
Interactive LLM Honeypot Architecture

Updated 4 December 2025
  • Interactive LLM-honeypot architecture is a system that employs dynamic, LLM-driven dialogue to realistically simulate services and trap adversaries.
  • It integrates modular components such as API gateways, session managers, probe insertion, and response filters to ensure robust detection and engagement.
  • Empirical evaluations indicate high defense efficacy with minimal latency, making it a practical tool for real-time cyber threat intelligence.

An interactive LLM-honeypot deployment architecture leverages LLMs to detect, engage, and analyze adversarial activity through realistic, dynamic, and adaptive dialog. Unlike traditional, deterministic honeypots, LLM-based honeypots proactively simulate genuine services, protocols, or conversational behaviors, allowing for both active threat intelligence gathering and robust defense, including the detection of advanced multi-turn adversarial techniques such as LLM jailbreaks. Architectures in this domain are modular, integrating LLM-driven response engines, virtualized protocol or service emulation, multiple deception layers, security analytics, and real-time feedback loops for both engagement optimization and attacker identification (Wu et al., 16 Oct 2025, Adebimpe et al., 24 Oct 2025, Malhotra, 1 Sep 2025, Bridges et al., 29 Oct 2025).

1. Core System Components and Data Flows

Interactive LLM-honeypots are composed of several key modules, structured to ensure realism, operational resilience, and security.

  • API Gateway/Edge Proxy: Acts as the ingress point, performing authentication, session routing, load balancing, and rate limiting. Protects backend resources from volumetric attacks and blocks known automated scanners (Wu et al., 16 Oct 2025, Bridges et al., 29 Oct 2025).
  • Session Manager & State Tracker: Maintains per-session state, including virtualized filesystems, conversational context, environment variables, and interaction history. Session isolation is enforced (typically via containerization or micro-VMs) for segmentation and security (Sladić et al., 2023, Bridges et al., 29 Oct 2025).
  • LLM Interaction Module: Implements prompt assembly, persona/context injection, and invokes one or more LLMs (fine-tuned or API-based) for response generation. Handles session-based history pruning and prompt window management. Incorporates fallback and error-handling logic to address model failures or anomalies (Otal et al., 12 Sep 2024, Wang et al., 4 Jun 2024).
  • Probe Insertion / Bait Model: For proactive adversary engagement, this module generates contextually appropriate decoy questions or statements to provoke further malicious intent, using an LLM fine-tuned for maximum attractiveness but minimal actionability (Wu et al., 16 Oct 2025).
  • Response Filter & Output Sanitizer: Post-processes all LLM outputs, enforcing security policies (e.g., removing actionable/incriminating content), normalizing out-of-domain outputs, and filtering prompt injection attempts (Wu et al., 16 Oct 2025, Sladić et al., 2023).
  • Protocol and Service Emulators: Deterministically handle simple commands and simulate standard protocol responses for speed and realism. LLM responses are invoked for novel, compound, or contextually complex requests (Malhotra, 1 Sep 2025, Adebimpe et al., 24 Oct 2025).
  • Logging, Analysis & Threat Intel Connector: Captures all session transcripts, interaction metadata, model decisions, and timing. Real-time and batch analysis are used for classification (benign/suspicious), metrics computation, and export to threat intelligence platforms (e.g., STIX/TAXII, SIEM) (Bridges et al., 29 Oct 2025).
  • Orchestration & Control Plane: Automates deployment, scaling, rolling updates, A/B testing, and resource allocation. Usually realized via Kubernetes with Helm charts and CI/CD integration (Sladić et al., 2023, Bridges et al., 29 Oct 2025).
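The per-session state held by the Session Manager can be sketched as a minimal data structure. This is an illustrative assumption, not code from any of the cited systems; the `Session` class and its field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Session:
    """Illustrative per-session state held by the Session Manager."""
    session_id: str
    virtual_fs: Dict[str, str] = field(default_factory=dict)  # path -> file contents
    env: Dict[str, str] = field(default_factory=dict)         # emulated environment variables
    history: List[dict] = field(default_factory=list)         # prior turns, used for prompt assembly
    risk_score: float = 0.0                                   # updated by the analysis pipeline

    def record_turn(self, user_input: str, response: str) -> None:
        self.history.append({"user": user_input, "honeypot": response})

    def prune_history(self, max_turns: int) -> None:
        # Keep only the most recent turns so the assembled prompt
        # fits the LLM context window (session-based history pruning).
        self.history = self.history[-max_turns:]
```

In a real deployment each such object would live inside an isolated container or micro-VM, with the virtual filesystem seeded per persona.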

High-Level Data Flow

User → [API Gateway] → [Session Manager] → [LLM Engine | Deterministic Emulator | Probe Module]
     → [Response Filter] → [User]
     → [Logging & Analysis]
     ↔ [Threat Intelligence, External SOC]

In advanced guardrail applications, each user input traverses the primary LLM, the probe-insertion classifier, the bait model, and the response filter, and is then logged for subsequent multi-turn behavior analysis (Wu et al., 16 Oct 2025).
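The guardrail data flow above can be sketched as a single request handler. All component functions below (`classify_risk`, `generate_response`, `insert_bait`, `sanitize`) are simplified placeholders standing in for the trained classifiers, LLM engines, and policy filters the papers describe; the threshold and bait text are illustrative assumptions.

```python
import re

RISK_THRESHOLD = 0.8  # illustrative cutoff for early blocking

def classify_risk(text: str) -> float:
    # Placeholder risk scorer; real systems use a trained classifier.
    return 0.9 if "ignore previous instructions" in text.lower() else 0.1

def generate_response(text: str) -> str:
    # Stands in for the LLM engine or deterministic emulator.
    return f"(simulated service output for: {text})"

def insert_bait(response: str, risk: float) -> str:
    # Probe insertion: append a non-actionable decoy only for risky sessions.
    if risk > 0.5:
        response += " By the way, the maintenance scripts live in /opt/maint -- need them?"
    return response

def sanitize(response: str) -> str:
    # Response filter: strip anything resembling real credentials.
    return re.sub(r"(password|api[_-]?key)\s*[:=]\s*\S+", "[redacted]", response, flags=re.I)

def handle_turn(user_input: str, log: list) -> str:
    risk = classify_risk(user_input)
    if risk >= RISK_THRESHOLD:
        log.append({"input": user_input, "risk": risk, "action": "blocked"})
        return "Session terminated."
    reply = sanitize(insert_bait(generate_response(user_input), risk))
    log.append({"input": user_input, "risk": risk, "action": "served"})
    return reply
```

Every branch appends to the log, mirroring the requirement that both served and blocked turns reach the analysis pipeline.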

2. Probing, Bait Modeling, and Multi-Turn Adversary Detection

To expose adaptive attackers and iterative jailbreaks, the architecture integrates a proactive multi-turn probing subsystem:

  • Probe Insertion: A classifier (e.g., compact Transformer) dynamically decides whether to append a decoy (bait) to each LLM response based on session risk score, interaction history, and adversary behavioral cues (Wu et al., 16 Oct 2025).
  • Bait Model: Trained via cross-entropy with regularization to minimize actionability (F-score) and maximize attractiveness (A-score), the bait model yields non-actionable but highly engaging prompts. The optimization function is:

L(\theta) = -\sum_{t=1}^{T} \log P_\theta(D^*_t \mid D^*_{<t}, Q) + \lambda \hat{F}(\theta; Q) - \mu \hat{A}(\theta; Q)

where \hat{F} and \hat{A} are estimated via auxiliary LLMs or reward models, and \lambda, \mu > 0 are balancing hyperparameters (Wu et al., 16 Oct 2025).

  • Interaction Protocols: Each user turn triggers primary LLM response, optional bait insertion, and risk scoring. Multi-turn protocol confirms malicious intent by escalating probes and observing explicit engagement with decoys. Early blocking/quarantine is enforced if the risk score exceeds a threshold (Wu et al., 16 Oct 2025).
  • Automated Logging & Classification: All turns, risk transitions, and probe engagements are logged and labeled (benign/suspected jailbreak) by an LLM or classifier, supporting downstream analyst review (Wu et al., 16 Oct 2025).
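The multi-turn protocol above can be sketched as cumulative risk scoring: suspicious turns raise the score, explicit engagement with a decoy raises it sharply, and the session is labeled once a threshold is crossed. The increment values and turn-record keys here are simplified assumptions, not the trained scoring model of the cited work.

```python
def update_risk(prior_risk: float, engaged_with_decoy: bool, suspicious_turn: bool) -> float:
    """Accumulate session risk across turns (illustrative increments)."""
    risk = prior_risk
    if suspicious_turn:
        risk += 0.3
    if engaged_with_decoy:
        risk += 0.5  # explicit decoy engagement is strong evidence of malicious intent
    return min(risk, 1.0)

def classify_session(turns: list, block_threshold: float = 0.8) -> str:
    """Replay turn records and label the session once risk crosses the threshold."""
    risk = 0.0
    for turn in turns:
        risk = update_risk(risk,
                           turn.get("decoy_engaged", False),
                           turn.get("suspicious", False))
        if risk >= block_threshold:
            return "suspected_jailbreak"
    return "benign"
```

This mirrors the escalation logic: a single suspicious turn is not enough to block, but a suspicious turn followed by decoy engagement is.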

3. Metrics and Analysis of Engagement, Fidelity, and Safety

Evaluation and operational tuning rely on rigorous metrics:

  • Defense Efficacy Rate (DER):

\mathrm{DER} = \frac{TP + TN}{N}

with TP the true positives (blocked jailbreaks), TN the true negatives (correctly allowed benign sessions), and N the total number of cases (Wu et al., 16 Oct 2025).

  • Honeypot Utility Score (HUS):

\mathrm{HUS} = \frac{2AF}{A + F}

where A is the attractiveness score and F the feasibility (actionability) score, both scaled so that lower values are preferable: a low A indicates a stronger lure, and a low F indicates safer, less actionable bait. Sweet-spot decoys have low A and F (Wu et al., 16 Oct 2025).
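Both metrics follow directly from the formulas above; a minimal sketch, with HUS computed as the harmonic mean of the two scores:

```python
def defense_efficacy_rate(tp: int, tn: int, n: int) -> float:
    """DER = (TP + TN) / N: fraction of sessions handled correctly."""
    return (tp + tn) / n

def honeypot_utility_score(a: float, f: float) -> float:
    """HUS = 2AF / (A + F): harmonic mean of attractiveness (A) and feasibility (F) scores."""
    if a + f == 0:
        return 0.0
    return 2 * a * f / (a + f)
```

For example, 80 correctly blocked jailbreaks and 15 correctly allowed benign sessions out of 100 yield DER = 0.95, and a decoy scored A = 0.2, F = 0.3 yields HUS = 0.24.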

4. Architectural Variations and Deployment Strategies

Architectural blueprints adapt to deployment scale, threat model, and target services:

  • Service Simulation: Modular frontends expose multiple protocols (SSH, MySQL, POP3, HTTP, LDAP, ICS) with either static emulation for simple commands or LLM-generated dynamics for complex, unknown, or stateful interactions (Sladić et al., 8 Oct 2025, Jiménez-Román et al., 20 Sep 2025, Vasilatos et al., 9 May 2024).
  • Enhanced Realism and Fingerprint Resistance: Architectures incorporate per-session persona rotation, variable OS/file fingerprints, deterministic delays, and network posture manipulation to resist fingerprinting and detection (Bridges et al., 29 Oct 2025, Sladić et al., 2023).
  • Resource Management: Orchestration via Docker/Kubernetes for horizontal scaling, special considerations for GPU memory assignment per LLM size, and policies for micro-service decomposition (inference, protocol emulation, logging) (Sladić et al., 2023, Otal et al., 12 Sep 2024).
  • Integration as Guardrail Middleware: The proactive honeypot guardrail system can function as a middleware layer between public-facing APIs and backend LLMs, ensuring all user queries are filtered, proactively probed, and securely logged without exposing core LLM vulnerabilities directly (Wu et al., 16 Oct 2025).
  • Threat Intelligence and Forensics: Real-time analytics pipelines connect to SIEM/SOC, supporting dynamic threat feed ingestion, ATT&CK mapping, and periodic re-tuning of LLMs on freshly harvested adversarial data (Bridges et al., 29 Oct 2025).

Examples of architectural blueprints and protocol support:

Honeypot System   Core LLM Component      Protocols/Services
Bait Guardrail    Bait + Protected LLM    Any chat API
SBASH             Prompt-tuned/RAG LLM    Linux shell (SSH)
VelLMes           Persona-prompted LLM    SSH, MySQL, POP3, HTTP
LLMPot            Fine-tuned ByT5         ICS/OT protocols
LDAP Honeypot     Fine-tuned LLaMA        LDAP (ASN.1/BER)
LLMHoney          Multi-model LLMs        SSH/Linux shell

5. Security, Isolation, and Failure Modes

Security is paramount, as LLM-honeypots must defend against both external adversaries and model exploitation:

  • Sandboxing and Isolation: All command processing occurs in-memory or via emulated virtual systems. Real shell execution is strictly forbidden, and containers or micro-VMs are standard for environment isolation (Sladić et al., 2023, Sladić et al., 8 Oct 2025).
  • Prompt Reinforcement and Injection Resistance: Persona prompts forbid LLM self-disclosure, with mechanisms to reassert correct persona on detection of prompt-injection or abnormal outputs (Sladić et al., 8 Oct 2025). Response filters actively sanitize outputs, falling back to stub responses on persistent anomalies.
  • Attack Surface Limitation: Network egress is tightly restricted (only to required LLM APIs), no privileged user escalation is possible, and secrets (API keys) are handled via encrypted KMS or Kubernetes secrets with strict RBAC (Bridges et al., 29 Oct 2025, Wang et al., 4 Jun 2024).
  • Failure Handling: Overzealous probing may degrade benign user experience, while under-probing can enable attacker evasion. Dynamic tuning of probe frequency, bait model retraining, and continuous security analytics address these trade-offs (Wu et al., 16 Oct 2025).
  • Audit and Compliance: Comprehensive, encrypted logging supports session replay, anomaly detection, and operator review. Forensic and alerting hooks trigger on detection of policy violations, LLM failures, or suspicious interaction patterns (Sladić et al., 2023, Bridges et al., 29 Oct 2025).
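The prompt-reinforcement and filtering behavior above can be sketched as a check-and-fallback wrapper on model output. The disclosure patterns and stub text are illustrative assumptions; production systems would first re-prompt with a reinforced persona before falling back to a stub.

```python
import re

# Patterns suggesting the model broke persona or disclosed its nature (illustrative).
DISCLOSURE_PATTERNS = [
    re.compile(r"as an ai (language )?model", re.I),
    re.compile(r"i am a (large )?language model", re.I),
    re.compile(r"honeypot", re.I),
]

STUB_RESPONSE = "bash: command not found"  # safe fallback in a Linux shell persona

def filter_output(raw: str) -> tuple:
    """Return (response, persona_broken). On detected disclosure, fall back to a stub."""
    for pattern in DISCLOSURE_PATTERNS:
        if pattern.search(raw):
            return STUB_RESPONSE, True
    return raw, False
```

A `persona_broken` flag like this would also feed the audit pipeline, since repeated persona breaks are themselves an anomaly worth alerting on.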

6. Empirical Performance, Evaluation, and Research Directions

LLM-honeypot deployments consistently outperform traditional deterministic honeypots in depth of engagement, adaptability, and detection of complex attack tactics:

  • Empirical Benchmarks: Evaluations on MHJ, SBert/BERTScore, and protocol-specific datasets show fine-tuned LLM honeypots achieve high protocol compliance, deception rates, and resilience to adversarial adaptation (Wu et al., 16 Oct 2025, Adebimpe et al., 24 Oct 2025, Otal et al., 12 Sep 2024, Vasilatos et al., 9 May 2024).
  • Latency Impact: Additional architectural steps (e.g., bait probe, RAG-retrieval) are tractable (typically ≤500 ms added turn latency) and can be managed via batching, sharding, and parallelization. Benign user satisfaction consistently remains >95% in self-reported studies (Wu et al., 16 Oct 2025).
  • Adversarial Identification: Integration of time-based prompt-injection, embedding-based classifiers, and manual review (for ambiguous sessions) enables accurate detection of both human and LLM-powered adversaries (Reworr et al., 17 Oct 2024).
  • Continual Improvement: Modern architectures advance toward self-improving deception, with closed feedback loops for scenario adaptation, persona rotation, and fine-tuning on emergent attack patterns (Bridges et al., 29 Oct 2025).
  • Scalability: Session-to-pod ratios, node auto-scaling, regional failover, and distributed LLM pools support tens of thousands of concurrent interactions in production scenarios (Sladić et al., 2023).
  • Research Gaps: Remaining challenges include reducing model drift, systematic fingerprint resistance, and robust multi-protocol/inter-service state consistency. The systematization literature emphasizes the need for continuous adversarial research on detection vectors, compositional architectures, and open-source standardization (Bridges et al., 29 Oct 2025).

In summary, interactive LLM-honeypot deployment architectures blend fine-tuned LLMs, protocol emulation, proactive multi-turn engagement, and layered security analytics to deliver high-fidelity, adaptive cyber deception and defense. These systems address the evolving threat landscape posed by both human attackers and autonomous AI agents, setting new standards for cyberthreat monitoring, analysis, and proactive mitigation (Wu et al., 16 Oct 2025, Bridges et al., 29 Oct 2025, Adebimpe et al., 24 Oct 2025, Otal et al., 12 Sep 2024).
