LLM-Based Honeypots

Updated 2 October 2025
  • LLM-based honeypots are deception systems that generate realistic decoy artifacts and system responses to lure and analyze threat actors.
  • They integrate dynamic command emulation, hybrid response pipelines, and fine-tuned prompt engineering to enhance realism and scalability in cybersecurity.
  • Implementations demonstrate high session accuracy and TNR while addressing challenges such as latency, state management, and response consistency.

LLM-based honeypots are deception systems designed to lure, interact with, and analyze threat actors using the generative and contextual capabilities of state-of-the-art LLMs. These honeypots exploit the adaptability, context sensitivity, and interactive realism of LLMs to address the deficiencies of traditional static or deterministic honeypots, with applications ranging from real-time shell or protocol emulation to scalable honeytoken generation, attack classification, and autonomous agent monitoring.

1. Key Principles and Design Patterns

LLM-based honeypots integrate LLMs primarily to (a) dynamically generate system responses and decoy artifacts, (b) enhance attack engagement and telemetry, (c) enable semantic and intent-level attack classification, and (d) automate otherwise labor-intensive deception engineering.

Major Design Elements

| Design Aspect   | Example Implementation               | Source                              |
|-----------------|--------------------------------------|-------------------------------------|
| Shell emulation | Persona-prompted LLM, GPT-3.5        | (Sladić et al., 2023)               |
| LDAP honeypot   | ASN.1/BER–JSON bridging, LoRA tuning | (Jiménez-Román et al., 20 Sep 2025) |
| ICS protocol    | ByT5 bytes-to-bytes model, PCAP data | (Vasilatos et al., 9 May 2024)      |
| Honeytoken gen. | Modular prompts, multi-LLM eval      | (Reti et al., 24 Apr 2024)          |

2. Methodologies: Integration, Training, and Performance

LLM-based honeypots employ standardized methodologies spanning architectural integration, model training, and protocol- or system-specific customization.

Architecture and Workflow

  • Input Listening and Parsing: Custom servers (SSH, LDAP, industrial TCP/IP) decode and preprocess incoming requests, often transforming to normalized intermediate representations (e.g., JSON).
  • Orchestration and Prompt Chaining: Orchestrator modules manage protocol state, preserve session and semantic context, and compose prompts that guide the LLM.
  • LLM Invocation and Output Validation: Requests are forwarded to fine-tuned LLM instances (often via HTTP API, LangChain, or direct inference), with outputs checked/refined before returning to the attacker.
  • Session Logging: All interactions, including command, response, metadata, and timing information, are logged for threat intelligence and evaluation.
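The listen → orchestrate → invoke → validate → log loop above can be sketched in a few lines. This is a minimal illustration, not any cited system's implementation: the helper names are hypothetical and the LLM call is stubbed out.

```python
import json
import time

def fake_llm(prompt: str) -> str:
    # Stand-in for a fine-tuned LLM endpoint (HTTP API or local inference).
    return "uid=0(root) gid=0(root) groups=0(root)"

def handle_request(raw: bytes, session: dict) -> str:
    # 1. Input listening and parsing: decode to a normalized form.
    command = raw.decode("utf-8", errors="replace").strip()
    # 2. Orchestration: compose a prompt carrying persona and session context.
    prompt = json.dumps({"persona": "Ubuntu 22.04 shell",
                         "history": session["history"][-10:],
                         "command": command})
    # 3. LLM invocation, with output validation before replying to the attacker.
    start = time.time()
    reply = fake_llm(prompt)
    if not reply or len(reply) > 4096:  # crude sanity check on the output
        reply = f"bash: {command}: command not found"
    # 4. Session logging for threat intelligence and evaluation.
    session["history"].append({"cmd": command, "resp": reply,
                               "latency_s": round(time.time() - start, 3)})
    return reply

session = {"history": []}
print(handle_request(b"id\n", session))
```

Keeping session history inside the prompt is what lets the model stay in character across turns; production systems layer protocol-specific parsing and stricter validators on top of this skeleton.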

Model Training and Tuning

Training strategies vary by target: persona-oriented prompt engineering without weight updates (shelLM), LoRA fine-tuning on protocol exchange data (LDAP), and byte-level ByT5 training on captured PCAP traffic (ICS/SCADA).

Evaluation Metrics

Reported metrics include True Negative Rate (TNR) and overall session accuracy for shell decoys, weighted validity scores for LDAP responses, and response validity (RVA) for ICS protocol emulation.

3. Application Domains and System Types

LLM-powered honeypots span multiple system types and cybersecurity roles:

Shell-based and Protocol Honeypots

  • Linux Shell Decoys: LLMs simulate interactive Linux shells, engaging attackers with context-aware responses and demonstrating TNR (True Negative Rate) ≈ 0.90 and overall accuracy ≈ 0.92 (shelLM (Sladić et al., 2023); LLMHoney (Malhotra, 1 Sep 2025)).
  • LDAP Emulation: LLM generates ASN.1/BER-encoded LDAP protocol responses, preserving field-level correctness and connection semantics; post-fine-tuning, weighted validity score reached ≈99% (Jiménez-Román et al., 20 Sep 2025).
  • ICS/SCADA Environments: ByT5-based LLMs reproduce Modbus/S7Comm network behavior and physical process logic at both protocol and functional levels; response validity (RVA) saturates with ≤1600 samples (Vasilatos et al., 9 May 2024).
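The TNR and accuracy figures quoted above follow standard confusion-matrix definitions, where a "negative" is a honeypot response that an evaluator fails to flag as fake. A minimal computation, using illustrative counts rather than the cited evaluations' data:

```python
def tnr(tn: int, fp: int) -> float:
    # True Negative Rate: TN / (TN + FP)
    return tn / (tn + fp)

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    # Overall accuracy: (TP + TN) / all judged responses
    return (tp + tn) / (tp + tn + fp + fn)

# Illustrative counts only (not taken from the cited papers):
print(round(tnr(90, 10), 2))             # → 0.9
print(round(accuracy(47, 45, 5, 3), 2))  # → 0.92
```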

Honeytoken Generation

  • Scalable, Modular Generation: LLMs, via modular prompt building blocks, produce honeywords, robots.txt, config files, logs, database entries; LLM-honeywords achieved lower trawling attackability (success ≈ 15.15%) than previous heuristics (Reti et al., 24 Apr 2024).
  • Cross-Model Evaluation: Output realism and syntax vary by LLM (GPT-3.5, GPT-4, Gemini, LLaMA-2), with prompt optimality not always transferable between LLMs (Reti et al., 24 Apr 2024).
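The "modular prompt building blocks" idea can be sketched as reusable template fragments composed per artifact type. The block texts and parameter names below are hypothetical illustrations, not the prompts used in the cited work:

```python
# Hypothetical modular prompt blocks for honeytoken generation.
BLOCKS = {
    "role":    "You are generating a deceptive but realistic {artifact}.",
    "context": "Target system: {system}.",
    "format":  "Output only the {artifact} content, with no explanation.",
}

def build_prompt(artifact: str, system: str,
                 order=("role", "context", "format")) -> str:
    # Compose reusable blocks in a chosen order, filling shared parameters;
    # the same blocks serve honeywords, robots.txt, configs, logs, etc.
    return "\n".join(BLOCKS[b].format(artifact=artifact, system=system)
                     for b in order)

print(build_prompt("robots.txt", "Apache 2.4 web server"))
```

Because prompt optimality does not always transfer between LLMs, keeping blocks modular makes it cheap to reorder or reword them per model during cross-model evaluation.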

LLM-Agent and Adversarial Monitoring

  • Agent Detection: Honeypots detect and distinguish LLM-based hacking agents by strategically embedding prompt injections and applying time-based analysis; the injections are crafted to hijack agent goals or exfiltrate prompt contents, while response times ≤ 1.5 s indicate automation (Reworr et al., 17 Oct 2024).
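Combining the two signals, a classifier could look like the following sketch. The threshold and function shape are illustrative assumptions, not the cited paper's exact detector:

```python
import statistics

def looks_like_llm_agent(response_times_s: list[float],
                         injection_triggered: bool,
                         threshold_s: float = 1.5) -> bool:
    # Signal (a): the session acted on an embedded prompt injection.
    # Signal (b): inter-command response times are machine-fast; human
    # operators typically take longer than ~1.5 s per command.
    machine_fast = statistics.median(response_times_s) <= threshold_s
    return injection_triggered or machine_fast

print(looks_like_llm_agent([0.4, 0.7, 0.3], injection_triggered=False))  # → True
print(looks_like_llm_agent([4.2, 6.0, 3.8], injection_triggered=False))  # → False
```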

4. Impact, Benefits, and Comparative Advantages

LLM-based honeypots are characterized by improvements in realism, adaptability, and intelligence collection, with measurable benefits over traditional designs.

Enhanced Realism and Attacker Engagement

Context-aware, multi-turn responses sustain longer and more genuine attacker interactions than static decoys, increasing dwell time and the volume of captured behavior.

Scalability and Versatility

Modular, prompt-driven pipelines let a single design cover multiple protocols and artifact types (e.g., seven classes of auto-generated honeytokens) without bespoke emulator engineering for each target.

Incident Analysis and Threat Intelligence

  • LLMs can both synthesize attacker narratives and classify interaction types (automated vs. human-driven), providing contextual severity scores and incident summaries (Chacon et al., 2020).
  • Dynamic, longer dialogues enable exposure and analysis of advanced tactics, techniques, and procedures (TTPs) as well as AI-agent attack behaviors (Reworr et al., 17 Oct 2024).

Comparative Limitations and Trade-offs

  • LLM-based honeypots incur non-trivial cost and latency; e.g., LLMHoney’s Gemini-2.0 backend averages ≈ 3 s per response, and cloud-based deployments are estimated at ≈ US$0.80 per active hour (Sladić et al., 2023; Malhotra, 1 Sep 2025).
  • Occasional hallucinated or inconsistent outputs persist, especially with small LLMs or without careful prompt state management; fallbacks to cached/dictionary responses and sanitization checks serve as mitigations (Malhotra, 1 Sep 2025).
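The cached-response fallback and sanitization mitigation can be sketched as a thin wrapper around the LLM call. The command cache contents and the persona-break patterns below are illustrative assumptions:

```python
import re

# Deterministic answers for common commands avoid LLM cost and drift.
CACHE = {"whoami": "root", "pwd": "/root"}

def respond(command: str, llm=lambda cmd: None) -> str:
    # Prefer cached answers; otherwise call the LLM and sanitize its output,
    # replacing anything that breaks the shell persona with a safe fallback.
    if command in CACHE:
        return CACHE[command]
    reply = llm(command)
    if reply is None or re.search(r"as an ai|language model", reply, re.I):
        return f"bash: {command}: command not found"  # safe in-persona fallback
    return reply

print(respond("whoami"))                                              # → root
print(respond("frobnicate", llm=lambda c: "As an AI, I cannot help."))
```

The second call demonstrates sanitization: the persona-breaking LLM reply is swapped for a plausible shell error rather than shown to the attacker.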

5. Challenges and Practical Constraints

Deployment and operationalization of LLM-based honeypots present a distinct set of technical and resource challenges:

  • Latency and Compute Overhead: Achieving low latency (≤ 3 s) and resource efficiency while maintaining interaction fidelity; large models (>3B params) require more memory and hardware acceleration (Malhotra, 1 Sep 2025).
  • State and Consistency Management: Preventing contradictory or out-of-character responses across extended attacker sessions (addressed by session context saving, prompt updating, and local state management) (Sladić et al., 2023, Malhotra, 1 Sep 2025).
  • Scalability of Realistic Artifacts: Prompt engineering must resist prompt injection and cope with brittle generalization across LLMs (Reti et al., 24 Apr 2024, Reworr et al., 17 Oct 2024).
  • Interpretability and Robustness: LLM-based classification/detection outputs often lack direct interpretability; adversarial adaptation and prompt “gaming” are anticipated, necessitating ongoing retraining and defensive hardening (Chacon et al., 2020, Reworr et al., 17 Oct 2024).
  • Security of the Deception Environment: Robust containment is mandatory; the LLM must never execute input commands, and protocol emulation must avoid accidental exposure of sensitive backends (Malhotra, 1 Sep 2025, Jiménez-Román et al., 20 Sep 2025).

6. Future Directions and Open Research Problems

LLM-based honeypot deployment and sophistication are expected to evolve alongside both attacker capabilities and LLM technology:

  • Long-Term State Modeling: Expansion to large context models (≥16k–32k tokens) or external memory mechanisms to capture protracted attacker engagement (Malhotra, 1 Sep 2025).
  • Automated Output Validation: Integration of secondary discriminators (regex, rule-based, or additional LLMs) to automatically detect and flag hallucinations or incoherence (Malhotra, 1 Sep 2025).
  • Dynamic Signal/Noise Balancing in Prompting: Auto-tuning prompt elements using live evaluation metrics or discriminator feedback loops (Reti et al., 24 Apr 2024).
  • Broader Protocol and Application Coverage: Generalization to further protocols (e.g., SMB, RDP, HTTP/2), multitasking LLM-backed honeypots, and industrial/specialized OT/ICS contexts (Vasilatos et al., 9 May 2024, Jiménez-Román et al., 20 Sep 2025).
  • Adversarial/Agent Behavioral Analysis: Advanced detection and classification of autonomous LLM hacking agents and integration of public dashboards for live threat monitoring (Reworr et al., 17 Oct 2024).
  • Resource and Response Optimization: Model quantization, hardware acceleration, and hybrid LLM/dictionary designs to improve response time for large-scale deployments (Malhotra, 1 Sep 2025, Jiménez-Román et al., 20 Sep 2025).
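The "automated output validation" direction, in its simplest rule-based form, means running each LLM response through a protocol-specific discriminator before release. The regex below is an illustrative assumption about the shape of a Linux `id` response, not a published validator:

```python
import re

def validate_output(response: str, expected_pattern: str) -> bool:
    # Secondary rule-based discriminator: flag LLM outputs that fail a
    # protocol-specific pattern so they can be replaced or logged as suspect.
    return re.fullmatch(expected_pattern, response.strip()) is not None

# Rough expected shape of a Linux `id` command response (assumed here):
ID_RE = r"uid=\d+\([\w-]+\) gid=\d+\([\w-]+\).*"

print(validate_output("uid=0(root) gid=0(root) groups=0(root)", ID_RE))  # → True
print(validate_output("Sure! Here is the output you asked for:", ID_RE)) # → False
```

The same hook can host stronger discriminators (rule sets or a second LLM) to catch hallucinations that simple patterns miss.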

7. Comparative Evaluation and Practical Implications

Experimental results across representative LLM-powered honeypots indicate superior deception fidelity and threat intelligence compared to rule-based or static decoy systems.

| Metric/Aspect          | LLM Honeypots (SSH, LDAP)                                                          | Traditional Systems              |
|------------------------|------------------------------------------------------------------------------------|----------------------------------|
| Realistic output (TNR) | ≈ 0.90 (Sladić et al., 2023; Malhotra, 1 Sep 2025); ≥ 99% weighted validity (LDAP) | Varies, often < 80%              |
| Session consistency    | Multi-turn, context-dependent (Sladić et al., 2023)                                | Typically stateless or FSM-based |
| Attack engagement      | Prolonged, genuine interactions (Sladić et al., 2023)                              | Short/lower attacker dwell       |
| Honeytoken diversity   | 7 types, auto-generated, low attacker success                                      | Limited, manual or coarse        |

A plausible implication is that, as LLM cost, latency, and integration maturity improve, these deception techniques will become foundational in early warning, adaptive defense, and threat hunting workflows across diverse cyber operations. As the capabilities and accessibility of LLMs expand, so will their utility and importance within the deception and active defense paradigm.
