
FIRMHIVE: Agent-Based Firmware Analysis

Updated 30 November 2025
  • FIRMHIVE is a recursive agent-based framework that deploys LLMs as autonomous firmware security analysts.
  • It uses a dynamic Tree of Agents (ToA) and a Persistent Knowledge Hub to achieve decentralized, context-isolated analysis of complex firmware images.
  • Quantitative results show FIRMHIVE outperforms traditional static tools and other LLM agent baselines in both depth of reasoning and vulnerability discovery.

FIRMHIVE is a recursive, agent-based framework designed to orchestrate LLMs as autonomous firmware security analysts. Addressing the inherent difficulties of firmware—its binary nature, intertwined dependencies, and heterogeneity—FIRMHIVE transforms delegation into a per-agent, executable primitive and introduces a runtime-grown Tree of Agents (ToA) for decentralized, adaptive analysis coordination. This structure enables deep cross-file reasoning, broad inspection coverage, and robust dependency tracking in complex firmware images, surpassing both traditional static tools and existing LLM agent baselines in vulnerability discovery and analysis yield (Zhang et al., 23 Nov 2025).

1. System Architecture

FIRMHIVE is composed of three tightly integrated modules: the Recursive Delegation Engine (RDE), the Persistent Knowledge Hub (PKH), and the dynamic Tree of Agents (ToA).

  • Root Agent: Receives as input a firmware image $f$ and user prompt $u$, launching the overall analysis by invoking the RDE.
  • Recursive Delegation Engine (RDE): Implements delegation as an explicit, agent-level device, providing each agent with a DelegationTool (single subtask) and ParallelDelegationTool (multiple subtasks). Subtask descriptors $(o, g)$ (object, goal) are serialized and assigned to child agents, each with its own LLM context.
  • Persistent Knowledge Hub (PKH): Serves as a global, durable data store with APIs for storing, querying, and exploring structured findings. This global context enables agents to write verified alerts and to retrieve dependency information across different branches.
  • Tree of Agents (ToA): Each delegation results in a parent-child edge in a directed, rooted tree, where each node (agent) is context-isolated. The ToA dynamically expands in breadth or depth depending on the analysis needs and firmware structure.

Context isolation, parent-child message boundaries, bounded agent types (Directory, File, Function), and explicit termination enforce the tree's invariants, facilitating reproducibility and eliminating orchestrator bottlenecks.
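The tree invariants described above can be illustrated with a small data-structure sketch. This is not FIRMHIVE's implementation: the names `AgentKind`, `Subtask`, `AgentNode`, `delegate`, and `terminate` are hypothetical, and the sketch only assumes what the text states, namely that a child receives nothing but the (object, goal) tuple, that agent types are bounded, and that termination is explicit.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class AgentKind(Enum):
    """The three bounded agent types named in the text."""
    DIRECTORY = "directory"
    FILE = "file"
    FUNCTION = "function"


@dataclass(frozen=True)
class Subtask:
    """The (object, goal) descriptor: the only data passed from parent to child."""
    obj: str
    goal: str


@dataclass
class AgentNode:
    """Hypothetical ToA node; `context` stands in for a private LLM context."""
    kind: AgentKind
    task: Subtask
    parent: Optional["AgentNode"] = field(default=None, repr=False, compare=False)
    children: list["AgentNode"] = field(default_factory=list)
    context: list[str] = field(default_factory=list)
    done: bool = False

    def delegate(self, kind: AgentKind, obj: str, goal: str) -> "AgentNode":
        """Create a child that starts with an empty context (context isolation)."""
        if self.done:
            raise RuntimeError("a terminated agent cannot delegate")
        child = AgentNode(kind=kind, task=Subtask(obj, goal), parent=self)
        self.children.append(child)  # one parent-child edge per delegation
        return child

    def terminate(self) -> None:
        """Explicit termination: the node accepts no further delegation."""
        self.done = True

    def depth(self) -> int:
        return 0 if self.parent is None else 1 + self.parent.depth()


root = AgentNode(AgentKind.DIRECTORY, Subtask("/", "find exploit chains"))
root.context.append("listed top-level entries")
child = root.delegate(AgentKind.FILE, "/bin/httpd", "trace HTTP input")
```

In this sketch the parent's accumulated context never reaches the child, which sees only its own `Subtask`.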

2. Delegation and Tree Construction

Delegation is a first-class, executable primitive available to every agent. Each agent may recursively decompose its task into subtasks based on its observations—directories into files, files into binaries, binaries into functions—spawning specialized child agents as needed. The process follows the algorithmic template:

\begin{algorithmic}[1]
\Function{GrowToA}{Agent $n$}
  \State $(o, g) \gets n.\text{task}$
  \State $\mathcal{O} \gets \text{Observe}(o)$
  \If{\Call{NeedDecomposition}{$n, \mathcal{O}, g$}}
    \State $\{(o_i, g_i)\}_{i=1}^{k} \gets$ \Call{LLMDecompose}{$n, \mathcal{O}, g$}
    \For{$i = 1 \ldots k$ \textbf{in parallel}}
      \State $c_i \gets$ \Call{SpawnAgent}{$(o_i, g_i)$}
      \State \Call{GrowToA}{$c_i$}
    \EndFor
    \State $n.\text{result} \gets$ \Call{AggregateResults}{$c_1, \dots, c_k$}
  \Else
    \State $n.\text{result} \gets$ \Call{InspectComponent}{$n, \mathcal{O}, g$}
  \EndIf
  \State \Return $n.\text{result}$
\EndFunction
\end{algorithmic}

Delegation mechanisms guarantee context isolation—only the subtask tuple is exchanged—mitigating interference and recursive complexity. Each node in the ToA can serve as a new root for further delegation.
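The recursive template can be made concrete with a toy Python sketch. Everything here is an assumption for illustration: the firmware image is modeled as nested dictionaries (directories) and strings (file contents), the decomposition decision that FIRMHIVE leaves to the LLM is replaced by a simple type check, and leaf inspection is a substring search. Only the control flow (decompose, spawn in parallel, aggregate, or inspect) mirrors the template.

```python
from concurrent.futures import ThreadPoolExecutor


def grow_toa(obj, goal: str, path: str = "/") -> list[str]:
    """Toy analogue of the recursive template; returns paths that match `goal`."""
    if isinstance(obj, dict):  # stand-in for NeedDecomposition (an LLM decision)
        # Stand-in for LLMDecompose: one (object, goal) subtask per entry.
        subtasks = [
            (child, goal, path.rstrip("/") + "/" + name)
            for name, child in sorted(obj.items())
        ]
        # Each recursive call owns its pool, so nested waits cannot starve it.
        with ThreadPoolExecutor(max_workers=4) as pool:
            futures = [pool.submit(grow_toa, *task) for task in subtasks]
            results = [fut.result() for fut in futures]
        # Stand-in for AggregateResults: concatenate child results in order.
        return [alert for result in results for alert in result]
    # Stand-in for InspectComponent: flag files whose content contains the goal.
    return [path] if goal in obj else []


# Hypothetical miniature "firmware image".
firmware = {
    "etc": {"passwd": "root:admin123", "hosts": "127.0.0.1"},
    "bin": {"httpd": "system(cmd) admin123"},
    "README": "nothing to see",
}
alerts = grow_toa(firmware, "admin123")
print(alerts)  # ['/bin/httpd', '/etc/passwd']
```

Each call receives only its own object, goal, and path, which is the toy counterpart of exchanging nothing but the subtask tuple.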

3. Agent Types, Exploration, and Knowledge Integration

FIRMHIVE implements three agent types: Directory, File, and Function agents.

  • Directory agents spawn child agents for each subdirectory or file, maximizing parallelization.
  • File agents further differentiate between scripts and binaries; for binaries, they decompose into function agents associated with binary entry points.
  • Function agents perform intra-binary analysis, such as taint tracing along the call graph, recursively delegating further as dictated by the control flow.
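How a file agent tells scripts from binaries is not specified in the text. One plausible approach, shown here purely as an assumption, is to check the leading bytes: ELF executables begin with the magic bytes `\x7fELF`, and scripts conventionally begin with a `#!` shebang. The `ALLOWED_CHILDREN` mapping is likewise an interpretation of the bullets above, not a documented FIRMHIVE rule.

```python
ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file

# Interpretation of the agent-type bullets: which child types each type may spawn.
ALLOWED_CHILDREN = {
    "directory": {"directory", "file"},
    "file": {"function"},
    "function": {"function"},
}


def classify_file(head: bytes) -> str:
    """Guess a file's category from its first bytes (hypothetical heuristic)."""
    if head.startswith(ELF_MAGIC):
        return "binary"
    if head.startswith(b"#!"):
        return "script"
    return "other"


def can_spawn(parent_kind: str, child_kind: str) -> bool:
    """Check a proposed delegation against the bounded agent types."""
    return child_kind in ALLOWED_CHILDREN.get(parent_kind, set())


print(classify_file(b"\x7fELF\x01\x01\x01"))  # binary
print(classify_file(b"#!/bin/sh\necho hi\n"))  # script
print(can_spawn("file", "function"))           # True
```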

Results produced by any agent are published into the PKH, enabling structured and queryable cross-file or cross-component dependencies. For example, a configuration anomaly found by a file agent can be integrated with a code path revealed by a function agent via PKH-mediated queries. Agents never directly communicate outside their subtree, ensuring coordination remains stable and reproducible.
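The configuration-anomaly example can be sketched with a minimal in-memory store. The text says only that the PKH offers APIs for storing, querying, and exploring structured findings; the class and method names below (`Finding`, `KnowledgeHub`, `store`, `query`, `cross_source`) and the record fields are hypothetical, and a real hub would be durable rather than held in memory.

```python
import threading
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    source: str    # file path or function that produced the finding
    kind: str      # free-form label, e.g. "config" or "sink"
    symbol: str    # shared key linking findings, e.g. an NVRAM variable name
    evidence: str  # short justification recorded with the finding


class KnowledgeHub:
    """Hypothetical in-memory stand-in for a persistent, shared findings store."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # agents may publish concurrently
        self._by_symbol = defaultdict(list)

    def store(self, finding: Finding) -> None:
        with self._lock:
            self._by_symbol[finding.symbol].append(finding)

    def query(self, symbol: str, kind: Optional[str] = None) -> list:
        with self._lock:
            hits = list(self._by_symbol.get(symbol, []))
        return [f for f in hits if kind is None or f.kind == kind]

    def cross_source(self) -> dict:
        """Symbols reported by more than one source: candidate cross-file links."""
        with self._lock:
            return {
                sym: list(found)
                for sym, found in self._by_symbol.items()
                if len({f.source for f in found}) > 1
            }


hub = KnowledgeHub()
# A file agent reports a configuration anomaly ...
hub.store(Finding("/etc/config.xml", "config", "http_passwd", "default credential present"))
# ... and a function agent independently reports a code path using the same key.
hub.store(Finding("/bin/httpd:do_login", "sink", "http_passwd", "value reaches system()"))
hub.store(Finding("/bin/httpd:main", "sink", "lan_ipaddr", "value reaches sprintf()"))
links = hub.cross_source()
print(sorted(links))  # ['http_passwd']
```

Neither agent addresses the other directly; the link emerges only through the shared key in the hub.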

4. Quantitative Evaluation

FIRMHIVE was benchmarked against both leading static tools (SaTC and Mango) and agent-based LLM systems (SWE-Agent, MAS pipelines) using DeepSeek-v3.1 and GPT-4o (temperature 0) on the Karonte firmware corpus (49 images; up to 34,000 files each).

Key security analysis tasks evaluated included:

  • T1: Hard-coded credential detection
  • T2: SBOM/CVE mapping
  • T3: NVRAM/env-var taint tracing
  • T4: Web attack chain (HTTP input to system sink)
  • T5: Full exploit chain detection

Quantitative results on T5 (full exploit chain) are summarized below:

Tool       Verified alerts (T5)
SaTC       644
Mango      1,109
FIRMHIVE   1,802

Manual sampling (300 alerts) showed a precision of 71%.

Compared to agent baselines across T1–T5:

  • Reasoning Depth: FIRMHIVE up to 5,000 steps (T5), ∼16× that of single-agent.
  • Exploration Breadth: ∼80 files/firmware (T5), ∼2.3× MAS orchestrator baseline.
  • Alert Yield: ∼36.8 alerts/firmware (T5), ∼5.6× MAS baseline.
  • Token-per-alert: 0.19–0.47M tokens/alert.

These results demonstrate FIRMHIVE's capacity for deeper and broader reasoning, and higher discovery rates, while maintaining precision levels closely approaching those of expert-driven pipelines.
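As a back-of-the-envelope check, the reported figures can be turned into relative gains and implied baseline values. The baseline numbers computed below are not stated in the source; they are derived from the approximate multipliers quoted above and should be read as rough estimates only.

```python
# Verified T5 alerts as reported in the comparison above.
verified = {"SaTC": 644, "Mango": 1109, "FIRMHIVE": 1802}

# FIRMHIVE's alert count relative to each static tool.
ratios = {
    tool: verified["FIRMHIVE"] / count
    for tool, count in verified.items()
    if tool != "FIRMHIVE"
}

# Baseline values implied by the approximate multipliers (derived, not reported).
implied = {
    "single_agent_steps": 5000 / 16,   # reasoning depth
    "mas_files_per_fw": 80 / 2.3,      # exploration breadth
    "mas_alerts_per_fw": 36.8 / 5.6,   # alert yield
}

for tool, ratio in ratios.items():
    print(f"FIRMHIVE vs {tool}: {ratio:.2f}x")
for name, value in implied.items():
    print(f"{name}: ~{value:.1f}")
```

Under these assumptions FIRMHIVE reports about 2.80× the verified alerts of SaTC and about 1.62× those of Mango, and the multipliers imply roughly 312 steps for the single-agent baseline and roughly 35 files and 6.6 alerts per firmware for the MAS baseline.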

5. Strengths, Limitations, and Open Challenges

Strengths:

  • Supports sustained, long-horizon reasoning via dynamic ToA expansion, minimizing information loss.
  • Adapts the trade-off between exploration breadth and depth at runtime, adjusting to firmware structure.
  • Enables cross-file and cross-component dependency tracking through PKH global state.
  • Operates without hard-coded firmware rules, generalizing across a range of tasks.

Limitations:

  • Incurs higher wall-clock latency and token consumption (2–3× that of single-agent approaches).
  • Remains subject to LLM hallucination risk, although mitigated by an "evidence-first" operational rule.
  • Yields incomplete coverage in cases where tools or step/time budgets are exhausted.

Open Challenges:

  • Integration of dynamic emulation (e.g., QEMU-based execution) into the RDE.
  • Modeling hardware interrupts in agent workflows (AIM-style approaches).
  • Generalization of the ToA paradigm to domains beyond firmware, such as DRAM or GPU kernel binaries.
  • Joint fine-tuning of LLMs on firmware schemas to further enhance analysis stability and precision.

FIRMHIVE exemplifies how elevating delegation to a first-class primitive and embedding a global knowledge store can orchestrate LLM-based agents into an adaptive, context-isolated analysis framework that achieves depth, breadth, and reproducibility in large-scale firmware security workflows (Zhang et al., 23 Nov 2025).
