FIRMHIVE: Agent-Based Firmware Analysis
- FIRMHIVE is a recursive agent-based framework that deploys LLMs as autonomous firmware security analysts.
- It uses a dynamic Tree of Agents (ToA) and a Persistent Knowledge Hub to achieve decentralized, context-isolated analysis of complex firmware images.
- Quantitative results show FIRMHIVE outperforms traditional static tools and other LLM agent baselines in both depth of reasoning and vulnerability discovery.
FIRMHIVE is a recursive, agent-based framework designed to orchestrate LLMs as autonomous firmware security analysts. Addressing the inherent difficulties of firmware—its binary nature, intertwined dependencies, and heterogeneity—FIRMHIVE transforms delegation into a per-agent, executable primitive and introduces a runtime-grown Tree of Agents (ToA) for decentralized, adaptive analysis coordination. This structure enables deep cross-file reasoning, broad inspection coverage, and robust dependency tracking in complex firmware images, surpassing both traditional static tools and existing LLM agent baselines in vulnerability discovery and analysis yield (Zhang et al., 23 Nov 2025).
1. System Architecture
FIRMHIVE is composed of three tightly integrated modules: the Recursive Delegation Engine (RDE), the Persistent Knowledge Hub (PKH), and the dynamic Tree of Agents (ToA).
- Root Agent: Receives a firmware image and a user prompt as input and launches the overall analysis by invoking the RDE.
- Recursive Delegation Engine (RDE): Implements delegation as an explicit, agent-level device, providing each agent with a DelegationTool (single subtask) and a ParallelDelegationTool (multiple subtasks). Subtask descriptors (object, goal) are serialized and assigned to child agents, each with its own LLM context.
- Persistent Knowledge Hub (PKH): Serves as a global, durable data store with APIs for storing, querying, and exploring structured findings. This global context enables agents to write verified alerts and to retrieve dependency information across different branches.
- Tree of Agents (ToA): Each delegation results in a parent-child edge in a directed, rooted tree, where each node (agent) is context-isolated. The ToA dynamically expands in breadth or depth depending on the analysis needs and firmware structure.
Context isolation, parent-child message boundaries, bounded agent types (Directory, File, Function), and explicit termination enforce the tree's invariants, facilitating reproducibility and eliminating orchestrator bottlenecks.
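The delegation tools and the context-isolation invariant described above can be illustrated with a minimal Python sketch. The class and method names (`Subtask`, `Agent`, `delegate`, `delegate_parallel`) are hypothetical and are not taken from the FIRMHIVE implementation; the sketch only assumes what the text states, namely that a child receives nothing but the serialized (object, goal) descriptor.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Subtask:
    obj: str   # what to analyze, e.g. a path or a function name
    goal: str  # what the child is asked to find out

    def serialize(self) -> str:
        return json.dumps({"object": self.obj, "goal": self.goal})


@dataclass
class Agent:
    task: Subtask
    context: List[str] = field(default_factory=list)    # private LLM context
    children: List["Agent"] = field(default_factory=list)

    def delegate(self, subtask: Subtask) -> "Agent":
        # Single-subtask delegation: only the serialized descriptor crosses
        # the parent-child boundary, so the child starts with an empty context.
        payload = json.loads(subtask.serialize())
        child = Agent(Subtask(payload["object"], payload["goal"]))
        self.children.append(child)  # records a parent-child edge in the tree
        return child

    def delegate_parallel(self, subtasks: List[Subtask]) -> List["Agent"]:
        # Multi-subtask delegation: one child per descriptor.
        return [self.delegate(s) for s in subtasks]


root = Agent(Subtask("firmware.bin", "find hard-coded credentials"))
root.context.append("notes that must not leak to children")
kids = root.delegate_parallel([
    Subtask("/etc", "inspect configuration files"),
    Subtask("/bin/httpd", "trace HTTP input to system sinks"),
])
```

In this sketch the parent's notes stay in `root.context`, while each child holds only its own task, which is the property the tree invariants rely on.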
2. Delegation and Tree Construction
Delegation is a first-class, executable primitive available to every agent. Each agent may recursively decompose its task into subtasks based on its observations—directories into files, files into binaries, binaries into functions—spawning specialized child agents as needed. The process follows the algorithmic template:
$\begin{algorithmic}[1]
\Function{GrowToA}{Agent\,n}
  \State (o,g)\gets n.\text{task}
  \State \mathcal{O}\gets \text{Observe}(o)
  \If{\Call{NeedDecomposition}{n,\mathcal{O},g}}
    \State \{(o_i,g_i)\}_{i=1}^k \gets \Call{LLMDecompose}{n, \mathcal{O},g}
    \For{i=1 \ldots k \textbf{in parallel}}
      \State c_i\gets\Call{SpawnAgent}{(o_i,g_i)}
      \State \Call{GrowToA}{c_i}
    \EndFor
    \State n.\text{result}\gets\Call{AggregateResults}{c_1,\dots,c_k}
  \Else
    \State n.\text{result}\gets\Call{InspectComponent}{n,\mathcal{O},g}
  \EndIf
  \State \Return n.\text{result}
\EndFunction
\end{algorithmic}$
Delegation mechanisms guarantee context isolation—only the subtask tuple is exchanged—mitigating interference and recursive complexity. Each node in the ToA can serve as a new root for further delegation.
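The recursion in GrowToA can be traced on a toy example. The following Python sketch mirrors the control flow of the template under simplifying assumptions: the firmware is a nested dictionary, `decompose` and `inspect_component` are deterministic stand-ins for the LLM-driven LLMDecompose and InspectComponent steps, and children are visited sequentially rather than in parallel. All names and the toy data are illustrative.

```python
from dataclasses import dataclass, field

# Toy firmware layout: dicts are directories, strings are file contents.
FIRMWARE = {
    "etc": {"passwd": "root:admin123", "hosts": "127.0.0.1 localhost"},
    "bin": {"httpd": "system(getenv(QUERY))"},
}


@dataclass
class Agent:
    obj: object                 # the component under analysis
    goal: str                   # here: a substring to look for
    path: str = "/"
    children: list = field(default_factory=list)
    result: list = field(default_factory=list)


def need_decomposition(observation):
    # Stand-in for NeedDecomposition: directories are split, files are not.
    return isinstance(observation, dict)


def decompose(agent, observation):
    # Stand-in for LLMDecompose: one (object, goal) subtask per entry.
    base = agent.path.rstrip("/")
    return [(child, agent.goal, f"{base}/{name}")
            for name, child in sorted(observation.items())]


def inspect_component(agent, observation):
    # Stand-in for InspectComponent: report the path if the goal matches.
    return [agent.path] if agent.goal in observation else []


def grow_toa(agent):
    observation = agent.obj  # Observe(o)
    if need_decomposition(observation):
        for obj, goal, path in decompose(agent, observation):
            child = Agent(obj, goal, path)   # SpawnAgent
            agent.children.append(child)
            grow_toa(child)                  # each child may recurse further
        # AggregateResults: concatenate the children's findings.
        agent.result = [hit for c in agent.children for hit in c.result]
    else:
        agent.result = inspect_component(agent, observation)
    return agent.result


root = Agent(FIRMWARE, goal="system(")
findings = grow_toa(root)
print(findings)
```

The tree is not laid out in advance: it takes whatever shape the observations dictate, which is the sense in which the ToA is grown at runtime.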
3. Agent Types, Exploration, and Knowledge Integration
FIRMHIVE implements three agent types: Directory, File, and Function agents.
- Directory agents spawn child agents for each subdirectory or file, maximizing parallelization.
- File agents further differentiate between scripts and binaries; for binaries, they decompose into function agents associated with binary entry points.
- Function agents perform intra-binary analysis, such as taint tracing along the call graph, recursively delegating further as dictated by the control flow.
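The three-level agent hierarchy can be sketched as a small dispatch table. In the Python sketch below, the use of the ELF magic number and a shebang prefix to separate binaries from scripts is an assumption made for illustration (the source does not say how file agents make this distinction), and the names `AgentType`, `classify_file`, and `child_agent_types` are hypothetical.

```python
from enum import Enum


class AgentType(Enum):
    DIRECTORY = "directory"
    FILE = "file"
    FUNCTION = "function"


ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def classify_file(header: bytes) -> str:
    """Guess the file kind from its leading bytes (illustrative heuristic)."""
    if header.startswith(ELF_MAGIC):
        return "binary"
    if header.startswith(b"#!"):
        return "script"
    return "other"


def child_agent_types(parent: AgentType, file_kind: str = "") -> list:
    """Which agent types a parent may spawn, following the description above."""
    if parent is AgentType.DIRECTORY:
        return [AgentType.DIRECTORY, AgentType.FILE]
    if parent is AgentType.FILE and file_kind == "binary":
        return [AgentType.FUNCTION]
    if parent is AgentType.FUNCTION:
        return [AgentType.FUNCTION]  # follow callees along the call graph
    return []  # assumption: scripts and other files are inspected in place


kind = classify_file(b"\x7fELF\x01\x01\x01")
print(kind, child_agent_types(AgentType.FILE, kind))
```

Bounding the set of agent types in this way keeps the tree's shape predictable even though its size is decided at runtime.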
Results produced by any agent are published into the PKH, enabling structured and queryable cross-file or cross-component dependencies. For example, a configuration anomaly found by a file agent can be integrated with a code path revealed by a function agent via PKH-mediated queries. Agents never directly communicate outside their subtree, ensuring coordination remains stable and reproducible.
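A minimal in-memory sketch of such a hub is shown below. The record fields, the method names (`store`, `query`, `related`), and the example findings are assumptions made for illustration and do not reproduce the actual PKH API; the evidence check in `store` is likewise an illustrative guess at how an evidence requirement might be enforced.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    component: str  # file or function that produced the finding
    kind: str       # e.g. "config_anomaly" or "code_path"
    detail: str     # structured or free-text description
    evidence: str   # raw excerpt backing the claim


class KnowledgeHub:
    def __init__(self):
        self._findings = []
        self._lock = threading.Lock()  # agents may write concurrently

    def store(self, finding: Finding) -> None:
        if not finding.evidence:
            raise ValueError("a finding must carry supporting evidence")
        with self._lock:
            self._findings.append(finding)

    def query(self, **filters) -> list:
        # Exact-match lookup on any combination of fields.
        with self._lock:
            return [f for f in self._findings
                    if all(getattr(f, k) == v for k, v in filters.items())]

    def related(self, symbol: str) -> list:
        # Cross-component lookup: every finding that mentions the symbol.
        with self._lock:
            return [f for f in self._findings if symbol in f.detail]


hub = KnowledgeHub()
# Written by a file agent in one branch of the tree.
hub.store(Finding("/etc/nvram.default", "config_anomaly",
                  "http_passwd has a default value", "http_passwd=admin"))
# Written by a function agent in another branch.
hub.store(Finding("/bin/httpd:do_auth", "code_path",
                  "nvram_get(http_passwd) reaches strcmp", "call at do_auth"))

linked = hub.related("http_passwd")
```

Here the two agents never exchange messages; the link between the configuration anomaly and the code path emerges only through the shared store.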
4. Quantitative Evaluation
FIRMHIVE was benchmarked against both leading static tools (SaTC and Mango) and agent-based LLM systems (SWE-Agent, MAS pipelines) using DeepSeek-v3.1 and GPT-4o (temperature 0) on the Karonte firmware corpus (49 images; up to 34,000 files each).
Key security analysis tasks evaluated included:
- T1: Hard-coded credential detection
- T2: SBOM/CVE mapping
- T3: NVRAM/env-var taint tracing
- T4: Web attack chain (HTTP input to system sink)
- T5: Full exploit chain detection
Quantitative results on T5 (full exploit chain) are summarized below:
| Tool | Verified alerts (T5) |
|---|---|
| SaTC | 644 |
| Mango | 1,109 |
| FIRMHIVE | 1,802 |
Manual sampling (300 alerts) showed a precision of 71%.
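To gauge how much sampling uncertainty a 300-alert sample leaves around the 71% figure, one can compute a confidence interval. The Python sketch below uses the standard Wilson score interval and assumes that the 300 alerts were drawn uniformly at random, which the source does not state explicitly.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


sample_size = 300
true_positives = round(0.71 * sample_size)  # 71% of 300 sampled alerts
low, high = wilson_interval(true_positives, sample_size)
print(f"{true_positives}/{sample_size} confirmed, "
      f"95% interval: {low:.3f} to {high:.3f}")
```

Under that assumption the interval spans roughly ten percentage points, so the reported precision should be read as an estimate rather than an exact rate.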
Compared to agent baselines across T1–T5:
- Reasoning Depth: up to 5,000 steps (T5), 16× that of the single-agent baseline.
- Exploration Breadth: 80 files/firmware (T5), 2.3× the MAS orchestrator baseline.
- Alert Yield: 36.8 alerts/firmware (T5), 5.6× the MAS baseline.
- Token-per-alert: 0.19–0.47M tokens/alert.
These results demonstrate FIRMHIVE's capacity for deeper and broader reasoning, and higher discovery rates, while maintaining precision levels closely approaching those of expert-driven pipelines.
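The reported figures can be cross-checked with simple arithmetic. The Python sketch below derives the relative alert counts from the T5 table and divides FIRMHIVE's total by the 49 images in the corpus; it assumes that the table total and the per-firmware yield describe the same run, which the source implies but does not state.

```python
verified_alerts = {"SaTC": 644, "Mango": 1109, "FIRMHIVE": 1802}
corpus_size = 49  # firmware images in the Karonte corpus

# FIRMHIVE's verified alerts relative to each static tool.
ratio_vs = {
    tool: verified_alerts["FIRMHIVE"] / count
    for tool, count in verified_alerts.items()
    if tool != "FIRMHIVE"
}

# Average number of verified alerts per firmware image.
alerts_per_firmware = verified_alerts["FIRMHIVE"] / corpus_size

for tool, ratio in ratio_vs.items():
    print(f"FIRMHIVE vs {tool}: {ratio:.2f}x")
print(f"alerts per firmware: {alerts_per_firmware:.1f}")
```

The per-image average obtained this way agrees with the 36.8 alerts/firmware reported for T5, which suggests the two figures come from the same experiment.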
5. Strengths, Limitations, and Open Challenges
Strengths:
- Supports sustained, long-horizon reasoning via dynamic ToA expansion, minimizing information loss.
- Adapts the trade-off between exploration breadth and depth at runtime, adjusting to firmware structure.
- Enables cross-file and cross-component dependency tracking through PKH global state.
- Operates without hard-coded firmware rules, generalizing across a range of tasks.
Limitations:
- Incurs higher wall-clock latency and token consumption (2–3× that of single-agent approaches).
- Remains subject to LLM hallucination risk, although mitigated by an "evidence-first" operational rule.
- Yields incomplete coverage in cases where tools or step/time budgets are exhausted.
Open Challenges:
- Integration of dynamic emulation (e.g., QEMU-based execution) into the RDE.
- Modeling hardware interrupts in agent workflows (AIM-style approaches).
- Generalization of the ToA paradigm to domains beyond firmware, such as DRAM or GPU kernel binaries.
- Joint fine-tuning of LLMs on firmware schemas to further enhance analysis stability and precision.
FIRMHIVE exemplifies how elevating delegation to a first-class primitive and embedding a global knowledge store can orchestrate LLM-based agents into an adaptive, context-isolated analysis framework that achieves depth, breadth, and reproducibility in large-scale firmware security workflows (Zhang et al., 23 Nov 2025).