ReF Decompile: High-Fidelity Binary Decompilation

Updated 20 March 2026

ReF Decompile is an end-to-end decompilation framework that uses Relabeling and Function Call strategies to preserve control flow and accurately recover literal data.
It employs symbolic relabeling to replace concrete addresses with abstract labels, ensuring precise reconstruction of the binary's control-flow graph.
Empirical evaluations show that ReF Decompile improves re-executability by over 8 percentage points compared to prior LLM-based decompilers, demonstrating its practical impact.

ReF Decompile is an end-to-end decompilation framework designed to convert binary executables into high-level language code while maintaining control-flow fidelity and correctly recovering literal data such as strings and floating-point values. By addressing critical information loss inherent in prior LLM-based decompilers, ReF Decompile achieves state-of-the-art re-executability on standard benchmarks, reinforcing its utility in reverse engineering, vulnerability discovery, malware analysis, and legacy software migration (Feng et al., 17 Feb 2025).

1. Core Innovations: Relabeling and Function Call Strategies

ReF Decompile fundamentally augments LLM-based decompilation by introducing two key innovations—the Relabeling strategy and the Function Call strategy.

Relabeling Strategy:

Raw disassemblies from tools like objdump or Capstone provide instructions annotated with concrete addresses. Such addresses often encode jump targets (branch destinations) and static data references, but are contextually opaque to machine learning models. The Relabeling step abstracts away all concrete addresses, replacing:

Jump targets with symbolic labels ($\phi_J : J \to L$, with $L = \{L_1, L_2, \ldots\}$), ensuring that every control-flow edge in the original binary is preserved exactly in the symbolic assembly.
Data addresses in memory loads/stores with symbolic data labels ($\phi_D : D \to M$, with $M = \{D_1, D_2, \ldots\}$), enabling later recovery of concrete literals.
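The two mappings can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Instr` record, the `{}` placeholder convention, and the `relabel` function are all hypothetical, and the input is a hand-written stand-in for real objdump or Capstone output in which each referenced address has already been resolved to an absolute value.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Instr:
    """Hypothetical simplified instruction record (not a real disassembler type)."""
    addr: int                   # address of this instruction
    text: str                   # assembly text; "{}" marks the referenced address
    ref: Optional[int] = None   # absolute address referenced, if any
    is_jump: bool = False       # True for jumps/branches, False for data accesses


def relabel(instrs: List[Instr]) -> Tuple[List[str], Dict[int, str], Dict[int, str]]:
    """Replace jump targets with L_i and data addresses with D_j."""
    phi_j: Dict[int, str] = {}  # jump target address -> L_i
    phi_d: Dict[int, str] = {}  # data address -> D_j

    # Pass 1: assign labels in order of first appearance.
    for ins in instrs:
        if ins.ref is None:
            continue
        table, prefix = (phi_j, "L") if ins.is_jump else (phi_d, "D")
        table.setdefault(ins.ref, f"{prefix}{len(table) + 1}")

    # Pass 2: emit relabeled text, inserting "L_i:" before each jump target.
    lines: List[str] = []
    for ins in instrs:
        if ins.addr in phi_j:
            lines.append(f"{phi_j[ins.addr]}:")
        if ins.ref is None:
            lines.append(ins.text)
        else:
            label = phi_j[ins.ref] if ins.is_jump else phi_d[ins.ref]
            lines.append(ins.text.format(label))
    return lines, phi_j, phi_d


prog = [
    Instr(0x1000, "cmp $0x0, %edi"),
    Instr(0x1003, "jle {}", ref=0x100D, is_jump=True),
    Instr(0x1005, "movss {}(%rip), %xmm0", ref=0x2004),
    Instr(0x100D, "ret"),
]
lines, phi_j, phi_d = relabel(prog)
print("\n".join(lines))
# cmp $0x0, %edi
# jle L1
# movss D1(%rip), %xmm0
# L1:
# ret
```

Assigning labels in order of first appearance keeps the output deterministic, so the same binary always produces the same relabeled text.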

The result is a sequence $S'$ in which explicit “$L_i$” and “$D_j$” labels replace all address references, improving reproducibility of the control-flow graph (CFG). For each jump target $a$ with $\phi_J(a) = L_i$, a directive “$L_i$:” is inserted just before the instruction at the original address $a$, which preserves the precise CFG structure in the relabeled assembly.

Function Call Strategy:

Absent address data in the prompt, LLMs cannot recover accurate floating-point, integer, or string literals from .rodata or other static segments. ReF Decompile embeds structured tool-call requests in the decoding process. When the model requires the value corresponding to a label $D_i$ and infers a type $\tau$, it emits:

TOOL_CALL(label=D_i, type=τ)

An external runtime intercepts this request, looks up $D_i$ using $\phi_D$, reads the appropriate raw bytes, decodes them according to $\tau$, and returns the decoded value to the model.
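The runtime side of a tool call might look like the following sketch. Everything here is an assumption made for illustration: the `resolve_tool_call` function, the type names (`float`, `double`, `int`, `string`), the label-to-address table (the inverse of $\phi_D$), and the toy `.rodata` image are hypothetical, and little-endian x86-64 storage is assumed.

```python
import struct

# Hypothetical type names mapped to little-endian struct formats.
FORMATS = {"float": "<f", "double": "<d", "int": "<i"}


def resolve_tool_call(label, type_name, label_to_addr, rodata, rodata_base):
    """Look up a data label, read its raw bytes, and decode them by type."""
    addr = label_to_addr[label]      # inverse of phi_D: label -> concrete address
    offset = addr - rodata_base      # concrete address -> offset into the section image
    if type_name == "string":
        end = rodata.index(b"\x00", offset)   # NUL-terminated C string
        return rodata[offset:end].decode("ascii")
    return struct.unpack_from(FORMATS[type_name], rodata, offset)[0]


# Toy section image: the float 2.5 followed by the string "hi".
rodata = struct.pack("<f", 2.5) + b"hi\x00"
labels = {"D1": 0x2000, "D2": 0x2004}

value_d1 = resolve_tool_call("D1", "float", labels, rodata, 0x2000)   # 2.5
value_d2 = resolve_tool_call("D2", "string", labels, rodata, 0x2000)  # "hi"
```

The decoded value is what gets handed back to the model, so the model never has to guess a literal from an address it cannot see.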

2. Decompilation Pipeline

Disassembly: The binary is disassembled into a sequence $S = \langle s_1, s_2, \ldots, s_n \rangle$ of raw instructions, each with address $a_i$.
Relabeling ($S \to S'$): Apply the mappings $\phi_J$ and $\phi_D$ to replace target addresses and static data references in $S$, resulting in $S'$ with symbolic labels. For any $s_i$ containing a jump or branch to $a \in J$, replace the target with $\phi_J(a)$ (e.g., $\texttt{jmp}\ \phi_J(a)$), and, for $a \in D$, replace the reference with $\phi_D(a)(\%rip)$.
Prompt Construction and LLM Decoding: Present $S'$ to the LLM. When the model needs the value for a $D_j$ label, it issues TOOL_CALL, receives VALUE($D_j$), and the returned value is inserted into the generation context.
Source Emission: The LLM produces the final high-level C code, fully annotated with the correctly resolved control flow and data.

These transformations are designed to be invertible; given $S'$ and the mapping tables, it is possible to reconstruct the original binary structure precisely (Feng et al., 17 Feb 2025).

3. Empirical Evaluation and SOTA Performance

ReF Decompile was validated on the Decompile-Eval (adapted HumanEval) benchmark, encompassing 164 C tasks across four optimization levels (O0–O3), using a 6.7B-parameter LLM4Decompile-End backbone fine-tuned via LoRA (rank 32, $\alpha=64$). Results, averaged over all compiler optimization levels, are:

Approach	Re-exec (%)	Readability
Ghidra (rule-based)	20.12	2.57
GPT-4o (refine)	35.22	2.44
LLM4Decompile-Ref (6.7B)	52.74	3.50
LLM4Decompile-End (6.7B)	48.02	3.54
FAE Decompile (6.7B)	51.07	3.51
ReF Decompile (6.7B)	61.43	3.69

ReF Decompile delivers a +8.69 percentage point improvement in re-executability over the next best system, the refinement-based LLM4Decompile-Ref (61.43% vs. 52.74%), and raises readability to 3.69, above the best baseline score of 3.54. All experiments use the re-executability metric: the percentage of decompiled outputs that not only recompile but also pass the original functional test suite (Feng et al., 17 Feb 2025).
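The metric and the headline gain can be checked with a few lines of Python. The `re_executability` helper and the per-sample records are hypothetical and exist only to make the definition concrete; the two percentages are taken from the table above.

```python
def re_executability(results):
    """Percentage of samples that both recompile and pass their tests."""
    passed = sum(1 for r in results if r["compiles"] and r["passes_tests"])
    return 100.0 * passed / len(results)


# Toy records: only samples that compile AND pass the tests are counted.
toy = [
    {"compiles": True, "passes_tests": True},
    {"compiles": True, "passes_tests": False},
    {"compiles": False, "passes_tests": False},
    {"compiles": True, "passes_tests": True},
]
toy_score = re_executability(toy)  # 2 of 4 samples qualify -> 50.0

# Headline gain from the table: ReF Decompile vs. LLM4Decompile-Ref.
gain_pp = round(61.43 - 52.74, 2)  # 8.69 percentage points
```

Note that the gain is expressed in percentage points (an absolute difference), not as a relative improvement.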

Ablation confirms that both the Relabeling and Function Call components contribute, individually and in combination:

Relabeling only: +3pp in re-exec.
Function Call only: +7pp in re-exec.
Combined: +8.08pp in re-exec, +0.19 in readability.

4. Comparison with Prior Techniques and Complementarity

Early LLM-based decompilers (e.g., LLM4Decompile-End/Ref, FAE Decompile) omit sources of information preserved by ReF Decompile—namely, symbolic CFG edge and data segment recovery—thus exhibiting lower re-executability and more hallucinated code structure (Tan et al., 2024). Refinement-focused methods that leverage Ghidra output as additional context can partially improve execution correctness, but cannot systematically recover lost literal values or enforce control-flow integrity (Feng et al., 17 Feb 2025, Tan et al., 2024).

Rule-based decompilers (Ghidra, Hex-Rays) generate correct high-level structures in simple cases, but fail both at type/literal recovery and under code obfuscation, and typically achieve below 25% re-executability in controlled benchmarks (Feng et al., 17 Feb 2025).

5. Design Philosophy and Theoretical Guarantees

The architecture of ReF Decompile aligns with principles articulated in the literature on transparent transformations (Arranz-Olmos et al., 7 Jan 2025). While not constructed using transparency-preserving simulations or explicit ReF-transformer machinery, its staged pipeline—especially through Relabeling—preserves all control-flow information, supporting subsequent formal analysis. The explicit, invertible mapping from binary addresses to symbolic labels serves as a form of semantic round-trip guarantee: source-level CFG is a lossless abstraction of the binary CFG; literal recovery via Function Call is sound provided the underlying lookup is correct.
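The round-trip property of the label mapping can be illustrated with a small sketch. It assumes the label tables are address-to-label dictionaries and that labels appear in the text as `L1`, `D1`, and so on; the `restore_addresses` helper and the sample lines are hypothetical. It only shows that the substitution loses no address information; it does not reassemble a binary.

```python
import re


def restore_addresses(lines, phi_j, phi_d):
    """Invert relabeling: map L_i / D_j labels back to concrete addresses."""
    inverse = {label: addr
               for table in (phi_j, phi_d)
               for addr, label in table.items()}
    # The mapping is invertible only if no two addresses share a label.
    assert len(inverse) == len(phi_j) + len(phi_d), "label tables are not injective"

    restored = []
    for line in lines:
        if re.fullmatch(r"L\d+:", line):
            continue  # label definitions are not instructions; drop them
        restored.append(
            re.sub(r"\b[LD]\d+\b", lambda m: hex(inverse[m.group(0)]), line)
        )
    return restored


relabeled = ["cmp $0x0, %edi", "jle L1", "movss D1(%rip), %xmm0", "L1:", "ret"]
original = restore_addresses(relabeled, {0x100D: "L1"}, {0x2004: "D1"})
# ['cmp $0x0, %edi', 'jle 0x100d', 'movss 0x2004(%rip), %xmm0', 'ret']
```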

A plausible implication is that the ReF Decompile framework could be further extended or formally analyzed for "ReF-transparency" in the sense of (Arranz-Olmos et al., 7 Jan 2025), although such formal properties are not claimed in the original source.

6. Limitations, Integration, and Future Directions

ReF Decompile is evaluated principally on disassembled C functions with an emphasis on executable recovery using standard benchmarks. Its reliance on the LLM's ability to accurately infer the type for each data label (τ) introduces error modes if type inference fails or assembly is highly obfuscated. The current toolchain assumes availability of the raw binary and correct symbol-to-data region mapping, conditions which may not always apply in heavily packed or encrypted binaries.
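The type-inference failure mode is easy to reproduce: the same four bytes yield very different literals depending on the inferred type $\tau$. The snippet below is a small illustration using Python's standard `struct` module, assuming little-endian storage; the byte string is a made-up example rather than data from the benchmark.

```python
import struct

raw = struct.pack("<f", 1.0)             # the four bytes stored for the C literal 1.0f

as_float = struct.unpack("<f", raw)[0]   # correct type: 1.0
as_int = struct.unpack("<i", raw)[0]     # wrong type: 1065353216 (0x3F800000)
```

A lookup that is byte-exact can therefore still return a wrong literal if the model requests the wrong type.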

Integration with other advanced pipelines is facilitated by the modularity of the relabeling and function call preprocessing, making it complementary to more complex refinement strategies and post-decompilation repair frameworks (such as D-LiFT (Zou et al., 11 Jun 2025), FidelityGPT (Zhou et al., 22 Oct 2025), or hybrid LLM-symbolic tools (Wong et al., 2023)). In contexts where end-to-end decompilation is infeasible due to severe obfuscation or non-x86 architectures, incorporating formal abstraction of control flow (e.g., via SALT trees (Wang et al., 18 Sep 2025)) or transparency-preserving transformations may increase robustness.

Potential future research directions include:

Extension to multi-architecture binaries and obfuscated control-flows.
Theoretical validation under formal semantic frameworks for transparent decompilation.
Automatic downstream integration with static analyzers, symbolic execution backends, or security policy checkers.

7. Summary

ReF Decompile demonstrates that two lightweight preprocessing steps—Relabeling for control-flow integrity and Function Call for literal/constant recovery—substantially advance the state of end-to-end LLM-based decompilation. Its empirical superiority in both re-executability and human readability marks it as a foundational advancement, enabling higher-fidelity binary analysis workflows with minimal manual effort or reliance on fragile heuristic rules (Feng et al., 17 Feb 2025).