SALT: Source-level Abstract Logic Tree
- SALT is a rooted tree structure that represents high-level control-flow and logical invariants by mapping functions, loops, blocks, and instructions.
- The algorithm constructs SALT by extracting CFGs, normalizing instructions, detecting loops, and applying SMT-based invariant synthesis.
- Reported evaluations show that SALT improves decompilation metrics (re-compilation, re-execution, and test-case pass rates) and keeps its advantage under obfuscation; in verification settings it also organizes invariant generation.
A Source-level Abstract Logic Tree (SALT) is a hierarchical, rooted tree structure used to represent the high-level control-flow and logical invariants of a program or binary function. Each node in a SALT corresponds to a logic block such as the entire function, loops, straight-line code blocks, or individual instructions, and is labeled by function, loop, block, or instruction type. In recent research, SALT has been formalized and employed to bridge the gap between low-level structures (such as assembly and control-flow graphs) and corresponding source-level semantics, supporting advanced reasoning in binary decompilation (Wang et al., 18 Sep 2025) and invariant inference (Garoche et al., 2012).
1. Formal Structure and Definition
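SALT is defined as a labeled rooted tree $T = (V, E, \ell)$, where:
- $V$ is a set of logic-block nodes, each corresponding to a structural program region (function, loop, straight-line block).
- $E \subseteq V \times V$ encodes parent–child (tree) relationships.
- $\ell$ maps nodes to their logic type, with $\ell(v) \in \{\text{function}, \text{loop}, \text{block}, \text{instruction}\}$.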
Given a disassembled function producing normalized assembly, a control-flow graph $G = (B, E_G)$ is extracted, with basic blocks $B$ and edge set $E_G$. A mapping from the CFG to the tree assigns instructions and detected loop structures to tree nodes, ordering instructions within each node according to control flow. Each node may carry additional semantic annotations and invariants in the context of formal verification.
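The definition above can be pictured as a small data structure. The following is a minimal sketch, not the authors' implementation: the names `NodeKind`, `SaltNode`, `add`, and `walk`, and the example function `sum_array`, are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator, List, Optional, Tuple


class NodeKind(Enum):
    """The four logic types a node label can take."""
    FUNCTION = "function"
    LOOP = "loop"
    BLOCK = "block"
    INSTRUCTION = "instruction"


@dataclass
class SaltNode:
    """One labeled node; the children list encodes the parent-child edges."""
    kind: NodeKind
    label: str = ""
    children: List["SaltNode"] = field(default_factory=list)
    invariant: Optional[str] = None  # optional semantic annotation (assumption)

    def add(self, child: "SaltNode") -> "SaltNode":
        """Attach a child in control-flow order and return it."""
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "SaltNode"]]:
        """Pre-order traversal yielding (depth, node) pairs."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# A function with an entry block, one loop, and an exit block.
root = SaltNode(NodeKind.FUNCTION, "sum_array")
root.add(SaltNode(NodeKind.BLOCK, "entry"))
loop = root.add(SaltNode(NodeKind.LOOP, "loop_0"))
loop.add(SaltNode(NodeKind.BLOCK, "body"))
root.add(SaltNode(NodeKind.BLOCK, "exit"))

for depth, node in root.walk():
    print("  " * depth + f"{node.kind.value}: {node.label}")
```

Storing children as an ordered list keeps the control-flow ordering inside each node, which is the property the definition requires of the mapping.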
2. Algorithmic Construction of SALT
SALT construction begins with extracting the control-flow graph from assembly, followed by normalization of instructions—rewriting absolute jumps to relative offsets and annotating data references with extracted constants or strings. Loop detection is performed by identifying strongly connected components in the CFG via back-edge analysis. Nesting relationships are determined by set inclusion among these components, allowing the identification of nested loops.
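As an illustration of the loop-detection step, the sketch below uses one common reading of "back-edge analysis": an edge whose target dominates its source is a back edge, and the natural loop of that edge is the set of blocks that can reach the source without passing through the header. Nesting then follows from strict set inclusion of loop bodies. This is an assumption about the technique, not the paper's exact procedure; the CFG and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Set

CFG = Dict[str, List[str]]  # block name -> successor block names


def predecessors(cfg: CFG) -> Dict[str, List[str]]:
    return {n: [p for p in cfg if n in cfg[p]] for n in cfg}


def dominators(cfg: CFG, entry: str) -> Dict[str, Set[str]]:
    """Classic iterative dominator computation."""
    nodes = set(cfg)
    preds = predecessors(cfg)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def natural_loops(cfg: CFG, entry: str) -> Dict[str, Set[str]]:
    """Map each loop header to the set of blocks in its loop body."""
    dom, preds = dominators(cfg, entry), predecessors(cfg)
    loops: Dict[str, Set[str]] = {}
    for tail, succs in cfg.items():
        for head in succs:
            if head in dom[tail]:  # tail -> head is a back edge
                body, stack = {head}, [tail]
                while stack:
                    n = stack.pop()
                    if n not in body:
                        body.add(n)
                        stack.extend(preds[n])
                loops.setdefault(head, set()).update(body)
    return loops


def loop_parents(loops: Dict[str, Set[str]]) -> Dict[str, Optional[str]]:
    """Nesting by set inclusion: the parent is the smallest strictly enclosing loop."""
    parents: Dict[str, Optional[str]] = {}
    for head, body in loops.items():
        enclosing = [h for h, b in loops.items() if h != head and body < b]
        parents[head] = min(enclosing, key=lambda h: len(loops[h])) if enclosing else None
    return parents


cfg: CFG = {
    "entry": ["outer_head"],
    "outer_head": ["inner_head", "exit"],
    "inner_head": ["inner_body", "outer_latch"],
    "inner_body": ["inner_head"],
    "outer_latch": ["outer_head"],
    "exit": [],
}
loops = natural_loops(cfg, "entry")
parents = loop_parents(loops)
```

In this example the inner loop's body is a strict subset of the outer loop's body, so `loop_parents` reports `outer_head` as the parent of `inner_head`.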
The logic tree is then constructed recursively:
- The root node represents the entire function.
- For each unprocessed entry block associated with a loop, a loop node is instantiated, and recursive construction is invoked on subloops and loop exit blocks.
- Straight-line code blocks are appended as child nodes under the appropriate parent.
- When a call is detected, the successor instructions that follow it are merged into the calling block instead of starting a new block.
This yields a hierarchical, source-level abstraction that captures all structural control-flow units and their relationships.
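The recursive steps above can be sketched as follows. This is a simplified illustration under stated assumptions: loop bodies are already known (given as a hypothetical `loops` map from header to body), blocks arrive in control-flow order, loop exit blocks are simply appended under the enclosing parent, and call merging is omitted. The dictionary layout and the function names are illustrative only.

```python
from typing import Dict, List, Optional, Set


def build_children(blocks: List[str], loops: Dict[str, Set[str]],
                   current: Optional[str] = None) -> List[dict]:
    """Turn an ordered block list into child nodes, recursing into loops."""
    children: List[dict] = []
    done: Set[str] = set()
    for b in blocks:
        if b in done:
            continue
        if b in loops and b != current:
            # Entry block of a not-yet-processed loop: open a loop node and
            # recurse on the blocks belonging to that loop.
            body = [x for x in blocks if x in loops[b]]
            children.append({"kind": "loop", "label": b,
                             "children": build_children(body, loops, current=b)})
            done.update(body)
        else:
            # Straight-line block: append under the current parent.
            children.append({"kind": "block", "label": b, "children": []})
            done.add(b)
    return children


def build_salt(name: str, blocks: List[str], loops: Dict[str, Set[str]]) -> dict:
    """The root node represents the entire function."""
    return {"kind": "function", "label": name,
            "children": build_children(blocks, loops)}


def render(node: dict, depth: int = 0) -> List[str]:
    lines = ["  " * depth + f"{node['kind']} {node['label']}"]
    for child in node["children"]:
        lines.extend(render(child, depth + 1))
    return lines


order = ["entry", "outer_head", "inner_head", "inner_body", "outer_latch", "exit"]
loops = {
    "outer_head": {"outer_head", "inner_head", "inner_body", "outer_latch"},
    "inner_head": {"inner_head", "inner_body"},
}
tree = build_salt("f", order, loops)
print("\n".join(render(tree)))
```

The `current` parameter prevents a loop header from reopening its own loop when the recursion revisits it as the first block of the body.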
3. Integration with Logic-Based Invariant Generation
In formal verification contexts, as exemplified in (Garoche et al., 2012), each SALT node $n$ may be associated with a logical formula $\varphi_n$ encoding the set of states reaching that node, and an abstract element $a_n$ within a numerical domain $D^\sharp$ (e.g., intervals, octagons). The abstraction and concretization pair $(\alpha, \gamma)$ forms a Galois connection:
- $\gamma(a)$ returns the set of possible program states described by $a$,
- $\alpha(S)$ computes the least abstract element covering $S$.
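A minimal sketch of such a pair for the interval domain, assuming program states are values of a single integer variable and using `None` as the bottom element. The function names `alpha`, `gamma`, and `leq` are illustrative. The defining property of a Galois connection, that `alpha(S)` is below `a` exactly when `S` is contained in `gamma(a)`, is checked on a few samples.

```python
from typing import Optional, Set, Tuple

Interval = Optional[Tuple[int, int]]  # None is the bottom element (no states)


def alpha(states: Set[int]) -> Interval:
    """Abstraction: the least interval covering a set of concrete states."""
    return (min(states), max(states)) if states else None


def gamma(a: Interval) -> Set[int]:
    """Concretization: every concrete state described by an interval."""
    return set() if a is None else set(range(a[0], a[1] + 1))


def leq(a: Interval, b: Interval) -> bool:
    """Abstract order: interval a is contained in interval b."""
    if a is None:
        return True
    if b is None:
        return False
    return b[0] <= a[0] and a[1] <= b[1]


sample_sets = [set(), {3}, {1, 4, 9}]
sample_intervals = [None, (0, 5), (1, 9), (2, 3)]

# Galois connection: alpha(S) <= a  if and only if  S is a subset of gamma(a).
galois_holds = all(
    leq(alpha(s), a) == (s <= gamma(a))
    for s in sample_sets
    for a in sample_intervals
)
```

Abstraction loses information in general: `gamma(alpha({1, 4, 9}))` is the full range 1 to 9, a strict superset of the original set.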
For each CFG node, a corresponding transition formula is synthesized using a mapping $\llbracket \cdot \rrbracket$ from source statements to logic. For example, assignment, conditional, sequence, and loop statements are converted into quantifier-free (or fixpoint-defined for loops) formulas within a decidable logic.
Automatic invariant generation is realized by applying a solver-guided fixpoint engine:
- An abstract transfer function $F^\sharp$ generalizes program semantics using witness search via SMT queries.
- At each step, inductive invariants are detected and attached to the corresponding SALT node, enabling on-the-fly reporting.
- Delayed widening, memoization of SMT queries, and subdomain partitioning are practical optimizations to ensure precision and scalability.
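To make delayed widening concrete, here is a small sketch over intervals for the loop `i = 0; while i < 100: i += 1`. Important assumption: the abstract transfer function `body` is written by hand here, whereas the approach described above derives it through SMT witness search. Memoization and subdomain partitioning are not shown. All names are illustrative.

```python
import math
from typing import Callable, Optional, Tuple

Interval = Optional[Tuple[float, float]]  # None is bottom


def join(a: Interval, b: Interval) -> Interval:
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def leq(a: Interval, b: Interval) -> bool:
    if a is None:
        return True
    if b is None:
        return False
    return b[0] <= a[0] and a[1] <= b[1]


def widen(old: Interval, new: Interval) -> Interval:
    """Standard interval widening: an unstable bound jumps to infinity."""
    if old is None:
        return new
    if new is None:
        return old
    lo = old[0] if new[0] >= old[0] else -math.inf
    hi = old[1] if new[1] <= old[1] else math.inf
    return (lo, hi)


def body(a: Interval) -> Interval:
    """Hand-written transfer for one iteration: assume i < 100, then i += 1."""
    if a is None:
        return None
    lo, hi = a[0], min(a[1], 99)
    return None if lo > hi else (lo + 1, hi + 1)


def loop_head_invariant(init: Interval, step: Callable[[Interval], Interval],
                        delay: int = 3) -> Interval:
    """Iterate at the loop head, joining for `delay` rounds before widening."""
    x, iteration = init, 0
    while True:
        nxt = join(init, step(x))
        if leq(nxt, x):
            # x is a post-fixpoint; returning nxt is one sound descending step.
            return nxt
        x = join(x, nxt) if iteration < delay else widen(x, nxt)
        iteration += 1


invariant = loop_head_invariant((0, 0), body, delay=3)
print(invariant)
```

With a delay of 3 the iteration passes through `(0, 1)`, `(0, 2)`, `(0, 3)`, widens to an unbounded upper bound, and the final descending step recovers the bound 100 from the loop guard.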
4. Application to LLM-Guided Binary Decompilation
In binary decompilation, SALT is central to the SALT4Decompile pipeline (Wang et al., 18 Sep 2025). The process is as follows:
- SALT Construction: Assembly is parsed, normalized, and converted into the SALT structure, capturing loops and basic blocks hierarchically.
- SALT-Based LLM Prompting: The serialized SALT is used as a prompt for a fine-tuned LLM, which outputs first-pass decompiled code.
- Post-Processing: The output is refined via automated syntactic error correction (CEF), boundary and off-by-one pattern fixing (BEF), and LLM-guided variable renaming and documentation (symbol recovery).
This pipeline leverages the semantic depth of SALT to bridge the abstraction gap between binary instructions and source code logic, demonstrating robust performance even under various obfuscation techniques.
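The prompting step depends on turning the tree into text. The sketch below shows one possible serialization, an indented outline with bracketed node kinds. The format, the prompt wording, and the names `serialize` and `build_prompt` are assumptions made for illustration; the actual serialization used by SALT4Decompile is not described here.

```python
from typing import Dict, List


def serialize(node: Dict, depth: int = 0) -> List[str]:
    """Render a node, its instructions, and its children as indented lines."""
    pad = "  " * depth
    lines = [f"{pad}[{node['kind']}] {node['label']}"]
    lines += [f"{pad}  {insn}" for insn in node.get("insns", [])]
    for child in node.get("children", []):
        lines += serialize(child, depth + 1)
    return lines


def build_prompt(tree: Dict) -> str:
    """Prefix the serialized tree with a (hypothetical) instruction line."""
    header = "Decompile the following logic tree into C source code:"
    return "\n".join([header, *serialize(tree)])


tree = {"kind": "function", "label": "count", "children": [
    {"kind": "block", "label": "entry", "insns": ["mov eax, 0"]},
    {"kind": "loop", "label": "loop_0", "children": [
        {"kind": "block", "label": "body",
         "insns": ["add eax, 1", "cmp eax, edi", "jl loop_0"]},
    ]},
    {"kind": "block", "label": "exit", "insns": ["ret"]},
]}
prompt = build_prompt(tree)
print(prompt)
```

Indentation carries the nesting, so the loop structure stays visible in the text the model receives.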
5. Empirical Evaluation and Comparative Results
Decompilation experiments using SALT4Decompile show improved metrics relative to both commercial and research baselines. On the Decompile-Eval benchmark (656 binaries, four optimization levels), SALT-based LLM decompilation yields:
- Re-compilation rate: 0 (1 over baseline)
- Re-execution rate: 2 (3)
- Test-case pass rate (TCP): 4 (5)
SALT-driven methods maintain high robustness under obfuscations such as BogusCF, Flattening, Substitution, and Split, with a 6 average advantage in recompilation rate and 7 in TCP. Ablation studies confirm that replacing SALT with linearized representations or basic CFGs results in 8–9 lower TCP, indicating that the hierarchical, logic-rich structure of SALT is critical for semantic recovery (Wang et al., 18 Sep 2025).
6. Use in Formal Verification and On-the-Fly Invariants
In the context of invariant generation, SALT provides a program-point–oriented structure for streaming invariants, supporting:
- Per-node fixpoint computation via abstract interpretation.
- Immediate emission of 1-inductive invariants at every iteration, available both to human users and downstream tools.
- Structural partitioning of fixpoint tasks according to control-flow and branching, improving parallelism and analytical tractability.
Streaming APIs or program-querying tools can extract all invariants witnessed at any program point (SALT node) or subscribe for online updates, supporting k-induction and verifying properties before full fixpoint convergence (Garoche et al., 2012).
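A streaming interface of this kind might look like the following sketch. The class `InvariantStream` and its methods `subscribe`, `publish`, and `query` are hypothetical names, not an API from the cited work; invariants are represented as plain strings for simplicity.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class InvariantStream:
    """Per-node store of invariants with push notifications to subscribers."""

    def __init__(self) -> None:
        self._seen: DefaultDict[str, List[str]] = defaultdict(list)
        self._subs: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, node_id: str, callback: Callable[[str], None]) -> None:
        """Register a callback for invariants later published at node_id."""
        self._subs[node_id].append(callback)

    def publish(self, node_id: str, invariant: str) -> None:
        """Record a newly proved invariant and notify subscribers once."""
        if invariant in self._seen[node_id]:
            return  # already reported at this node
        self._seen[node_id].append(invariant)
        for callback in self._subs[node_id]:
            callback(invariant)

    def query(self, node_id: str) -> List[str]:
        """Return every invariant witnessed so far at node_id."""
        return list(self._seen[node_id])


stream = InvariantStream()
received: List[str] = []
stream.subscribe("loop_0", received.append)

# The analysis engine would call publish as soon as an invariant is proved.
stream.publish("loop_0", "0 <= i")
stream.publish("loop_0", "i <= 100")
stream.publish("loop_0", "0 <= i")  # duplicate, ignored

print(stream.query("loop_0"))
```

Because `publish` fires before the analysis has converged, a downstream tool can start using an invariant as soon as it is reported.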
7. Significance and Plausible Implications
SALT resolves the mismatch between the low-level, sequence-oriented representation of binaries and the structured, logic-rich abstractions appropriate for high-level program reasoning. In decompilation, this suggests a path toward more reliable and semantically complete recovery of source code, transparent enough to enable automated refinement and user-guided debugging. In formal verification, organizing the invariant synthesis and propagation around the SALT structure increases the relevance and usability of generated invariants, as each invariant is tightly bound to a clear, source-level program location. A plausible implication is that future program analysis tools may increasingly converge on SALT-like program abstractions as an interoperability layer between verification, synthesis, and reverse engineering workflows.