HySynth: Hybrid Hyperproperty Synthesis
- HySynth is a hybrid synthesis framework that leverages LLM-derived probabilistic CFG surrogates to guide symbolic program search based on input-output examples.
- It employs HyperTSL logic and reactive synthesis to construct smart contracts that satisfy both trace-based and multi-execution hyperproperty constraints.
- Empirical evaluations demonstrate that HySynth boosts solve rates and reduces search times by focusing on promising, succinct program fragments across diverse domains.
HySynth encompasses methodologies for program synthesis that address properties extending beyond individual execution traces to relations across multiple executions (hyperproperties). The term refers to two principal research lines: (1) a context-free surrogate-guided framework for program synthesis that leverages LLMs, and (2) a reactive synthesis architecture for smart contracts that guarantees both trace and hyperproperty constraints, realized via the HyperTSL logic. Both lines employ hybrid approaches that unify symbolic and statistical reasoning to overcome the expressiveness or scalability limitations of purely neural or purely symbolic methods (Barke et al., 2024; Coenen et al., 2022).
1. Programming-by-Example and the Hybrid Synthesis Problem
In structured prediction and program synthesis, the goal is to synthesize a program $P$ in a domain-specific language (DSL) such that, given a finite set of input–output examples $E = \{(i_k, o_k)\}_{k=1}^{n}$, the program satisfies $P(i_k) = o_k$ for all examples. The DSL is formally specified by a context-free grammar $G$.
Classical search-based enumerative synthesis explores $L(G)$, the language of all valid programs generated by the CFG $G$, in increasing cost order. This approach guarantees completeness but suffers exponential blowup as program size grows. In contrast, LLMs can propose plausible programs for unfamiliar DSLs given the grammar and the examples, but they rarely generate fully correct code and offer no guarantee of consistency with all examples.
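The blowup of unguided enumeration can be illustrated with a minimal Python sketch. The toy grammar `E -> x | 1 | (E + E) | (E * E)`, the use of AST size as the cost, and all function names are illustrative assumptions and are not taken from HySynth.

```python
from itertools import product

# Hypothetical toy DSL: E -> x | 1 | (E + E) | (E * E)
LEAVES = {"x": lambda x: x, "1": lambda x: 1}
BINARY = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def programs_of_size(size, cache):
    """Return all (text, function) programs with exactly `size` AST nodes."""
    if size in cache:
        return cache[size]
    if size == 1:
        result = list(LEAVES.items())
    else:
        result = []
        # One node is the operator; the rest are split between the two children.
        for left_size in range(1, size - 1):
            right_size = size - 1 - left_size
            lefts = programs_of_size(left_size, cache)
            rights = programs_of_size(right_size, cache)
            for (lt, lf), (rt, rf) in product(lefts, rights):
                for op, fn in BINARY.items():
                    text = f"({lt} {op} {rt})"
                    run = lambda x, lf=lf, rf=rf, fn=fn: fn(lf(x), rf(x))
                    result.append((text, run))
    cache[size] = result
    return result


def synthesize(examples, max_size=7):
    """Unguided search: try programs in increasing size until one fits all examples."""
    cache = {}
    for size in range(1, max_size + 1):
        for text, run in programs_of_size(size, cache):
            if all(run(i) == o for i, o in examples):
                return text
    return None


result = synthesize([(1, 2), (2, 5), (3, 10)])  # examples of x*x + 1
sizes = [len(programs_of_size(s, {})) for s in (1, 3, 5, 7)]
```

In this toy setting the number of candidates with 1, 3, 5, and 7 nodes is 2, 8, 64, and 640, so even a two-operator grammar grows quickly with program size.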
HySynth addresses these obstacles by introducing hybrid pipelines:
- In LLM-guided synthesis, LLM-derived samples are used to construct a probabilistic context-free grammar (PCFG) surrogate, which guides symbolic search by re-weighting the search space towards LLM-favored program fragments (Barke et al., 2024).
- In smart contract synthesis, HySynth leverages the logic HyperTSL to formally specify and enforce hyperproperties—requirements across multiple executions—augmenting traditional trace-based synthesis (Coenen et al., 2022).
2. Context-Free LLM Surrogates: Learning and Guided Search
The core idea of LLM-driven HySynth (Barke et al., 2024) is to approximate the LLM's conditional distribution over programs given the prompt, $p_{\mathrm{LLM}}(P \mid \text{prompt})$, using a PCFG surrogate. The workflow proceeds as follows:
- LLM Sampling and Parsing:
- Prompt the LLM (e.g., GPT-4) with the DSL grammar and input–output examples.
- Collect candidate program completions.
- Parse completions under the DSL grammar, retaining only the grammatically valid programs.
- PCFG Estimation:
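- For each production rule $R = A \to \beta$, estimate its smoothed usage frequency across the valid programs, normalized over the rules that share the left-hand side $A$: $p(R) = \frac{\mathrm{count}(R) + \alpha}{\sum_{R' : \mathrm{lhs}(R') = A} \left(\mathrm{count}(R') + \alpha\right)}$, where $\alpha$ is a Dirichlet prior pseudocount (typically $1$).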
- Bottom-up Guided Search:
- Convert the PCFG to a weighted CFG with costs $\mathrm{cost}(R) = -\log p(R)$, where $p(R)$ is the estimated probability of rule $R$.
- Enumerate programs in increasing cost order, evaluating candidates against all examples until a correct program is found.
- A best-first variant uses an A*-style priority based on these costs.
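The estimation step can be sketched in a few lines of Python. This is a minimal illustration, assuming additive smoothing with pseudocount `alpha`, normalization over rules with the same left-hand side, and costs of the form $-\log p$; the toy grammar, the rule names, and the two sample derivations are hypothetical, and parsing of LLM completions is assumed to have already happened.

```python
import math
from collections import Counter

# Hypothetical toy grammar: nonterminal -> names of its production rules.
GRAMMAR = {
    "E": ["E -> x", "E -> 1", "E -> E + E", "E -> E * E"],
}


def estimate_pcfg(derivations, grammar, alpha=1.0):
    """Smoothed rule probabilities, normalized per left-hand-side nonterminal."""
    counts = Counter(rule for derivation in derivations for rule in derivation)
    probs = {}
    for lhs, rules in grammar.items():
        total = sum(counts[r] + alpha for r in rules)
        for r in rules:
            probs[r] = (counts[r] + alpha) / total
    return probs


def rule_costs(probs):
    """Map probabilities to additive costs so that likely rules are cheap."""
    return {rule: -math.log(p) for rule, p in probs.items()}


# Rules used by two hypothetical parsed LLM samples: "x * x + 1" and "x * x".
samples = [
    ["E -> E + E", "E -> E * E", "E -> x", "E -> x", "E -> 1"],
    ["E -> E * E", "E -> x", "E -> x"],
]
probs = estimate_pcfg(samples, GRAMMAR)
costs = rule_costs(probs)
```

With `alpha = 1`, rules that never appear in the samples still receive nonzero probability, so the guided search remains complete over the grammar while favoring rules the LLM used often.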
This context-free surrogate is modular—favoring integration with dynamic programming enumerators—and focuses enumeration on operator subspaces most likely to yield correct solutions given the LLM's probabilistic biases.
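A cost-ordered bottom-up search of this kind can be sketched as follows. The sketch is an assumption-laden illustration rather than the HySynth implementation: the rule costs are hard-coded hypothetical values, programs are represented by their outputs on the examples, and pruning of observationally equivalent programs is added as a standard bottom-up optimization.

```python
import heapq
from itertools import count

# Hypothetical rule costs, e.g. -log p(R) from a PCFG surrogate (lower = favored).
LEAF_COSTS = {"x": 0.9, "1": 1.8}
OP_COSTS = {"+": 1.8, "*": 1.4}
LEAF_SEMANTICS = {"x": lambda i: i, "1": lambda i: 1}
OP_SEMANTICS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def guided_search(examples, max_pops=10_000):
    """Pop programs in nondecreasing total cost; return (text, cost) or None."""
    inputs = [i for i, _ in examples]
    target = tuple(o for _, o in examples)
    tie = count()  # unique tie-breaker, so heap entries never compare past it
    heap = [
        (LEAF_COSTS[name], next(tie), name,
         tuple(LEAF_SEMANTICS[name](i) for i in inputs))
        for name in LEAF_COSTS
    ]
    heapq.heapify(heap)
    bank = []     # popped programs: (cost, text, outputs on the examples)
    seen = set()  # output vectors already represented in the bank
    for _step in range(max_pops):
        if not heap:
            return None
        cost, _tie, text, outs = heapq.heappop(heap)
        if outs in seen:
            continue  # a cheaper program with the same behavior already exists
        if outs == target:
            return text, cost
        seen.add(outs)
        bank.append((cost, text, outs))
        # Combine the new program with every banked program, in both orders.
        for other_cost, other_text, other_outs in bank:
            pairs = [((text, outs), (other_text, other_outs))]
            if other_text != text:
                pairs.append(((other_text, other_outs), (text, outs)))
            for op, fn in OP_SEMANTICS.items():
                for (lt, lo), (rt, ro) in pairs:
                    new_outs = tuple(fn(a, b) for a, b in zip(lo, ro))
                    new_cost = cost + other_cost + OP_COSTS[op]
                    heapq.heappush(
                        heap, (new_cost, next(tie), f"({lt} {op} {rt})", new_outs)
                    )
    return None


found = guided_search([(1, 2), (2, 5), (3, 10)])  # examples of x*x + 1
```

Because every rule cost is positive, each newly built program is more expensive than the program just popped, so the heap yields programs in nondecreasing cost order and the first match is a cheapest one under the surrogate.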
3. HySynth for Correct-by-Design Smart Contract Synthesis
HySynth for smart contracts (Coenen et al., 2022) is motivated by the necessity of enforcing relational constraints (hyperproperties), such as determinism, symmetry, or information-flow, in security-critical blockchain programs. The methodology introduces HyperTSL, a logic for specifying hyperproperties in infinite-state systems.
- HyperTSL Logic:
- Extends TSL with trace quantifiers ($\forall \pi$, $\exists \pi$).
- Provides temporal operators and cell update predicates over trace variables.
- Hyperproperties are expressed as formulas quantifying over sets of executions.
- Three-Phase HySynth Workflow:
- Preprocessing: Detects and collapses "pseudo-hyperproperties" (formulas that reduce to trace properties) to avoid unnecessary computational overhead.
- TSL Synthesis: Constructs a Mealy machine (winning region) realizing all trace requirements via bounded LTL approximation and BDD-based model checking.
- Refinement/Repair: Enumerates deterministic refinements of the Mealy machine, verifying (through self-composition and HyperLTL checking) which, if any, satisfy the full HyperTSL specification.
The following table summarizes the HySynth smart contract workflow:
| Phase | Input Specification | Main Procedure |
|---|---|---|
| Preprocessing | HyperTSL | Collapse if pseudo-hyperproperty, else retain as HyperTSL |
| TSL Synthesis | Trace property (TSL) | Bounded LTL synthesis to obtain Mealy winning region |
| Refinement | Mealy region + HyperTSL | Enumerate/prune choices, verify against HyperLTL |
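The refinement phase can be illustrated with a small Python sketch. Everything here is a simplifying assumption: the winning region is a hand-written, input-enabled table over one state, inputs are `(public, secret)` bit pairs, and the hyperproperty is a noninterference-style condition checked by a hand-rolled walk over the self-composition, standing in for the HyperLTL model checking used in the actual workflow.

```python
from itertools import product

# Hypothetical winning region: (state, input) -> allowed (output, next_state) choices.
# Inputs are (public, secret) bit pairs; more than one choice means nondeterminism.
REGION = {
    ("s0", (0, 0)): {(0, "s0")},
    ("s0", (0, 1)): {(0, "s0"), (1, "s0")},
    ("s0", (1, 0)): {(1, "s0")},
    ("s0", (1, 1)): {(0, "s0"), (1, "s0")},
}


def refinements(region):
    """Yield every deterministic machine obtained by fixing one choice per entry."""
    keys = sorted(region)
    for choice in product(*(sorted(region[k]) for k in keys)):
        yield dict(zip(keys, choice))


def noninterferent(machine, init="s0"):
    """Run two copies side by side on inputs with equal public bits and
    require equal outputs in every reachable pair of states."""
    inputs = sorted({inp for _state, inp in machine})
    seen = {(init, init)}
    frontier = [(init, init)]
    while frontier:
        s1, s2 = frontier.pop()
        for a, b in product(inputs, repeat=2):
            if a[0] != b[0]:
                continue  # only relate executions that agree on the public input
            out1, next1 = machine[(s1, a)]
            out2, next2 = machine[(s2, b)]
            if out1 != out2:
                return False  # the secret bit influenced the output
            if (next1, next2) not in seen:
                seen.add((next1, next2))
                frontier.append((next1, next2))
    return True


candidates = list(refinements(REGION))
valid = [m for m in candidates if noninterferent(m)]
```

Of the four deterministic refinements in this toy region, only the one whose output always equals the public bit passes the check, which mirrors how the refinement phase prunes strategies that satisfy the trace requirements but violate the hyperproperty.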
4. Empirical Evaluation and Application Domains
HySynth's LLM-PCFG framework (Barke et al., 2024) was evaluated across three representative program synthesis domains:
- ARC Grid Puzzles: 160 tasks from Object-ARC.
- Tensor Manipulation: 69 TFCoder problems.
- String Transformation: 70 SyGuS-String tasks.
Against three baselines—unguided dynamic programming, direct LLM sampling, and off-the-shelf synthesizers—HySynth demonstrated superior performance:
- Overall solve rate: 58% (HySynth) vs. 40% (unguided DP) vs. 2% (LLM alone).
- Median search time reduction by a factor of 3–5, with solutions frequently optimal in size due to PCFG biases toward succinct fragments.
A plausible implication is that PCFG surrogates amplify the practical effectiveness of LLM knowledge by pruning the combinatorial search space early and concentrating computational effort on promising subtrees, especially in DSLs for which LLMs were not explicitly fine-tuned.
5. Technical Limitations
HySynth frameworks exhibit several foundational and practical limitations:
- Surrogate Quality: The approach depends critically on LLMs generating at least partially correct in-DSL samples. For DSLs with highly unfamiliar or ambiguous grammars, unparseable completions may degrade surrogate quality.
- Context-free Expressiveness: CFGs and PCFGs cannot enforce non-context-free constraints (e.g., dependent types or global invariants), which could be essential for some DSLs or smart contract protocols.
- Sampling Overhead: Surrogate estimation may require tens or hundreds of LLM invocations, introducing latency and computational cost.
These limitations motivate research into more expressive families of surrogates (e.g., probabilistic context-sensitive grammars) and adaptive LLM querying strategies.
6. Prospective Extensions and Broader Impact
Possible extensions include:
- Adapting the HySynth/PCFG paradigm to structured prediction tasks such as semantic parsing, code repair, or SQL query generation.
- Integrating adaptive sampling, where LLM queries are focused on sub-problems empirically deemed difficult—potentially reducing overhead.
- Employing advanced surrogate models (e.g., probabilistic mildly context-sensitive grammars) to bridge expressiveness gaps between CFGs and richer program logics.
In the formal synthesis of smart contracts, HySynth’s handling of hyperproperties fosters the reliable design of systems with strong multi-execution guarantees—for instance, enforcing deterministic voting behavior or information-flow policies critical for blockchain applications (Coenen et al., 2022).
7. Summary
HySynth represents a family of hybrid program synthesis methods that combine statistical approximations derived from LLM completions and symbolic reasoning. In LLM-guided synthesis, a PCFG surrogate, trained on LLM samples, guides modular search within a context-free space. In correct-by-design contract synthesis, HySynth leverages HyperTSL to guarantee relational properties across multiple executions, providing an end-to-end toolchain for Solidity code generation satisfying both trace and hyperproperty requirements.
Key contributions of HySynth include:
- Efficiently leveraging LLM-derived knowledge for symbolic program search.
- Enabling tractable synthesis of systems subject to complex multi-trace properties.
- Providing robust empirical improvements and serving as a flexible scaffold for future research in hybrid and hyperproperty-guided synthesis (Barke et al., 2024; Coenen et al., 2022).