Core Refined Understanding eXpression (CRUX)
- CRUX is a grammar-based intermediate representation defined by three key components: Module Interface, Core Functions, and Key Considerations.
- It employs a two-stage learning process combining supervised and reinforcement methods to optimize Verilog generation accuracy and reduce semantic drift.
- CRUX’s modular design enables its use as a plug-in artifact for various models, improving specification-to-RTL tasks across hardware synthesis benchmarks.
Core Refined Understanding eXpression (CRUX) is a grammar-based, structured intermediate representation designed to bridge the semantic gap between open-ended natural language hardware specifications and domain-specific Verilog implementations. Developed as part of the QiMeng-CRUX framework, CRUX organizes essential design intent and constraints to enable precise, robust, and transferable code generation for hardware description languages, particularly Verilog (Huang et al., 25 Nov 2025).
1. Formal Structure and Components
CRUX is formally defined as a three-component template:
- $c_{\mathrm{MI}}$ (Module Interface): specifies all module ports and parameters.
- $c_{\mathrm{CF}}$ (Core Functions): states the essential state-transition and data-flow logic central to the hardware's function.
- $c_{\mathrm{KC}}$ (Key Considerations): enumerates critical, often subtle constraints, such as reset policies and timing assumptions.
A CRUX instance $c$ for a natural-language prompt $x$ factorizes into three contiguous spans:

$$c = (c_{\mathrm{MI}},\; c_{\mathrm{CF}},\; c_{\mathrm{KC}})$$

Each span $c_{\mathrm{MI}}$, $c_{\mathrm{CF}}$, $c_{\mathrm{KC}}$ is restricted by a context-free grammar. Tokens are drawn from a compact intermediate vocabulary tailored for expressiveness and slot-filling semantics.
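As an illustrative sketch (the class and field names are hypothetical, not from the paper), the three-span factorization can be represented as a simple ordered container:

```python
from dataclasses import dataclass

@dataclass
class CruxExpression:
    """Hypothetical container for the three contiguous CRUX spans."""
    module_interface: str    # ports and parameters
    core_functions: str      # state-transition / data-flow logic
    key_considerations: str  # reset policies, timing assumptions, etc.

    def serialize(self) -> str:
        # Concatenate the spans in their fixed order.
        return "\n".join(
            [self.module_interface, self.core_functions, self.key_considerations]
        )

crux = CruxExpression(
    module_interface="module counter (input clk, input rst, output count [3:0]);",
    core_functions="Core Functions: Compute count+1 given clk rising edge",
    key_considerations="Key Considerations: synchronous active-high reset",
)
print(crux.serialize().count("\n"))  # 2
```

The fixed span order matters: downstream decoding conditions the Verilog output on the serialized CRUX string as a whole.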
2. Syntax and Semantic Specification
CRUX expressions must conform to a concise BNF/EBNF, which ensures unambiguous, minimal, and fully structured encoding of hardware semantics. The top-level template is as follows:
<CRUX> ::= <ModuleInterface> <CoreFunctions> <KeyConsiderations>
<ModuleInterface> ::= "module" <ModuleName> "(" <PortList> ");"
<PortList> ::= <PortDecl> { "," <PortDecl> }
<PortDecl> ::= <Direction> <SignalName> [ "[" <Width> "]" ]
<Direction> ::= "input" | "output" | "inout"
<CoreFunctions> ::= "Core Functions:" <BehaviorList>
<BehaviorList> ::= <Behavior> { ";" <Behavior> }
<Behavior> ::= <StateClause> | <DataflowClause>
<StateClause> ::= "From" <StateName> ":" <ConditionActionList>
<ConditionActionList> ::= <ConditionAction> { "," <ConditionAction> }
<ConditionAction> ::= "On input" <Signal> "=" <Value> "," "transition to" <StateName>
<DataflowClause> ::= "Compute" <Expr> "given" <Inputs>
<KeyConsiderations> ::= "Key Considerations:" <ConstraintList>
<ConstraintList> ::= <Constraint> { ";" <Constraint> }
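A minimal sketch of how one production of this grammar could be checked mechanically; the regular expression below covers only the `<ModuleInterface>` rule (a full parser would follow the whole EBNF), and the function name is illustrative:

```python
import re

# Port declaration per the grammar: <Direction> <SignalName> [ "[" <Width> "]" ]
PORT = r"(?:input|output|inout)\s+\w+(?:\s*\[[^\]]+\])?"

# <ModuleInterface> ::= "module" <ModuleName> "(" <PortList> ");"
MODULE_IFACE = re.compile(
    rf"^module\s+\w+\s*\(\s*{PORT}(?:\s*,\s*{PORT})*\s*\);$"
)

def is_valid_module_interface(text: str) -> bool:
    """Return True if `text` matches the <ModuleInterface> production."""
    return MODULE_IFACE.match(text.strip()) is not None

print(is_valid_module_interface(
    "module counter (input clk, input rst, output count [3:0]);"
))  # True
print(is_valid_module_interface("module m (clk);"))  # False: direction missing
```

Note that the grammar places the optional width *after* the signal name, so the checker follows the CRUX production rather than concrete Verilog port syntax.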
3. Joint Supervised and Reinforcement Learning Framework
CRUX underlies a two-stage learning process:
Stage I: Joint Expression Modeling (JEM)
- Uses a dataset $\mathcal{D}$ of triples consisting of natural-language prompts $x$, CRUX expressions $c$, and reference Verilog $y$.
- Minimizes the supervised cross-entropy loss:

$$\mathcal{L}_{\mathrm{JEM}} = -\,\mathbb{E}_{(x,c,y)\sim\mathcal{D}}\left[\log p_\theta(c \mid x) + \log p_\theta(y \mid x, c)\right]$$
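The JEM objective above can be sketched numerically; the per-sequence log-probabilities here are synthetic stand-ins for real model outputs, and the function name is illustrative:

```python
# Stage-I JEM objective: a single cross-entropy loss over the CRUX span c
# and the Verilog span y, both conditioned (directly or indirectly) on x.

def jem_loss(logprob_c_given_x, logprob_y_given_xc):
    """Negative log-likelihood of (c, y) under the factorized model:
    L = -[log p(c|x) + log p(y|x,c)], averaged over a batch."""
    batch = list(zip(logprob_c_given_x, logprob_y_given_xc))
    return -sum(lc + ly for lc, ly in batch) / len(batch)

# Two synthetic examples with per-sequence log-probabilities.
loss = jem_loss([-2.0, -1.5], [-4.0, -3.5])
print(round(loss, 2))  # 5.5
```

Because the loss factorizes, the model is trained to emit the CRUX first and then the Verilog conditioned on it, matching the two-stage decode used at inference.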
Stage II: Dual-Space Optimization (DSO) via CRUX-Enhanced GRPO
- Further optimizes the policy via a policy-gradient method that rewards (a) the functional correctness of the generated Verilog and (b) the usefulness of the generated CRUX.
- Code reward $R_{\mathrm{code}}$: scores the functional correctness of the generated Verilog.
- CRUX reward $R_{\mathrm{crux}}$: scores the usefulness of the generated CRUX.
- Total reward: $R = \alpha\,R_{\mathrm{code}} + \beta\,R_{\mathrm{crux}}$, with weighting coefficients $\alpha$ and $\beta$.
- Optimization employs Group Relative Policy Optimization (GRPO) without a KL penalty, which increases output diversity.
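The reward combination and the group-relative normalization at the heart of GRPO can be sketched as follows; the weights and function names are illustrative assumptions, not values from the paper:

```python
# Sketch of the DSO reward and GRPO-style group-relative advantages:
# each sampled rollout's reward is normalized against the mean and std
# of its sampling group; no KL penalty term is added.

def total_reward(code_ok: bool, crux_useful: bool,
                 alpha: float = 1.0, beta: float = 0.5) -> float:
    # alpha, beta are illustrative weights, not values from the paper.
    return alpha * float(code_ok) + beta * float(crux_useful)

def group_relative_advantages(rewards):
    """Zero-mean, unit-scale advantages within one sampling group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

group = [total_reward(True, True), total_reward(True, False),
         total_reward(False, False), total_reward(False, True)]
print(group)  # [1.5, 1.0, 0.0, 0.5]
advs = group_relative_advantages(group)
print(round(sum(advs), 6))  # 0.0 (advantages are zero-mean by construction)
```

Normalizing within the group rather than against a learned value baseline is what makes GRPO critic-free; rollouts that beat their group mean are reinforced, the rest are suppressed.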
4. Inference and Decoding Procedure
At inference time, a two-stage decode is performed:
- CRUX Decoding: $\hat{c} = \arg\max_{c}\, p_\theta(c \mid x)$
- Verilog Decoding: $\hat{y} = \arg\max_{y}\, p_\theta(y \mid x, \hat{c})$
In practice, either beam search or sampling of multiple candidates per stage is employed for computing pass@k metrics. The structured, concise CRUX format produces a highly concentrated distribution for Verilog generation, reducing semantic drift and increasing correctness.
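The two-stage decode loop and the standard unbiased pass@k estimator (with $n$ samples of which $c$ pass, $\mathrm{pass@}k = 1 - \binom{n-c}{k}/\binom{n}{k}$) can be sketched as follows; the driver function and its parameters are hypothetical:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    draws (without replacement) from n samples is among the c passing."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def two_stage_decode(prompt, decode_crux, decode_verilog, num_samples=5):
    """Hypothetical driver: sample a CRUX first, then condition the
    Verilog candidate on it (stage 1: c ~ p(c|x); stage 2: y ~ p(y|x,c))."""
    candidates = []
    for _ in range(num_samples):
        crux = decode_crux(prompt)
        candidates.append(decode_verilog(prompt, crux))
    return candidates

print(round(pass_at_k(n=10, c=3, k=1), 2))  # 0.3
```

For pass@1 the estimator reduces to the pass fraction $c/n$, which is why single-sample accuracy is the headline metric in the benchmarks below.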
5. Benchmark Performance and Empirical Results
QiMeng-CRUX-V was evaluated on VerilogEval-V1, VerilogEval-V2 (both CC and SR tracks), RTLLM-V1, and RTLLM-V2 benchmarks, primarily using the pass@1 metric. Key results include:
| Model (Setting) | VerilogEval-V2 (SR) | RTLLM-V2 |
|---|---|---|
| Baseline (OriGen-7B) | 49.3% | — |
| +RealSpec only (robustness) | 53.2% | — |
| +CRUX (structure) | 59.6% | — |
| QiMeng-CRUX-V-SFT | 59.6% | — |
| QiMeng-CRUX-V-Final (full DSO) | 64.7% (+15.4%) | 63.8% (+12.9%) |
| Qwen2.5-Coder | — | 50.9% |
Ablations indicate that RealSpec and the structured CRUX representation provide orthogonal improvements (+8.4% and +6.4%, respectively) on VerilogEval-V2 SR (Spec-to-RTL) tasks. Stage II optimizations with CRUX-reward benefit both functionality and informativeness, increasing SR from 59.6% (no CRUX-reward) to 64.7% (with CRUX-reward).
CRUX also demonstrates notable transferability: appending the learned CRUX to arbitrary code models, such as Qwen2.5-Coder, boosts pass@1 for SR from 22.0% (“Only CRUX”) to 35.9%, and further to 37.8% when augmented with the original description (“Des+CRUX”). This suggests CRUX acts as a model-agnostic semantic scaffold.
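A minimal sketch of the two transfer settings above; the function name, mode strings, and prompt layout are illustrative assumptions:

```python
def build_prompt(description: str, crux: str, mode: str = "des+crux") -> str:
    """Assemble a prompt for an off-the-shelf code model, either from the
    CRUX alone ('only_crux') or from the original description augmented
    with the CRUX ('des+crux'). Names are illustrative."""
    if mode == "only_crux":
        return crux                       # CRUX replaces the description
    return f"{description}\n\n{crux}"     # CRUX augments the description

p = build_prompt("Design a 4-bit counter.", "module counter (...);")
print(p.startswith("Design"))  # True
```

Because the CRUX is plain structured text, no weights of the downstream model need to change: the transfer is purely at the prompt level.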
6. Significance and Reusability
CRUX offers a principled, grammar-based, three-slot scaffold that captures user intent, enables joint modeling of intermediate intent and code, and supports RL-based sharpening of code-generation distributions. Its main empirical effect is a marked reduction in semantic drift and a higher success rate on specification-to-RTL tasks. The modular structure of CRUX allows its use as a plug-in artifact for other LLMs without retraining, providing broad architectural transferability and immediate improvements in code-generation quality. The approach achieves state-of-the-art results across all major Verilog code-generation benchmarks and sets a new technical standard for prompt engineering and intermediate representation in hardware synthesis via LLMs (Huang et al., 25 Nov 2025).