HYPERHEURIST: A Simulated Annealing-Based Control Framework for LLM-Driven Code Generation in Optimized Hardware Design

Published 17 Apr 2026 in cs.AR and cs.AI | (2604.15642v1)

Abstract: LLMs have shown promising progress in generating Register Transfer Level (RTL) hardware designs, largely because they can rapidly propose alternative architectural realizations. However, single-shot LLM generation struggles to consistently produce designs that are both functionally correct and power-efficient. This paper proposes HYPERHEURIST, a simulated annealing-based control framework that treats LLM-generated RTL as intermediate candidates rather than final designs. The proposed system targets not only functional correctness but also Power-Performance-Area (PPA) optimization. In the first phase, RTL candidates are filtered through compilation, structural checks, and simulation to identify functionally valid designs. In the second phase, PPA optimization is restricted to RTL designs that have already passed compilation and simulation. Evaluated across eight RTL benchmarks, this staged approach yields more stable and repeatable optimization behavior than single-pass LLM-generated RTL.


Authors (3)

Summary

The paper introduces a two-phase framework that first ensures functional RTL correctness and then optimizes power, performance, and area (PPA) using simulated annealing.
It leverages multiple LLM pipelines to generate, mutate, and critique RTL designs, ensuring only valid candidates progress to synthesis.
Experimental results show up to 70% relative improvement in structural correctness and significant PPA gains over baseline LLM generation.

HYPERHEURIST: Simulated Annealing–Controlled LLM-Driven RTL Optimization

Introduction

The paper introduces HYPERHEURIST, a simulated annealing (SA)-based control framework for integrating LLMs into the process of Register-Transfer Level (RTL) code generation for digital hardware. Unlike single-shot LLM generation, which is limited by its inability to consistently deliver functionally correct and PPA-efficient designs, HYPERHEURIST explicitly decouples correctness discovery from power-performance-area (PPA) optimization. The framework treats LLM-generated RTL artifacts as mutable intermediates, subjecting them to systematic evaluation, mutation, and refinement under a closed-loop optimization protocol.

LLMs have recently been adopted for design space exploration in high-level synthesis, RTL rewriting, and even physical design. Prior frameworks such as HLS-BO-LLM and iDSE employ LLMs for pragma optimization and guided exploration, while methods such as ASPEN and BUFFALO extend LLMs for formal RTL rewriting and buffer tree construction, respectively. However, these methods suffer from several limitations:

- Most are confined to a single stage of the design flow and lack generalization.
- Coupling functional correctness and PPA optimization within a single search process leads to inefficient exploration, especially when unfiltered, faulty candidates pollute the search space.
- Imprecise feedback and weak control logic result in oscillatory or unreproducible QoR improvements.

HYPERHEURIST addresses these by explicitly separating functional correctness discovery and PPA refinement into distinct, coordinated annealing-guided phases while leveraging tool-in-the-loop EDA feedback.

Framework Methodology

Architecture Overview

HYPERHEURIST operates in two sequential phases:

Phase I: Correctness-Driven Search

The system explores the design space to discover RTL implementations that are functionally correct and structurally consistent. Four distinct LLM pipelines are deployed:
- Generator: Produces canonical RTL instantiations.
- Aggressive Mutator: Pushes the architectural landscape, introducing non-local changes.
- Conservative Mutator: Applies minimal, targeted repairs.
- Critique: Diagnoses design flaws and proposes corrections via JSON-structured hypotheses.

Candidates are filtered through VCS-based compilation, simulation, and structural checks. Only designs passing all correctness gates are considered for subsequent PPA-driven transformation.
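The gating described above can be sketched as an ordered list of checks that stops at the first failure. This is a minimal illustration, not the paper's implementation: `Stage`, `GateResult`, `run_correctness_gate`, and the toy string-matching checks are all hypothetical names introduced here, and the stage order (compile, structural, simulate) is an assumption taken from the abstract's wording. In a real flow each check would wrap a tool invocation and parse its log.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# A stage is a (name, check) pair; check returns True when the RTL passes.
Stage = Tuple[str, Callable[[str], bool]]


@dataclass
class GateResult:
    passed: bool
    failed_stage: Optional[str]  # None when every stage passed


def run_correctness_gate(rtl: str, stages: Sequence[Stage]) -> GateResult:
    """Run the stages in order and stop at the first one that fails."""
    for name, check in stages:
        if not check(rtl):
            return GateResult(False, name)
    return GateResult(True, None)


# Toy stand-ins for real tool runs (hypothetical, string matching only).
toy_stages = [
    ("compile", lambda rtl: "module" in rtl and "endmodule" in rtl),
    ("structural", lambda rtl: "input" in rtl and "output" in rtl),
    ("simulate", lambda rtl: "assign y = a;" in rtl),
]

good = "module buf1(input a, output y); assign y = a; endmodule"
bad = "module buf1(input a, output y); assign y = ~a; endmodule"

print(run_correctness_gate(good, toy_stages))
print(run_correctness_gate(bad, toy_stages))
```

Returning the name of the failing stage, rather than a bare boolean, gives the mutation and critique pipelines a coarse signal about where a candidate broke.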

Phase II: PPA-Constrained Optimization

Here, the search is confined to functionally validated candidates. Structural mutations are biased toward synthesis- and timing-friendly modifications. Each candidate’s area, power, and timing metrics are extracted using Synopsys Design Compiler. A parameterized composite objective $\mathcal{J} = \alpha \widehat{A} + \beta \widehat{P} + \gamma \widehat{T}$, where $\widehat{A}$, $\widehat{P}$, and $\widehat{T}$ are the normalized area, power, and timing terms and $\alpha$, $\beta$, $\gamma$ are their weights, guides simulated annealing acceptance, with correctness enforced as a hard constraint.
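As a rough illustration of the composite objective, the sketch below computes $\mathcal{J}$ for one candidate. Several choices here are assumptions, since the summary does not specify them: normalization divides each metric by a reference design's value, the timing term is a delay-like quantity where lower is better, and the hard correctness constraint is modeled as an infinite cost. The names `PPA` and `composite_cost` and all numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PPA:
    area: float   # e.g. in um^2
    power: float  # e.g. in uW
    delay: float  # assumed timing metric; lower is better


def composite_cost(m: PPA, ref: PPA, correct: bool,
                   alpha: float = 1.0, beta: float = 1.0,
                   gamma: float = 1.0) -> float:
    """J = alpha*A_hat + beta*P_hat + gamma*T_hat, or +inf if incorrect."""
    if not correct:
        # Hard constraint: an incorrect design can never be preferred.
        return math.inf
    a_hat = m.area / ref.area    # assumed normalization: ratio to reference
    p_hat = m.power / ref.power
    t_hat = m.delay / ref.delay
    return alpha * a_hat + beta * p_hat + gamma * t_hat


# Illustrative numbers only; they are not taken from the paper.
reference = PPA(area=100.0, power=120.0, delay=2.0)
candidate = PPA(area=80.0, power=120.0, delay=2.0)

print(composite_cost(reference, reference, correct=True))
print(composite_cost(candidate, reference, correct=True))
print(composite_cost(candidate, reference, correct=False))
```

With unit weights the reference design scores 3.0 by construction, so any correct candidate scoring below 3.0 improves on it under this weighting.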

Unified Simulated Annealing Control

Both phases employ a uniform SA routine, differing only in their objective functions—correctness score in Phase I and normalized PPA composite in Phase II. The SA controller balances exploration and exploitation across both correctness and PPA landscapes, drawing on tool feedback (compilation logs, simulation, synthesis reports) to steer LLM mutation and candidate acceptance.
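A single SA routine with a pluggable objective might look like the sketch below. The summary does not give the acceptance rule or cooling schedule, so the Metropolis criterion and geometric cooling used here are assumptions, as are the function name `anneal` and its parameters. The routine minimizes `cost`; a Phase I correctness score, where higher is better, would be passed in negated.

```python
import math
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def anneal(initial: T,
           propose: Callable[[T, random.Random], T],
           cost: Callable[[T], float],
           t_start: float = 1.0,
           t_end: float = 1e-3,
           cooling: float = 0.95,
           seed: int = 0) -> T:
    """Minimize cost with Metropolis acceptance and geometric cooling."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    temp = t_start
    while temp > t_end:
        candidate = propose(current, rng)
        cand_cost = cost(candidate)
        delta = cand_cost - current_cost
        # Always accept improvements; accept a worse candidate with
        # probability exp(-delta / temp), which shrinks as temp cools.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp *= cooling
    return best


# Toy problem standing in for an RTL search: minimize (x - 3)^2 over integers.
def toy_cost(x: int) -> float:
    return float((x - 3) ** 2)


def toy_propose(x: int, rng: random.Random) -> int:
    return x + rng.choice((-1, 1))


result = anneal(10, toy_propose, toy_cost)
print(result, toy_cost(result))
```

In the framework's setting, `propose` would correspond to an LLM mutation pipeline and `cost` to a tool-derived score, so switching phases amounts to swapping the `cost` callable.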

Experimental Evaluation

Benchmarks and Prompts

HYPERHEURIST is evaluated on eight canonical RTL problems from the RTLLM benchmark suite. Each pipeline is grounded in a rigid prompt contract that mandates a strict interface and synthesizability, while pipeline-specific prompts induce distinct behaviors (safe repair, aggressive exploration, or expert critique).

Correctness and PPA Metrics

For each design, correctness is tracked along syntactic, structural, and logical axes. Only designs passing all VCS validation stages are synthesized for PPA metric extraction. Comparative analysis is performed against single-shot and baseline LLM-RTL generation (GPT-4.0, GPT-3.5 w/ self-planning).

Results

Correctness Convergence

HYPERHEURIST delivers consistent gains in structural and logical correctness. On several benchmarks (e.g., johnson_counter, mux2_sync), absolute structural correctness improved by up to 35%, with up to 70% relative gain over baseline LLM generation. Failure cases (e.g., traffic_light, parallel2serial) are attributed to weak tool feedback signals and to the difficulty of capturing complex temporal FSM semantics with LLMs. This points to feedback granularity as the limiting factor for control-intensive modules.
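Because the results quote both absolute and relative gains, it helps to see how the two differ. The sketch below uses illustrative numbers that are not taken from the paper: a baseline correctness rate of 0.50 rising to 0.85 is a 35-point absolute gain and a 70% relative gain. The function names are hypothetical.

```python
import math


def absolute_gain(baseline: float, improved: float) -> float:
    """Difference in rates, e.g. 0.35 means 35 percentage points."""
    return improved - baseline


def relative_gain(baseline: float, improved: float) -> float:
    """Improvement expressed as a fraction of the baseline rate."""
    if baseline <= 0:
        raise ValueError("baseline must be positive for a relative gain")
    return (improved - baseline) / baseline


# Illustrative values only; the paper's per-benchmark rates are not given here.
baseline_rate, improved_rate = 0.50, 0.85
abs_gain = absolute_gain(baseline_rate, improved_rate)
rel_gain = relative_gain(baseline_rate, improved_rate)

print(f"absolute: {abs_gain * 100:.0f} points, relative: {rel_gain * 100:.0f}%")
assert math.isclose(abs_gain, 0.35) and math.isclose(rel_gain, 0.70)
```

The same absolute gain produces a larger relative gain when the baseline is low, so the two "up to" figures need not come from the same benchmark.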

PPA Improvement

PPA-optimized candidates show substantial improvements over both GPT-4 and GPT-3.5 baselines. For example, johnson_counter yields an implementation with area $59.9~\mu m^2$, power $80.9~\mu W$, and positive timing slack, compared to negative slack and higher area/power from baseline LLM outputs. All PPA evaluations are performed under a fixed process technology and toolchain, ensuring isolation of heuristic and architectural variables.

Analysis of Failure Modes

When correctness cannot be established in Phase I, no candidate proceeds to PPA synthesis. This gating prevents misleading optimization and ensures that PPA comparisons are not contaminated by functionally invalid designs.

Theoretical and Practical Implications

The phase-decoupled structure of HYPERHEURIST provides a model for scalable LLM-driven hardware design that is robust to prompt instability and model idiosyncrasies. By enforcing correctness gating, the system avoids regressions prevalent in single-shot LLM application. Tool-driven annealing enables both broad search and fine optimization, integrating well with standard EDA flows. Practically, HYPERHEURIST's architecture is framework-agnostic and amenable to further extension, providing a foundation for unified, closed-loop EDA workflows integrating LLMs as adaptive search orchestrators rather than brittle code generators.

Future Research Directions

Open avenues include extending feedback mechanisms to richer EDA stages (placement, routing), improving diagnostic signal integration for handling deep control and temporal logic, and scaling to complex pipelined or multi-module architectures. Incorporating gradient-based learning or cross-problem transfer is also a promising direction for further enhancing the efficiency and generality of LLM-EDA integration.

Conclusion

HYPERHEURIST demonstrates a robust and reproducible strategy for LLM–EDA integration in RTL generation and optimization. By decoupling correctness and PPA objectives under SA-based control, it achieves significant and stable correctness and PPA improvement across a spectrum of RTL design problems. The framework’s architecture improves traceability, maintains compatibility with standard EDA flows, and paves the way for further advances in LLM-driven circuit design and design automation.
