
High-Level Synthesis (HLS) Code

Updated 5 December 2025
  • High-Level Synthesis (HLS) code is high-level language code annotated to map algorithms into hardware with static bounds, fixed arrays, and explicit parallelism.
  • It employs pragmas like PIPELINE and UNROLL to optimize loops, control resource utilization, and ensure synthesizability onto FPGAs or ASICs.
  • LLM-driven frameworks automate HLS code generation through prompt engineering, retrieval-augmented generation, and iterative feedback for improved performance.

High-Level Synthesis (HLS) code refers to source code written in high-level languages (typically C, C++, or OpenCL) that is specifically structured and annotated for automated translation into hardware descriptions (RTL, e.g. Verilog or VHDL), enabling synthesis onto FPGAs or ASICs. HLS code embodies hardware-specific constraints, exposes parallelism, and leverages optimization directives (pragmas) to communicate architectural intent to synthesis tools. Recent advances demonstrate that the generation, transformation, optimization, and evaluation of HLS code are increasingly mediated by LLMs and automated frameworks, resulting in significant gains in productivity, correctness, and performance (Gai et al., 19 Feb 2025).

1. Principles and Structure of HLS Code

HLS code is constructed to map algorithmic functionality in C/C++ to hardware. The essential characteristics distinguishing HLS code from general-purpose software are:

  • Static bounds and fixed-size arrays: All loops must have compile-time constant bounds, and dynamic memory (e.g., malloc/free) is replaced by static arrays sized to fit the hardware (Zou et al., 6 Jul 2025, Collini et al., 13 Jun 2024).
  • Explicit parallelism and pipelining: Pragmas such as #pragma HLS PIPELINE II=1 and #pragma HLS UNROLL factor=k instruct the synthesis tool to generate pipelines with initiation intervals of 1 and replicate compute elements for parallel processing.
  • Data and interface annotations: Bit-widths are tightly constrained using fixed-point types (e.g., ap_fixed, ac_int), and hardware ports are specified using pragmas like #pragma HLS INTERFACE to denote memory-mapped or streaming interfaces.
  • Hardware-friendly control flow: Recursion is prohibited; functions are rewritten into iterative loops with explicit stacks and bounded resources (Collini et al., 29 Nov 2024).
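As an illustration of the last point, a recursive divide-and-conquer reduction can be rewritten as an iterative loop with an explicit, statically sized stack. The sketch below is illustrative plain C++ (not taken from the cited papers); the names and bounds are chosen for the example.

```cpp
#include <cassert>

// Illustrative sketch: a recursive range reduction rewritten as an iterative
// loop with an explicit, statically sized stack, as HLS tools require.
constexpr int N = 16;           // compile-time constant bound
constexpr int MAX_FRAMES = 16;  // static bound replacing the call stack

// Recursive form (NOT synthesizable) would split [lo, hi) in half and recurse.
// Here, call frames become entries in a fixed-size array instead.
int range_sum(const int a[N]) {
    struct Frame { int lo, hi; };
    Frame stack[MAX_FRAMES];        // static storage instead of call frames
    int top = 0;
    stack[top++] = {0, N};
    int total = 0;
    while (top > 0) {
        Frame f = stack[--top];
        if (f.hi - f.lo <= 4) {     // base case: small range, plain loop
            for (int i = f.lo; i < f.hi; i++) total += a[i];
        } else {                    // "recurse": push both halves
            int mid = (f.lo + f.hi) / 2;
            stack[top++] = {f.lo, mid};
            stack[top++] = {mid, f.hi};
        }
    }
    return total;
}
```

Because both the loop trip count and the stack depth are bounded at compile time, a synthesis tool can allocate fixed hardware resources for this function.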

Typical HLS code organization comprises a top-level function annotated for hardware interfaces, computational kernels (e.g., matrix multiply, convolution), and supporting modules (buffers, state machines). Example:

void matmul(int A[N][N], int B[N][N], int C[N][N]) {
  #pragma HLS array_partition variable=A complete dim=2
  #pragma HLS array_partition variable=B complete dim=1
  for (int i = 0; i < N; i++) {
    #pragma HLS pipeline II=1
    for (int j = 0; j < N; j++) {
      int sum = 0;
      for (int k = 0; k < N; k++) {
        sum += A[i][k] * B[k][j];
      }
      C[i][j] = sum;
    }
  }
}

Such code can be automatically generated or optimized by LLM-driven frameworks (Gai et al., 19 Feb 2025).

2. Code Transformations, Pragmas, and Optimization

HLS code achieves performance and resource goals through specific transformations:

  • Loop tiling and restructuring: Strip-mining loops for on-chip buffer reuse and memory bandwidth reduction (Pouget et al., 5 May 2024).
  • Pragma insertion: Pipeline and unroll pragmas, array partitioning, streaming directives, and resource mappings (see Table below, summarized from (Zou et al., 6 Jul 2025)).
| Pragma | Effect |
| --- | --- |
| `#pragma HLS PIPELINE II=<ii>` | Inserts pipeline registers; sets the initiation interval (II) |
| `#pragma HLS UNROLL factor=<f>` | Unrolls the loop by factor f |
| `#pragma HLS ARRAY_PARTITION variable=arr` | Partitions the array for parallel access |
| `#pragma HLS DATAFLOW` | Enables concurrent execution of tasks |
| `#pragma HLS RESOURCE variable=buf core=RAM_2P` | Maps the array to RAM primitives (BRAM/URAM) |
| `#pragma HLS DEPENDENCE variable=arr inter false` | Relaxes cross-iteration dependencies |
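The loop tiling (strip-mining) transformation described above can be sketched in plain C++. This is an illustrative example, not code from the cited papers; it assumes N is divisible by the tile size T. A tile of the input vector is copied into an on-chip buffer once and reused across all rows, reducing off-chip memory traffic; in real HLS code the copy-in would be a burst read and the compute loop would carry a PIPELINE pragma.

```cpp
#include <cassert>

// Illustrative strip-mined matrix-vector product: y = A * x, with x processed
// in tiles of size T that are buffered on chip and reused across every row.
constexpr int N = 32;
constexpr int T = 8;

void matvec_tiled(const int A[N][N], const int x[N], int y[N]) {
    for (int i = 0; i < N; i++) y[i] = 0;
    for (int jt = 0; jt < N; jt += T) {     // outer loop over tiles of x
        int xbuf[T];                        // on-chip buffer (BRAM candidate)
        for (int j = 0; j < T; j++)
            xbuf[j] = x[jt + j];            // copy the tile in once
        for (int i = 0; i < N; i++)         // tile reused for every row
            for (int j = 0; j < T; j++)
                y[i] += A[i][jt + j] * xbuf[j];
    }
}
```

Each element of x is now read from off-chip memory once per tile pass rather than once per row, which is the bandwidth-reduction effect the transformation targets.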

Correct insertion and tuning of pragmas are critical; automated frameworks increasingly employ feedback loops, code analysis, and design space exploration (DSE) engines to select the best parameter values (Pouget et al., 5 May 2024, Xiong et al., 13 Aug 2024).

3. Automated HLS Code Generation and LLMs

Emerging research demonstrates that LLMs can generate, refactor, and optimize HLS code with high accuracy, leveraging both prompt engineering and fine-tuned models:

  • Prompt engineering: Chain-of-thought templates guide the LLM to reason about hardware limits, outline loop/pipeline strategies, select data types/interfaces, and insert code with pragmas (e.g., "Step 1: reason about BRAM; Step 2: loop strategy; ...") (Gai et al., 19 Feb 2025).
  • Retrieval-augmented generation (RAG): Top-k most relevant expert code snippets are retrieved from a domain KB, included in the prompt to enhance the LLM's output (Xu et al., 9 Oct 2024).
  • Feedback loops: Syntax errors and functional mismatches (from GCC/HLS compilation and testbench runs) are returned to the LLM for iterative repair (Xu et al., 4 Jul 2024, Gai et al., 19 Feb 2025).
  • Fine-tuning: Domain-specific datasets (e.g., SAGE-HLS, HLStrans) enable fine-tuned LLMs to outperform general-purpose models, with up to ≈100% synthesizability and 75% functional correctness (Zou et al., 6 Jul 2025, Khan et al., 5 Aug 2025).
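The compile-and-repair feedback loop described above can be sketched as follows. This is a minimal illustrative sketch, not code from the cited frameworks; the three callbacks (`llm_generate`, `run_hls_compile`, `llm_repair`) are hypothetical stand-ins for an LLM API call and an HLS toolchain invocation.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

using Code = std::string;
using CompileResult = std::pair<bool, std::string>;  // (ok, error log)

// Sketch of an iterative repair loop: generate code, compile it, and feed any
// errors back to the model until it compiles or the iteration budget runs out.
Code generate_with_feedback(
    const std::string& prompt,
    const std::function<Code(const std::string&)>& llm_generate,
    const std::function<CompileResult(const Code&)>& run_hls_compile,
    const std::function<Code(const Code&, const std::string&)>& llm_repair,
    int max_iters = 3) {
  Code code = llm_generate(prompt);
  for (int i = 0; i < max_iters; i++) {
    auto [ok, errors] = run_hls_compile(code);  // syntax/synthesis check
    if (ok) break;
    code = llm_repair(code, errors);            // return errors to the LLM
  }
  return code;
}
```

Real frameworks extend this loop with testbench execution, so functional mismatches (not just compile errors) are also fed back for repair.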

LLM-driven frameworks (e.g., HLSPilot, TimelyHLS, RALAD) combine retrieval, in-context learning, and tool feedback, automating the generation and optimization loop (Xiong et al., 13 Aug 2024, Mashnoor et al., 23 Jul 2025).

4. Evaluation and Benchmarking Methodologies

HLS code correctness and performance are assessed with specialized benchmarks and metrics:

  • Benchmarks: Datasets like HLStrans (137 base programs, 23k+ variants), HLS-Eval (94 diverse kernels), and SAGE-HLS (16.7k ported Verilog modules) enable systematic evaluation of LLM outputs (Zou et al., 6 Jul 2025, Abi-Karam et al., 16 Apr 2025, Khan et al., 5 Aug 2025).
  • Metrics:
    • Synthesizability ($\mathrm{Synth}@k$): fraction of generated candidates successfully compiled to RTL.
    • Functional correctness ($\mathrm{pass}@k$): fraction of candidates passing simulation/testbench; formally, the probability that at least one of k generated variants passes all tests, $\mathrm{pass}@k = 1 - \frac{\binom{n-r}{k}}{\binom{n}{k}}$, where n is the number of samples and r the number of correct ones (Gai et al., 19 Feb 2025, Abi-Karam et al., 16 Apr 2025).
    • Performance: initiation interval (II), latency, and resource utilization (LUTs/DSPs/BRAM).
    • BLEU and ROUGE scores, occasionally used for text/code similarity.
  • Experimental results: With CoT and feedback, fine-tuned LLMs exceed 97% syntax pass@3 and ≈70% functional pass@3, with significant runtime and token efficiency gains over HDL-based flows (Gai et al., 19 Feb 2025, Khan et al., 5 Aug 2025).
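The pass@k metric above can be computed directly from its binomial form. The sketch below is illustrative; it guards the edge case where fewer than k incorrect samples exist, in which case a pass is guaranteed.

```cpp
#include <cassert>
#include <cmath>

// C(n, k) computed in floating point to avoid integer overflow for large n.
double binom(int n, int k) {
    if (k < 0 || k > n) return 0.0;
    double result = 1.0;
    for (int i = 1; i <= k; i++)
        result *= static_cast<double>(n - k + i) / i;
    return result;
}

// pass@k = 1 - C(n - r, k) / C(n, k), with n samples and r correct ones.
double pass_at_k(int n, int r, int k) {
    if (n - r < k) return 1.0;  // fewer incorrect samples than k: guaranteed pass
    return 1.0 - binom(n - r, k) / binom(n, k);
}
```

For example, with n = 4 samples of which r = 2 are correct, pass@1 is 1 − C(2,1)/C(4,1) = 0.5.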

5. Case Studies and Library Ecosystems

Several frameworks and libraries have generalized HLS code patterns and established best practices:

  • hls4ml: Translates neural networks from ML frameworks into HLS C++, supports Xilinx, Intel, and Catapult HLS backends. Parametric templates control layer parallelization (reuse factor, unroll), fixed-point quantization, and dataflow scheduling. Empirical reductions of latency, area, and resource consumption have been documented (Schulte et al., 1 Dec 2025, Curzel et al., 2021).
  • hlslib: Provides header-only C++ abstractions for FIFOs, reduction trees, shift registers, data packs, and stream interfaces; enables modular, portable HLS code and decouples application logic from vendor toolchains (Licht et al., 2019).
  • AnyHLS: Uses higher-order abstractions and partial evaluation to emit vendor-agnostic HLS code, shifting pragma management to backend code emitters and supporting image processing pipelines with competitive throughput and area (Özkan et al., 2020).

Best practices include modularizing hardware tasks, parameterizing loop bounds and data widths, separating algorithm from schedule, and tightly integrating tool feedback into the design loop (Licht et al., 2019, Curzel et al., 2021).
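These parameterization practices can be illustrated with a small templated kernel. This is an illustrative sketch, not hlslib's or hls4ml's actual API: the loop bound, element type, and unroll factor are template parameters, so one kernel body serves many instantiations without edits.

```cpp
#include <cassert>

// Illustrative header-only kernel: N (loop bound), T (data type), and UNROLL
// (parallelism) are compile-time parameters, separating algorithm from schedule.
template <typename T, int N, int UNROLL = 1>
void vadd(const T a[N], const T b[N], T c[N]) {
    static_assert(N % UNROLL == 0, "UNROLL must divide N");
    for (int i = 0; i < N; i += UNROLL) {
        // In real HLS code: #pragma HLS PIPELINE II=1
        for (int u = 0; u < UNROLL; u++)    // replicated compute elements
            c[i + u] = a[i + u] + b[i + u];
    }
}
```

Instantiating `vadd<int, 1024, 8>` versus `vadd<short, 256, 2>` changes hardware size and throughput without touching the kernel body, which is the portability benefit the libraries above generalize.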

6. Impact, Limitations, and Future Directions

Automated HLS code generation and optimization, particularly through LLM-driven frameworks, has reduced manual tuning, democratized hardware design, and delivered high quality-of-results (QoR) on a broad spectrum of applications. Key advances:

  • Data efficiency and productivity: HLS code—more closely aligned with C/C++—enables pre-trained LLMs to generalize from smaller datasets (42k lines vs. millions for Verilog), reducing cost and energy per inference (Gai et al., 19 Feb 2025).
  • Token and runtime savings: HLS is 3–4× shorter in tokens than equivalent HDL, making LLM flows less computationally expensive (Gai et al., 19 Feb 2025).
  • Design space exploration: DSE engines coupled with nonlinear programming (NLP) solvers have delivered large speedups over rule-based and ML-guided optimization (e.g., up to 13× on jacobi-2d and 403× on 2mm) (Pouget et al., 5 May 2024).
Remaining challenges and forward directions include:

  • Expanding datasets for control-rich and user-defined kernels.
  • Retrieval- and testbench-augmented inference at scale.
  • Integration with higher-level DSLs and end-to-end synth flows accessible to software engineers, not only hardware experts (Matai et al., 2014).
  • Comparative studies across diverse FPGA/ASIC platforms and vendor toolchains (Schulte et al., 1 Dec 2025, Mashnoor et al., 23 Jul 2025).

In summary, HLS code serves as the interface between high-level algorithm description and hardware implementation, with its optimization and synthesis now increasingly mediated by automated, LLM-driven technologies. This trajectory is enabling rapid, reliable, and scalable hardware design from general C/C++–style sources, transforming both the field and its practical accessibility (Gai et al., 19 Feb 2025).
