Efficient Transpilation of OpenQASM 3.0 Dynamic Circuits to CUDA-Q: Performance and Expressiveness Advantages

Published 13 Apr 2026 in quant-ph, cs.ET, and cs.PF | (2604.11599v1)

Abstract: Dynamic quantum circuits with mid-circuit measurement and classical feedforward are essential for near-term algorithms such as error mitigation, adaptive phase estimation, and Variational Quantum Eigensolvers (VQE), yet transpiling these programs across frameworks remains challenging due to inconsistent support for control flow and measurement semantics. We present a transpilation pipeline that converts OpenQASM 3.0 programs with classical control structures (conditionals and bounded loops) into optimized CUDA-Q C++ kernels, leveraging CUDA-Q's native mid-circuit measurement and host-language control flow to translate dynamic patterns without static circuit expansion. Our open-source framework is validated on comprehensive test suites derived from IBM Quantum's classical feedforward guide, including conditional reset, if-else branching, multi-bit predicates, and sequential feedforward, and on VQE-style parameterized circuits with runtime parameter optimization. Experiments show that the resulting CUDA-Q kernels reduce circuit depth by avoiding branch duplication, improve execution efficiency via low-latency classical feedback, and enhance code readability by directly mapping OpenQASM 3.0 control structures to C++ control flow, thereby bridging OpenQASM 3.0's portable circuit specification with CUDA-Q's performance-oriented execution model for NISQ-era applications requiring dynamic circuit capabilities.

Authors (7)

Summary

The paper demonstrates a novel transpilation pipeline that directly maps OpenQASM 3.0 control constructs to CUDA-Q, avoiding the overhead of static circuit expansion.
The methodology employs a visitor pattern with a strongly-typed AST, enabling efficient handling of mid-circuit measurements, conditionals, and parameterized workflows.
Benchmark results validate high-fidelity performance in both static and dynamic circuit simulations, supporting NISQ protocols and adaptive quantum algorithms.

Introduction

The paper "Efficient Transpilation of OpenQASM 3.0 Dynamic Circuits to CUDA-Q: Performance and Expressiveness Advantages" (2604.11599) addresses the lack of performant, expressive compiler infrastructure for converting OpenQASM 3.0 programs, including dynamic circuits, into executable kernels for high-performance quantum simulators such as CUDA-Q. The motivation is the growing prevalence of dynamic circuits that combine mid-circuit measurement with classical feedforward, which are now essential for NISQ-era protocols such as error mitigation, adaptive phase estimation, and VQE. The proposed approach fills this gap with a modular, Python-based transpilation pipeline that supports OpenQASM 3.0’s control constructs and maps them to CUDA-Q’s C++ and Python APIs, enabling both high-fidelity simulation and cross-ecosystem validation.

Figure 1: High-level ecosystem integrating user-written OpenQASM 3 circuits, the pyqasm frontend, and the CUDA-Q runtime.

Background and Context

Quantum software is moving from static, straight-line circuits toward dynamic workflows that interleave classical and quantum logic through extensive mid-circuit feedback. OpenQASM 3.0, as formalized in [cross22], extends previous standards with first-class control flow (conditionals, loops), input parameter declarations, and pulse primitives. However, prevailing toolchains, including Qiskit and circuit-centric transpilers, rarely support the full dynamic feature set at the IR or simulator level, especially when the backend requires a statically typed, high-performance kernel.

CUDA-Q, built on MLIR and its Quake dialect, is designed for high-performance simulation (state-vector, tensor-network, stabilizer) on both CPU and GPU infrastructure. Before this work, there was no robust and transparent integration between OpenQASM 3.0 and CUDA-Q, which inhibited the use of GPU acceleration for validating dynamic protocols.

Figure 2: Schematic of the Conditional Reset Protocol. A qubit is prepared in $|+\rangle$ using a Hadamard gate, measured mid-circuit, and classically controlled: if the outcome is '1', a Pauli-X gate resets the state to $|0\rangle$.

Methodology

The transpiler operates through a multi-phase compilation pipeline. Initially, the pyqasm library is used for precise OpenQASM 3.0 parsing, constructing a strongly-typed AST and resolving compile-time constants, runtime variables, and symbolic classical registers during a symbol table pass. The core logic is structured as a visitor pattern, with a CUDAQVisitor class that traverses the AST and emits CUDA-Q (Python or C++) kernel code.
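The visitor-based traversal described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: the node classes (Gate, Measure, IfStmt), the ToyVisitor class, and the transpile helper are all hypothetical names, and the emitted text is a simplified Python-like target rather than the exact code a CUDAQVisitor would generate.

```python
from dataclasses import dataclass, field
from typing import List, Union


# Hypothetical, heavily simplified AST nodes (not the pyqasm node types).
@dataclass
class Gate:
    name: str
    qubit: int


@dataclass
class Measure:
    qubit: int
    target: str


@dataclass
class IfStmt:
    condition: str
    body: List["Node"] = field(default_factory=list)


Node = Union[Gate, Measure, IfStmt]


class ToyVisitor:
    """Walks the toy AST and emits one line of target text per statement."""

    def __init__(self):
        self.lines = []
        self.indent = 0

    def emit(self, text):
        self.lines.append("    " * self.indent + text)

    def visit(self, node):
        # Dispatch on the node's class name, e.g. visit_Gate for Gate.
        getattr(self, "visit_" + type(node).__name__)(node)

    def visit_Gate(self, node):
        self.emit(f"{node.name}(q[{node.qubit}])")

    def visit_Measure(self, node):
        self.emit(f"{node.target} = mz(q[{node.qubit}])")

    def visit_IfStmt(self, node):
        # The branch is emitted once as control flow; it is not duplicated.
        self.emit(f"if {node.condition}:")
        self.indent += 1
        for child in node.body:
            self.visit(child)
        self.indent -= 1


def transpile(nodes):
    visitor = ToyVisitor()
    for node in nodes:
        visitor.visit(node)
    return "\n".join(visitor.lines)


# Conditional reset: H, mid-circuit measurement, X if the outcome is 1.
program = [Gate("h", 0), Measure(0, "c0"), IfStmt("c0", [Gate("x", 0)])]
generated = transpile(program)
print(generated)
```

Dispatching on the node type keeps each language construct in its own method, so supporting a new statement kind only requires adding one more visit method.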

Quantum gates, modifiers, and control constructs are translated systematically. OpenQASM gate modifiers such as ctrl and inv are rewritten as calls to CUDA-Q’s functional interface, so controlled or adjoint operations are constructed at runtime without static circuit expansion. If-statements in the AST are mapped to CUDA-Q’s kernel.c_if functionality, so classical conditions directly control quantum subroutines; this dispenses with the traditional approach of branch duplication and thus reduces circuit depth. Input declarations become kernel parameters, supporting VQE workflows without repeated recompilation.
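To make the meaning of the two gate modifiers concrete, the following sketch treats them as higher-order functions on small unitary matrices. It only illustrates their semantics and does not use or reproduce CUDA-Q's interface: the helper names inv, ctrl, and matmul are hypothetical, and the control is assumed to be the most significant qubit.

```python
def inv(u):
    """Adjoint of a square matrix: the conjugate transpose."""
    n = len(u)
    return [[complex(u[c][r]).conjugate() for c in range(n)] for r in range(n)]


def ctrl(u):
    """Singly controlled version of u: block-diagonal (identity, u)."""
    n = len(u)
    out = [[0j] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        out[i][i] = 1 + 0j  # control = 0: act as identity
        for j in range(n):
            out[n + i][n + j] = complex(u[i][j])  # control = 1: apply u
    return out


def matmul(a, b):
    """Plain matrix product for small square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


X = [[0, 1], [1, 0]]
S = [[1, 0], [0, 1j]]

cnot = ctrl(X)                  # ctrl @ x is the CNOT gate
s_round_trip = matmul(S, inv(S))  # s followed by inv @ s is the identity
```

Because both modifiers are ordinary functions of a gate, they compose (for example ctrl(inv(S))) without enumerating modified gates ahead of time.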

The transpiler further enables backend- and device-agnostic execution: kernels are emitted for either CPU-based (qpp-cpu) or GPU-accelerated (nvidia/cuQuantum) backends, with optional support for Qiskit as a secondary verification target.

Validation and Numerical Results

Three benchmarking suites were designed: static circuit correctness (Clifford, QFT, Bernstein-Vazirani), dynamic control (conditional reset, teleportation), and hybrid parameterized algorithms (VQE with a hardware-efficient ansatz, HEA). For static circuits, CUDA-Q and Qiskit simulation results agreed bit for bit, validating that complex gate modifiers and nested control blocks are preserved through compilation.

Dynamic control was assessed using the conditional reset protocol (see Figure 2): a single qubit is prepared in superposition, measured mid-circuit, and conditionally reset to $|0\rangle$. Over $10^4$ shots, the target state was measured with probability exceeding $0.999$, matching theoretical expectation and demonstrating correct translation and execution of mid-circuit feedback. Quantum teleportation was similarly validated.
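The expected outcome of the conditional reset protocol can be checked with a small classical model. This is an idealized, noiseless sketch in plain Python rather than a CUDA-Q run: the function name conditional_reset_shot and the fixed seed are assumptions made for illustration.

```python
import random


def conditional_reset_shot(rng):
    """Simulate one shot: H on |0>, measure, apply X if the outcome is 1."""
    # Measuring |+> yields 0 or 1 with equal probability, and the qubit
    # collapses to the basis state matching the outcome.
    outcome = 1 if rng.random() < 0.5 else 0
    state = outcome
    # Classical feedforward: flip the qubit back to |0> when outcome is 1.
    if outcome == 1:
        state ^= 1
    return outcome, state


rng = random.Random(1234)  # fixed seed so the sketch is reproducible
shots = [conditional_reset_shot(rng) for _ in range(10_000)]

mid_circuit_ones = sum(outcome for outcome, _ in shots)
final_zero_fraction = sum(1 for _, state in shots if state == 0) / len(shots)
print(mid_circuit_ones, final_zero_fraction)
```

In this noiseless model the mid-circuit outcomes split roughly evenly, while the final state is $|0\rangle$ on every shot, which is the behavior the reported probability above $0.999$ is compared against.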

For hybrid VQE-style compilation, input variables in OpenQASM were mapped to CUDA-Q kernel arguments. This enabled compile-once, run-many execution; parameter updates did not trigger recompilation overhead, and energy expectation values for Hamiltonians such as $H = Z_0 Z_1$ were found to be correct to within simulation tolerance for all 50 parameter sweeps.
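The compile-once, run-many pattern amounts to defining a parameterized routine once and calling it with new arguments. The sketch below is an assumption-laden illustration in plain Python: it uses a two-qubit product ansatz RY(theta0) ⊗ RY(theta1) chosen for simplicity (not the paper's HEA ansatz), and the function name expectation_zz and the sweep values are hypothetical.

```python
import math


def expectation_zz(theta0, theta1):
    """<Z0 Z1> for the state RY(theta0) x RY(theta1) applied to |00>."""
    # Single-qubit amplitudes of RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>.
    amp0 = [math.cos(theta0 / 2), math.sin(theta0 / 2)]
    amp1 = [math.cos(theta1 / 2), math.sin(theta1 / 2)]
    value = 0.0
    for b0 in (0, 1):
        for b1 in (0, 1):
            prob = (amp0[b0] * amp1[b1]) ** 2
            # Z0 Z1 has eigenvalue +1 on even parity, -1 on odd parity.
            value += prob if b0 == b1 else -prob
    return value


# The routine is defined once; only its arguments change across the sweep.
sweep = [(2 * math.pi * k / 50, 0.3) for k in range(50)]
energies = [expectation_zz(t0, t1) for t0, t1 in sweep]
```

For this product ansatz the result has the closed form $\langle Z_0 Z_1 \rangle = \cos\theta_0 \cos\theta_1$, which gives a simple reference value for checking each point of the sweep.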

Software Architecture and Practical Implications

The transpiler is packaged as a modular Python library with a Dockerized reference environment, enabling high-performance quantum simulation workflows independent of specialized GPU hardware. This accessibility is key for NISQ-era algorithmic research and prototyping. The framework supports end-to-end OpenQASM 3.0 workflows, from parsing through MLIR lowering to backend execution, and exposes extensibility for future integration with additional simulators (e.g., Stim for stabilizer sampling).

Strong claims include:

Direct mapping of OpenQASM 3.0 control flow to backend API, completely avoiding static circuit expansion and its associated exponential depth overhead.
Complete and deterministic validation of dynamic, mid-circuit feedforward circuits.
No JIT recompilation penalty in parameterized kernel evaluation, optimizing VQE and QML algorithm deployment.
Realistic simulation of error-correction protocols, enabled by support for true dynamic feedback.

The practical impact is transparent migration from OpenQASM 3.0 IR to high-performance CUDA-Q simulation, including for circuits with complex classical/quantum interaction.

Future Directions

Planned enhancements include support for pulse-level (timing and analog) instructions as OpenQASM 3.0 matures, mapping these to CUDA-Q’s analog kernel interfaces. Another direction is adopting QIR as a primary IR emission target, which would facilitate broader backend support (beyond NVIDIA hardware), and integrating with density-matrix and tensor backends for robust noise and error analysis. This would allow in-depth simulation of error-correcting codes beyond idealized statevector frameworks.

Conclusion

This work establishes a comprehensive transpilation pipeline bridging OpenQASM 3.0 and CUDA-Q, thereby equipping researchers with expressive, high-performance tooling for simulating dynamic quantum circuits and hybrid algorithms. The framework’s rigorous AST-to-backend mapping enables precise validation of classical feedforward, parametric runs, and general control flow, supporting near-term algorithm development and pre-hardware validation of error correction. Extensions to pulse-level and noisy simulation protocols are a natural progression, with broader theoretical significance for cross-ecosystem quantum software standardization.