A Semantic Quantum Circuit Cache for Scalable and Distributed Quantum-Classical Workflows

Published 29 Apr 2026 in cs.DC and quant-ph | (2604.26788v1)

Abstract: Hybrid quantum-classical workflows often execute large ensembles of circuits that differ syntactically but implement identical operations, leading to substantial redundant computation. To address this, we introduce the Quantum Circuit Cache, a content-addressable system that detects semantic equivalence and reuses previously computed results across executions, backends, and workflow stages. Our approach combines ZX-calculus reduction with isomorphism-invariant Weisfeiler–Leman graph hashing to generate deterministic circuit identifiers, enabling constant-time lookup in distributed caches supporting both lightweight LMDB and scalable Redis deployments. The system integrates transparently into hybrid HPC workflows and remains backend-agnostic across CPU, GPU, and QPU environments. We evaluate the system on MareNostrum 5 with two representative workloads: distributed wire cutting and Differential Evolution-based QAOA optimization. For wire cutting, caching eliminates up to 91.98% of redundant subcircuit simulations, yielding speedups up to 7.0 times on a single node and maintaining advantages at scale, with Redis-based caching achieving up to 1.6 times speedups under high parallelism. Validation on a 35-qubit superconducting QPU confirms these benefits, achieving an 11.2 times speedup on real hardware. In distributed QAOA optimization, equivalence-aware caching avoids up to 27.6% of circuit evaluations and consistently reduces execution cost without altering the optimization algorithm. In both cases, reuse grows with concurrency and circuit structure, highlighting redundancy as a major systems bottleneck and demonstrating the effectiveness of our Quantum Circuit Cache.

Summary

  • The paper presents a semantic quantum circuit cache that reduces redundant quantum circuit evaluations by detecting semantically equivalent circuits using ZX-calculus and WL graph hashing.
  • The method scales effectively in distributed setups, achieving up to 11.2× speedup on real hardware and reducing up to 91.98% of redundant subcircuit executions during wire cutting.
  • It integrates seamlessly with hybrid quantum-classical workflows, ensuring resource efficiency and portability through backend-agnostic implementations with LMDB and Redis.

Semantic Quantum Circuit Caching for Scalable Hybrid Quantum-Classical Workflows

Introduction and Motivation

The rapid growth of hybrid quantum-classical workflows for near-term quantum computing exposes substantial inefficiencies from redundant quantum circuit execution. This redundancy arises when workflows generate large ensembles of circuits that differ syntactically, because of gate reordering, compilation artifacts, or parameter choices, yet are semantically equivalent and implement the same quantum operation. Traditional quantum software stacks define circuit identity by syntactic representation (e.g., QASM strings or gate lists) and therefore cannot recognize or eliminate this redundancy.

"A Semantic Quantum Circuit Cache for Scalable and Distributed Quantum-Classical Workflows" (2604.26788) addresses this bottleneck by introducing the Quantum Circuit Cache, a system primitive that detects semantic circuit equivalence and enables persistent, backend-agnostic reuse of quantum computations in distributed classical-quantum workflows. The cache transparently identifies semantically equivalent circuits across randomizations, parameter sweeps, compiler passes, and workflow stages, serving as a content-addressable system designed to amortize both classical and quantum resource consumption.

Methods: Semantic Hashing via ZX-calculus and Weisfeiler–Leman Graph

Key to the Quantum Circuit Cache is the use of ZX-calculus for semantics-preserving reduction, followed by isomorphism-invariant Weisfeiler–Leman (WL) graph hashing to produce deterministic, backend-independent identifiers for circuits. Incoming circuits are first transformed into ZX-calculus graphs and reduced via a deterministic sequence of rewrite rules that normalize superficial structural differences while preserving quantum semantics.

Figure 1: End-to-end workflow of the Quantum Circuit Cache, from ZX-calculus graph reduction to Weisfeiler–Leman hash key generation and distributed cache lookup/execution.

The reduced ZX-calculus graphs are serialized in a canonical form and hashed using WL refinement, which aggregates vertex attributes and topology to construct a concise, stable fingerprint invariant to gate ordering, local circuit rewrites, and most compiler artifacts. This pipeline does not guarantee complete equivalence detection due to the non-uniqueness of ZX normal forms, but in practical workloads (especially Clifford+T-dominated, variational ansätze, and subcircuits from wire cutting), convergence is reliable and effective. The resulting hash serves as a cache key for storing or retrieving simulation results, measurement statistics, or QPU execution metadata.
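
As a rough illustration of the hashing step, the sketch below implements a small Weisfeiler–Leman refinement in plain Python over a labelled, undirected graph. It is not the paper's implementation: the function name `wl_hash`, the `kind:phase` label strings, and the choice of BLAKE2b with three refinement rounds are assumptions made for this example, and the ZX-calculus reduction is assumed to have already happened upstream.

```python
import hashlib
from collections import Counter


def _digest(text: str) -> str:
    """Short, deterministic hex digest of a string."""
    return hashlib.blake2b(text.encode("utf-8"), digest_size=16).hexdigest()


def wl_hash(labels, edges, iterations=3):
    """Hash a labelled undirected graph so that node naming does not matter.

    labels: dict mapping node id -> label string (hypothetical 'kind:phase' format)
    edges:  iterable of (u, v) pairs
    """
    neighbours = {node: [] for node in labels}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    # Initial colour of a node depends only on its label, not on its id.
    colours = {node: _digest(str(label)) for node, label in labels.items()}
    history = [sorted(Counter(colours.values()).items())]

    for _ in range(iterations):
        # New colour = own colour plus the sorted multiset of neighbour colours.
        colours = {
            node: _digest(
                colours[node] + "|" + ",".join(sorted(colours[n] for n in neighbours[node]))
            )
            for node in labels
        }
        history.append(sorted(Counter(colours.values()).items()))

    # The key summarises the colour histograms of every round.
    return _digest(repr(history))


# Two graphs with the same structure but different node names and edge order.
graph_a = wl_hash({0: "Z:0", 1: "X:1/4", 2: "Z:0"}, [(0, 1), (1, 2)])
graph_b = wl_hash({"a": "Z:0", "c": "Z:0", "b": "X:1/4"}, [("b", "a"), ("c", "b")])
# Same structure, but the middle node carries a different phase label.
graph_c = wl_hash({0: "Z:0", 1: "X:1/2", 2: "Z:0"}, [(0, 1), (1, 2)])

print(graph_a == graph_b, graph_a == graph_c)
```

The first two graphs hash to the same key because the refinement only ever looks at labels and neighbourhoods, never at node identifiers. WL hashing can also assign the same key to some non-isomorphic graphs, so a key match is evidence of equivalence rather than proof.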

The cache utilizes two backends: a local, memory-mapped disk solution (LMDB) suited for modest parallelism and lightweight deployments, and a distributed Redis cluster for high-parallelism HPC environments. Both support constant-time lookup and can be ported between deployments via a universal LMDB dump/restore mechanism.
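
The lookup-or-execute pattern described above can be sketched as follows. This is a minimal illustration, not the system's API: `KeyValueStore`, `InMemoryStore`, `CircuitCache`, and `get_or_compute` are hypothetical names, and a Python dict stands in for the LMDB or Redis backend so the example stays self-contained.

```python
from typing import Callable, Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """Minimal interface a backend would need for this sketch."""

    def get(self, key: str) -> Optional[bytes]: ...

    def put(self, key: str, value: bytes) -> None: ...


class InMemoryStore:
    """Stand-in backend; a dict plays the role of LMDB or Redis here."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value


class CircuitCache:
    """Content-addressable cache keyed by a semantic circuit hash."""

    def __init__(self, store: KeyValueStore) -> None:
        self.store = store
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key: str, compute: Callable[[], bytes]) -> bytes:
        cached = self.store.get(key)
        if cached is not None:
            self.hits += 1
            return cached
        # Cache miss: run the (expensive) simulation or QPU job, then store it.
        self.misses += 1
        result = compute()
        self.store.put(key, result)
        return result
```

Keeping the backend behind a two-method interface is one way to let the same workflow code run against a local store or a distributed one; the real backends add concerns (serialization, transactions, networking) that this sketch leaves out.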

Evaluation: Distributed Wire Cutting and QAOA Workloads

Wire Cutting

Wire cutting decomposes large quantum circuits into many subcircuits by inserting “cuts” and expanding the resulting operator basis. For $c$ cuts, this produces $8^c$ (or $2 \times 8^c$) subcircuits, most of which are structurally redundant. Evaluation on 48-qubit Hardware-Efficient Ansatz (HEA) and random circuits with four wire cuts demonstrates that the cache can eliminate up to 91.98% of redundant execution, equivalent to avoiding 7,544 quantum subcircuit simulations out of 8,192 total.

Figure 2: Total execution time for HEA circuits with four wire cuts and varying compute nodes; distributed cache reduces wall time substantially, with Redis backend showing better scaling at high parallelism.
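
The subcircuit counts quoted for wire cutting follow directly from the $8^c$ growth. The short sketch below reproduces that arithmetic and shows how avoided executions can be counted from a list of cache keys; the function names and the `doubled` flag (standing for the $2 \times 8^c$ variant) are hypothetical, and the key list is toy data.

```python
def subcircuit_count(cuts: int, doubled: bool = False) -> int:
    """Number of subcircuits for a given number of wire cuts."""
    base = 8 ** cuts
    return 2 * base if doubled else base


def avoided_executions(keys) -> int:
    """Executions that a cache can skip: total requests minus distinct keys."""
    keys = list(keys)
    return len(keys) - len(set(keys))


print(subcircuit_count(4))                # 4096
print(subcircuit_count(4, doubled=True))  # 8192
print(avoided_executions(["a", "b", "a", "a"]))  # 2
```

With four cuts the doubled variant gives 8,192 subcircuits, which matches the total stated in the text.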

Both LMDB and Redis backends significantly reduce runtime across node counts. Redis, with concurrent writes and internal sharding, outperforms LMDB under high concurrency. Reported speedups reach up to 7.0× on one node and up to 1.6× at 64 nodes with the Redis backend. Notably, similar speedups are observed for random circuits, suggesting that strong regularity is not a prerequisite for effective circuit caching.

Figure 3: Cache behavior for HEA circuits using an LMDB backend, showing the dominance of cache hits but increased extra simulations under higher parallelism due to the single-writer constraint.

The analysis also separates cache hits from extra simulations caused by concurrent writes. Redis supports multiple writers, which stabilizes the system at large scale, whereas LMDB’s single-writer model can cause a limited amount of redundant computation under heavy load.
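
One way to see why extra simulations grow with parallelism is a toy model in which results written by one worker become visible to the others only after a delay. The sketch below is an assumption-laden illustration, not a model of LMDB's or Redis's actual locking: workers proceed in rounds, every worker in a round sees the cache as it stood at the start of that round, and `simulate_rounds` is a hypothetical name.

```python
def simulate_rounds(keys, workers):
    """Count simulations when writes become visible only after each round.

    keys:    sequence of cache keys requested by the workload, in order
    workers: number of requests processed concurrently per round
    Returns (total simulations, simulations beyond one per distinct key).
    """
    cache = set()
    simulations = 0
    for start in range(0, len(keys), workers):
        batch = keys[start:start + workers]
        # Every worker checks the cache as it looked when the round began.
        missed = [key for key in batch if key not in cache]
        simulations += len(missed)
        # Results land in the cache only once the round has finished.
        cache.update(missed)
    extra = simulations - len(set(keys))
    return simulations, extra


workload = ["a", "a", "b", "a", "b", "c", "c", "a"]
for n in (1, 4, 8):
    print(n, simulate_rounds(workload, n))
```

With one worker every duplicate is a hit, while larger rounds let several workers miss on the same key before any result is stored, so the count of extra simulations rises with the round size.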

Direct validation on a 35-qubit superconducting quantum processor (MareNostrum Ona) confirms the systems-level benefit: semantic caching yields an 11.2× speedup for HEA circuits with four wire cuts, reducing physical QPU time from an estimated 20.5 hours to only 1.83 hours by avoiding duplicate subcircuit execution.
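
The hardware speedup is simply the ratio of the two QPU times given in the text, which the following lines check. The variable names are chosen for this example only.

```python
baseline_hours = 20.5  # estimated QPU time without caching (from the text)
cached_hours = 1.83    # QPU time with semantic caching (from the text)

speedup = baseline_hours / cached_hours
saved_hours = baseline_hours - cached_hours

print(f"speedup ~ {speedup:.1f}x, QPU time saved ~ {saved_hours:.2f} h")
```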

QAOA with Differential Evolution

Variational algorithms such as QAOA, often evaluated over dense parameter grids with evolutionary or gradient-free optimizers, feature significant redundancy in the underlying circuit structure after parameter discretization and ZX-calculus reduction. The cache was evaluated on Max-Cut QAOA for 24-vertex random graphs, using Differential Evolution (DE) optimizers across three levels of parameter discretization and three circuit depths.

Figure 4: Cumulative cache hits versus DE optimizer iteration for QAOA at different depths and discretizations, demonstrating increasing reuse as optimization proceeds.
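
To see why discretization creates reuse, consider snapping each candidate's parameters to a grid before building the circuit: nearby candidates then map to the same grid point and hence the same circuit. The sketch below assumes a uniform grid with a single step size, which may differ from the discretization actually used in the evaluation; `grid_key` and `count_reuse` are hypothetical helpers and the population is toy data.

```python
from typing import Iterable, Sequence, Tuple


def grid_key(params: Sequence[float], step: float) -> Tuple[int, ...]:
    """Snap each parameter to the nearest multiple of `step` (as an integer index)."""
    return tuple(round(p / step) for p in params)


def count_reuse(population: Iterable[Sequence[float]], step: float) -> int:
    """Number of candidates whose discretized parameters were already seen."""
    seen = set()
    hits = 0
    for candidate in population:
        key = grid_key(candidate, step)
        if key in seen:
            hits += 1
        else:
            seen.add(key)
    return hits


# Four (gamma, beta) candidates, three of them close together.
population = [(0.31, 1.02), (0.29, 0.98), (0.33, 1.04), (0.52, 1.41)]
print(count_reuse(population, step=0.1))   # coarse grid
print(count_reuse(population, step=0.01))  # fine grid
```

On the coarse grid the three nearby candidates collapse onto one grid point, while on the fine grid all four stay distinct, which is consistent with coarser discretizations showing higher reuse.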

Cache reuse is most pronounced for coarse and medium discretizations, avoiding up to 27.6% of circuit evaluations in $p=2$ QAOA with medium discretization and consistently lowering execution cost without affecting optimization convergence or final solution quality. Even for finer discretizations and deeper circuits, thousands of executions are bypassed. This effect is robust across random seeds and workloads, indicating that a substantial share of the computational effort in variational quantum algorithms is redundant under semantics-aware analysis.

Figure 5: Convergence of best Max-Cut energy for all configurations, indicating that cache-enabled equivalence detection does not adversely impact optimizer behavior but reduces redundant computation.

Figure 6: Cache hit percentage for $p=3$ as a function of optimizer iterations, confirming that coarser discretizations achieve consistently higher rates of cache reuse.

Cache effectiveness scales with population size in DE optimizers, as illustrated by the increase in avoided simulations with larger populations, indicating the method's suitability for future HPC-scale quantum-classical optimization.

Figure 7: Total avoided circuit simulations as a function of population size, showing the scaling benefit in parallel/hybrid optimizer settings.

Implementation Considerations

Overhead introduced by semantics-based hashing and reduction is minimal relative to quantum circuit execution time. Cache lookup, ZX-calculus reduction, and WL hashing collectively take approximately 0.13 seconds per cache miss, while a typical 28-qubit simulation with Qiskit Aer requires more than 35 seconds. Consequently, even moderate cache hit rates substantially amortize the extra pipeline computation.
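
The amortization argument can be made concrete with a simple cost model. The sketch below assumes, pessimistically, that the 0.13 s hashing overhead is paid on every request and that a simulation takes exactly 35 s (the text gives this only as a lower bound); under those assumptions the expected cost per request is `overhead + (1 - hit_rate) * simulation`, and caching pays off once the hit rate exceeds `overhead / simulation`. The function names are hypothetical.

```python
def expected_seconds(hit_rate: float, overhead_s: float = 0.13,
                     simulation_s: float = 35.0) -> float:
    """Expected cost per request: overhead always, simulation only on a miss."""
    return overhead_s + (1.0 - hit_rate) * simulation_s


def break_even_hit_rate(overhead_s: float = 0.13,
                        simulation_s: float = 35.0) -> float:
    """Hit rate at which the cached pipeline costs the same as no cache."""
    return overhead_s / simulation_s


print(f"break-even hit rate: {break_even_hit_rate():.4f}")
for h in (0.0, 0.1, 0.5):
    print(h, round(expected_seconds(h), 2))
```

Under this model the break-even hit rate is well below one percent, so the overhead is recovered as soon as a small fraction of requests hit the cache.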

For backend storage, LMDB is memory efficient (on the order of hundreds of bytes per full statevector), while Redis incurs higher per-entry overhead due to serialization and distributed infrastructure, but provides the necessary scaling for very high concurrency.

Implications and Future Directions

Semantic quantum circuit caching represents a systems-level advance in hybrid quantum computing, abstracting equivalence detection from a verification tool to a persistent, distributed service primitive. Its introduction establishes a foundation for future quantum workflow orchestration, enabling backends to treat quantum circuits as reusable computational artifacts much as classical HPC systems exploit memoization and checkpointing.

Practical implications include:

  • Substantial speedup of current hybrid workloads, especially for wire cutting, error mitigation, quantum chemistry VQEs, benchmarking, and distributed parameter sweeps.
  • Efficient utilization of scarce QPU resources: redundant evaluation on hardware is dramatically reduced, directly lowering operational costs and queue time.
  • Scalable deployment on existing HPC infrastructure: the system is backend-agnostic and suitable for both shared-memory and distributed-memory quantum-classical pipelines.
  • Transparent integration: quantum algorithms and classical optimizers require no modification; cache mediation is handled at the systems layer.
  • Portability across deployments: the LMDB snapshotting mechanism ensures caches can be archived, shared, and restored.

Several avenues for further development and research are highlighted:

  • Improved normal forms for ZX-calculus: increased equivalence detection completeness would expand hit rates, especially for deep, entangled circuits.
  • Support for dynamic circuits: extending semantic representation to circuits with classical flow or adaptive measurements.
  • Cache-aware workload placement and optimizer-integration: potential for integrating cache awareness into scheduler heuristics or optimizer sampling strategies.
  • Heterogeneous mix of backend types: unifying cache semantics across CPUs, GPUs, and future large-scale QPUs.

Conclusion

The Quantum Circuit Cache establishes semantic circuit equivalence as a powerful, scalable mechanism for suppressing redundant quantum computation in distributed, hybrid classical-quantum environments. Empirical results in wire cutting and variational quantum optimization demonstrate strong hit rates and speedups up to 11.2× on real hardware. The approach is portable, backend-independent, and highly effective even in the presence of structural noise and compiler randomness.

As hybrid and distributed quantum-classical workflows become the standard for near-term quantum computing, systems techniques such as semantic caching will be crucial. Future application domains—including quantum error mitigation, quantum chemistry, and large-scale benchmarking—stand to benefit significantly from this paradigm. The methodology generalizes broadly, opening avenues for further research into architecture-level integration, cache-coherent quantum-classical scheduling, and algorithm-cache co-design.
