Parallel Quantum Annealing (PQA)
- PQA is a quantum optimization method that simultaneously embeds and anneals multiple independent QUBO or Ising subproblems on a single hardware cycle.
- It maximizes qubit utilization by partitioning the hardware graph, thereby reducing time-to-solution while balancing throughput against marginal declines in solution accuracy.
- Applications include combinatorial optimization, machine learning, and error mitigation, with advanced embedding strategies enhancing performance and scalability.
Parallel Quantum Annealing (PQA) is an advanced quantum optimization paradigm wherein multiple independent Quadratic Unconstrained Binary Optimization (QUBO) or Ising-type subproblems are simultaneously embedded and annealed within a single quantum annealing cycle on a quantum hardware platform. PQA maximizes utilization of available qubits by partitioning the hardware graph to accommodate several problem instances in parallel, thus increasing throughput and reducing the cumulative Time-to-Solution (TTS), often at a marginal decline in individual solution quality. Key applications include combinatorial optimization, error mitigation, machine learning, and hybrid classical–quantum workflows, leveraging both hardware-specific strategies and generalized parallelization principles to address scalability constraints of current noisy intermediate-scale quantum (NISQ) annealers (Bishwas et al., 2024, Pelofske et al., 2021, Pelofske et al., 2022, Artag et al., 10 Mar 2026).
1. Mathematical Formulation and Hamiltonian Structure
In PQA, the quantum annealer evolves a composite Hamiltonian that encodes multiple problem instances. For $N$ independent QUBO or Ising problems with Hamiltonians $H_1, \dots, H_N$, the total annealing Hamiltonian is:
$$H(s) = A(s)\, H_D + B(s) \sum_{i=1}^{N} H_i,$$
where $A(s)$ and $B(s)$ denote the annealing schedule, $H_D$ is the transverse-field driver, and $s \in [0, 1]$ parametrizes the schedule (Bishwas et al., 2024, Artag et al., 10 Mar 2026). In QUBO form:
$$\min_{x^{(1)}, \dots, x^{(N)}} \; \sum_{i=1}^{N} x^{(i)\top} Q^{(i)} x^{(i)}, \qquad x^{(i)} \in \{0, 1\}^{n_i}.$$
For balancing disparate problem scales, instance-specific normalization coefficients $\alpha_i$ can be applied:
$$H_P = \sum_{i=1}^{N} \alpha_i H_i,$$
to avoid dominance by larger-magnitude subproblems (Bishwas et al., 2024).
Spatially, each logical problem is minor-embedded in a disjoint subgraph of the hardware graph; chains associated with logical variables are strictly non-overlapping across problems. The adiabatic spectrum of the block-diagonal $H(s)$ satisfies:
$$\Delta(s) = \min_{i} \Delta_i(s),$$
where $\Delta_i(s)$ is the instantaneous spectral gap for the $i$th problem alone. This guarantees that the overall system does not experience increased computational complexity compared to the hardest constituent instance (Artag et al., 10 Mar 2026).
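The block-diagonal structure can be illustrated with a minimal Python sketch. It assumes QUBOs are stored as dictionaries mapping index pairs to weights; the helper names `combine_qubos`, `energy`, and `brute_force` are hypothetical and not taken from the cited works. Brute-force search stands in for the annealer and is only practical for tiny instances.

```python
from itertools import product

def combine_qubos(qubos):
    """Merge QUBO dicts {(i, j): w} into one QUBO with disjoint variable blocks.

    Returns the combined dict and, per instance, the global variable
    indices that instance occupies.
    """
    combined, blocks, offset = {}, [], 0
    for q in qubos:
        n = 1 + max(max(i, j) for i, j in q)  # number of variables in this instance
        for (i, j), w in q.items():
            combined[(i + offset, j + offset)] = w  # shift indices, no cross terms
        blocks.append(list(range(offset, offset + n)))
        offset += n
    return combined, blocks

def energy(qubo, x):
    """QUBO objective value for a binary assignment x."""
    return sum(w * x[i] * x[j] for (i, j), w in qubo.items())

def brute_force(qubo, n):
    """Exhaustive minimizer over n binary variables (toy sizes only)."""
    return min(product((0, 1), repeat=n), key=lambda x: energy(qubo, x))

# Two illustrative instances with unique minima.
q1 = {(0, 0): -1, (1, 1): -2, (0, 1): 4}
q2 = {(0, 0): 1, (1, 1): -2, (2, 2): -1, (1, 2): 3}

combined, blocks = combine_qubos([q1, q2])
best = brute_force(combined, 5)
parts = [tuple(best[v] for v in block) for block in blocks]
```

Because the blocks share no couplers, the combined minimum splits into the two individual minima: `parts` holds each instance's own optimal assignment, and the combined energy equals the sum of the per-instance energies.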
2. Embedding Strategies and Hardware Considerations
Effective PQA requires embedding multiple QUBO or Ising instances onto the hardware map with minimal qubit and coupler contention:
- Default Embedding: D-Wave’s cloud solvers compute a minor embedding for the parallel-constructed QUBO, setting chain strength automatically. This approach is straightforward yet offers limited control and can experience quality degradation for heterogeneous or moderate-sized QUBOs (Bishwas et al., 2024).
- Custom Embedding: Logical variables are manually mapped to non-overlapping qubit chains based on detailed hardware topology (e.g., Pegasus). This reduces chain lengths, overlaps, and chain breaks, improving both solution consistency and ground-state probability for QUBOs of moderate size (up to 23 variables) (Bishwas et al., 2024, Pelofske et al., 2021).
- Spatial Isolation: Buffer zones (unused qubits) are introduced between embedding regions to reduce cross-talk and analog interference at the cost of some embedding capacity (Artag et al., 10 Mar 2026, Schuman et al., 18 Jul 2025).
- Hybrid Solvers: LeapHybridSampler and hybrid classical–quantum decompositions partition large problems into quantum-tractable subblocks, combining classical preprocessing with QPU-based parallel annealing for substantially larger QUBO sizes (up to 900 variables in empirical studies) (Bishwas et al., 2024, Pelofske et al., 2022).
- RBM Error Mitigation: Replication-Based Mitigation embeds multiple disjoint instances of the same problem across the chip, exploiting spatial redundancy to average out hardware biases and analog errors without explicit penalty couplings (Djidjev, 2024).
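The capacity cost of spatial isolation can be made concrete with a toy layout model. The sketch below treats the chip as a rectangular grid of unit cells and places square regions on it, optionally separated by unused buffer cells. The grid abstraction, the function name `tile_regions`, and the tile and buffer sizes are illustrative assumptions; real embeddings are computed on the actual hardware graph and are not reproduced here.

```python
def tile_regions(rows, cols, tile, buffer=0):
    """Place square tile x tile regions on a rows x cols grid of unit cells.

    `buffer` is the number of unused cells left between neighbouring
    regions. Returns a list of sets of (row, col) cells, one per region.
    """
    step = tile + buffer
    regions = []
    for r in range(0, rows - tile + 1, step):
        for c in range(0, cols - tile + 1, step):
            regions.append({(r + dr, c + dc)
                            for dr in range(tile) for dc in range(tile)})
    return regions

# A 16 x 16 grid of unit cells, with 4 x 4 cells reserved per problem.
packed = tile_regions(16, 16, tile=4)              # no isolation
isolated = tile_regions(16, 16, tile=4, buffer=1)  # one-cell buffer zone

print(len(packed), len(isolated))  # 16 9
```

In this toy setting a single-cell buffer reduces the number of parallel slots from 16 to 9, which mirrors the trade-off between cross-talk suppression and embedding capacity described above.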
Table 1 summarizes mode, embedding method, and instance capacity based on empirical studies.
| Mode | Embedding | Typical Capacity |
|---|---|---|
| Default PQA | minorminer (auto) | 12–68 instances (size 20, Advantage) |
| Custom PQA | Manual chains | 20–23 variables (mid-size QUBOs) |
| Spatial-Isolation MTQA | Manual + buffer zone | 8–130 (graph size 130) |
| Hybrid (LeapHybrid) | Classical + QA | 900 variables (total QUBO) |
| RBM (Replication) | Partitioned islands | Multiple replicas (subgraph-wise) |
3. Normalization, Scaling, and Solution Quality
Diverse QUBO instances often vary by several orders of magnitude in their coefficient scales. To balance their contributions and prevent energy landscape dominance, empirical studies investigated eight normalization strategies:
- Element-wise root (square, fourth root)
- Logarithmic (base-10)
- Scalar multiplication
- Polynomial (square, square–log, log–square)
- Problem-specific rescaling
Scalar multiplication yielded the most reliable preservation of individual solution quality (Bishwas et al., 2024). However, magnitude disparity in mixed problem batches can still yield suboptimal solution quality, as global normalization does not fully resolve the risk of spectral-gap collapse for problems with vastly different hardness (Artag et al., 10 Mar 2026).
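A minimal sketch of scalar normalization follows. It assumes one particular choice of scalar, namely dividing each QUBO by its largest absolute coefficient so that all instances share a unit maximum; the source does not state how the scalar was chosen, and the function names are hypothetical. Multiplying by a positive scalar leaves each instance's minimizer unchanged, which is why this strategy cannot distort individual solutions.

```python
from itertools import product

def scale_qubo(qubo, alpha):
    """Multiply every coefficient of a QUBO dict by alpha."""
    return {key: alpha * w for key, w in qubo.items()}

def normalize_to_unit_max(qubo):
    """Scale a QUBO so its largest absolute coefficient is 1 (assumed rule)."""
    alpha = 1.0 / max(abs(w) for w in qubo.values())
    return scale_qubo(qubo, alpha), alpha

def energy(qubo, x):
    return sum(w * x[i] * x[j] for (i, j), w in qubo.items())

def argmin(qubo, n):
    """Exhaustive minimizer, used only to check that scaling is harmless."""
    return min(product((0, 1), repeat=n), key=lambda x: energy(qubo, x))

# Two instances whose coefficients differ by several orders of magnitude.
small = {(0, 0): -0.01, (1, 1): -0.02, (0, 1): 0.04}
large = {(0, 0): -500.0, (1, 1): 300.0, (0, 1): -100.0}

small_n, alpha_small = normalize_to_unit_max(small)
large_n, alpha_large = normalize_to_unit_max(large)
```

After normalization both instances have coefficients in the range [-1, 1], so neither dominates the shared energy scale, while `argmin(small_n, 2)` and `argmin(large_n, 2)` match the minimizers of the unscaled problems.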
Solution quality (measured by SQV or ground-state probability) exhibits a trade-off:
- Default embedding suffers degradation, especially for weaker-magnitude problems embedded alongside stronger ones.
- Custom embeddings and buffer-zone isolation significantly stabilize SQV and maintain near-optimal performance for moderate sizes.
- Hybrid samplers consistently return violation-free solutions up to the tested size limit.
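Per-instance ground-state probability can be estimated from the samples of a parallel run by splitting each sample into its blocks. The sketch below assumes samples are tuples over the combined variable set, that each instance's variable block is known, and that reference ground energies are available (for example from exhaustive search on small instances). The function name and the sample data are illustrative, not measured results.

```python
def energy(qubo, x):
    """QUBO objective value for a binary assignment x."""
    return sum(w * x[i] * x[j] for (i, j), w in qubo.items())

def ground_state_rates(samples, blocks, qubos, ground_energies, tol=1e-9):
    """Fraction of samples in which each instance reached its ground energy."""
    hits = [0] * len(qubos)
    for sample in samples:
        for k, (block, qubo) in enumerate(zip(blocks, qubos)):
            local = [sample[v] for v in block]  # this instance's variables only
            if energy(qubo, local) <= ground_energies[k] + tol:
                hits[k] += 1
    return [h / len(samples) for h in hits]

# Two toy instances occupying variables 0-1 and 2-4 of the combined problem.
q1 = {(0, 0): -1, (1, 1): -2, (0, 1): 4}
q2 = {(0, 0): 1, (1, 1): -2, (2, 2): -1, (1, 2): 3}
blocks = [[0, 1], [2, 3, 4]]

# Made-up samples standing in for annealer reads.
samples = [(0, 1, 0, 1, 0), (0, 1, 1, 1, 0), (1, 0, 0, 1, 0), (0, 1, 0, 0, 1)]
rates = ground_state_rates(samples, blocks, [q1, q2], [-2, -2])
print(rates)  # [0.75, 0.5]
```

Reporting the rate per instance rather than for the combined problem makes it visible when one subproblem degrades while its neighbours do not.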
4. Throughput, Time-to-Solution, and Performance Scaling
PQA achieves marked gains in hardware utilization and wall-clock throughput:
- Parallel runs fill 80–95% of available physical qubits, compared to 30–50% for sequential single-instance jobs (Bishwas et al., 2024, Pelofske et al., 2021).
- TTS is defined to encapsulate all overheads (Bishwas et al., 2024, Pelofske et al., 2021).
- Speed-ups:
  - Default embedding: 20–40% TTS reduction relative to sequential annealing for small-to-moderate problem sizes.
  - Custom embedding: Up to 60% TTS reduction with maintained ground-state quality below 23 variables per batch.
  - Hybrid approaches: TTS reduction for large combined QUBOs (up to 900 variables).
- In full-capacity scenarios, PQA with $N$ instances holds TTS approximately constant in $N$ until qubit saturation, while sequential jobs scale linearly (Pelofske et al., 2021, Pelofske et al., 2022).
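The scaling behaviour in the last point can be illustrated with a deliberately simple cost model. It assumes every annealer call costs the same fixed time `t_job` regardless of how many instances it carries, and that at most `capacity` instances fit on the chip at once; both are simplifications introduced here, and the capacity of 12 is a hypothetical value. Embedding time and any loss in solution quality are ignored.

```python
from math import ceil

def sequential_time(n_instances, t_job):
    """One annealer call per instance."""
    return n_instances * t_job

def parallel_time(n_instances, capacity, t_job):
    """Instances share calls, with up to `capacity` instances per call."""
    return ceil(n_instances / capacity) * t_job

t_job, capacity = 1.0, 12  # hypothetical values
for n in (1, 6, 12, 13, 24):
    print(n, sequential_time(n, t_job), parallel_time(n, capacity, t_job))
```

Under these assumptions the parallel cost stays at one call until the thirteenth instance forces a second call, whereas the sequential cost grows with every added instance.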
Empirical studies with replication-based PQA (RBM) demonstrate that solution-energy and ground-state probabilities consistently improve over standard QA, and match the performance of quantum annealing correction (QAC) in bias-planted benchmarks, without requiring penalty couplings or syndrome decoding (Djidjev, 2024).
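The idea behind replication can be sketched with a toy Ising model in which each replica sees a different bias error. The aggregation rule used here, scoring every replica's answer against the nominal problem and keeping the best, is an assumption for illustration; the source does not specify how replica results are combined. Exhaustive search stands in for each replica's anneal, and the bias errors are made-up values.

```python
from itertools import product

def ising_energy(h, J, s):
    """Energy sum_i h_i s_i + sum_(i,j) J_ij s_i s_j for spins s in {-1, +1}."""
    return (sum(h[i] * s[i] for i in range(len(h)))
            + sum(w * s[i] * s[j] for (i, j), w in J.items()))

def ground_state(h, J):
    """Exhaustive ground state, standing in for one replica's anneal."""
    return min(product((-1, 1), repeat=len(h)),
               key=lambda s: ising_energy(h, J, s))

def best_of_replicas(h, J, bias_errors):
    """Solve one perturbed copy per replica, then score against the nominal problem."""
    candidates = [ground_state([hi + e for hi, e in zip(h, err)], J)
                  for err in bias_errors]
    best = min(candidates, key=lambda s: ising_energy(h, J, s))
    return best, candidates

h = [0.2, 0.1]        # nominal biases
J = {(0, 1): -1.0}    # ferromagnetic coupling
bias_errors = [[-0.5, -0.5], [0.0, 0.0], [0.1, -0.1]]  # hypothetical per-replica errors

best, candidates = best_of_replicas(h, J, bias_errors)
print(candidates)  # [(1, 1), (-1, -1), (-1, -1)]
print(best)        # (-1, -1)
```

The first replica is pushed into the wrong state by its bias error, but because the errors differ across chip regions the other replicas still recover the nominal ground state, and the aggregate answer is correct.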
5. Applications: Optimization, Error Mitigation, and Machine Learning
Parallel Quantum Annealing is employed in several application domains:
- Combinatorial Optimization: Maximum Clique, Minimum Vertex Cover, Traffic Flow Optimization, and Asset-Liability Modeling are mapped to parallel-disjoint minor embeddings, maximizing problem throughput (Pelofske et al., 2022, Bishwas et al., 2024, Artag et al., 10 Mar 2026).
- Hybrid Classical–Quantum Workflows: Recursive decomposition of large graphs (e.g., DBK for clique finding) breaks tasks into subproblems that are solved in parallel on the quantum device, vastly expanding effective hardware capability (Pelofske et al., 2022).
- Machine Learning: Annealing-based Quantum Boltzmann Machines integrate PQA to accelerate sampling during supervised and unsupervised learning on structured data such as MedMNIST images, achieving a 69.65% reduction in QPU time across all tested hidden-unit sizes and maintaining comparable accuracy and epoch convergence rates relative to classical and CNN counterparts (Schuman et al., 18 Jul 2025).
- Error Mitigation: Replication-based PQA (RBM) achieves statistical cancellation of analog hardware errors, exploiting spatial dispersion of problem replicas; this mitigation is hardware-agnostic for connectivity and requires no tuning of penalty terms (Djidjev, 2024).
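In decomposition-based workflows, the subproblems produced by the classical stage have to be grouped into annealer calls. One simple way to do this, sketched below, is first-fit-decreasing packing by physical-qubit footprint. The packing heuristic, the function name `pack_batches`, and the footprint numbers are assumptions for illustration; the cited works may group subproblems differently, and the model ignores hardware topology.

```python
def pack_batches(footprints, qubit_budget):
    """Group subproblems into annealer calls with first-fit-decreasing packing.

    footprints: dict mapping subproblem id -> physical qubits it needs.
    qubit_budget: physical qubits usable in one call.
    Returns a list of batches, each a list of subproblem ids.
    """
    batches = []  # each entry is [remaining budget, [ids]]
    for pid, size in sorted(footprints.items(), key=lambda kv: -kv[1]):
        if size > qubit_budget:
            raise ValueError(f"subproblem {pid!r} does not fit on the device")
        for batch in batches:
            if batch[0] >= size:       # first batch with enough room
                batch[0] -= size
                batch[1].append(pid)
                break
        else:                          # no existing batch fits: open a new one
            batches.append([qubit_budget - size, [pid]])
    return [ids for _, ids in batches]

# Hypothetical footprints after embedding, and a hypothetical budget.
footprints = {"a": 60, "b": 50, "c": 40, "d": 30, "e": 20}
batches = pack_batches(footprints, qubit_budget=100)
print(batches)  # [['a', 'c'], ['b', 'd', 'e']]
```

Here five subproblems are served by two annealer calls instead of five, which is the source of the throughput gain when decomposition and parallel annealing are combined.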
6. Limitations, Practical Challenges, and Future Directions
PQA faces hardware and algorithmic constraints:
- Embedding Overhead: Custom, high-quality embeddings demand manual intervention and hardware knowledge; automated scalable solutions remain an open area for research (Bishwas et al., 2024).
- QUBO Size and Connectivity: On current hardware, pure-quantum PQA is limited to 25–30 logical variables per batch before ground-state probability degrades sharply, even with optimal embedding (Bishwas et al., 2024, Pelofske et al., 2021).
- Spectral Interference: If global parameterization or scaling is applied indiscriminately across instances, weaker-magnitude or higher-hardness problems can experience adiabatic gap collapse, reducing overall ground-state return probability (Artag et al., 10 Mar 2026).
- Quality–Throughput Trade-off: As the number and size of parallelized problems increase, instance solution quality (SQV, ground-state probability) generally declines beyond a threshold. Isolation layers and per-instance scheduling can partially alleviate this (Bishwas et al., 2024, Artag et al., 10 Mar 2026).
- Open Problems: Adaptive, per-instance α-weight learning, dynamic annealing schedules, integration with advanced hybrid and decomposition frameworks, and optimal error-averaging via replication invite further exploration (Bishwas et al., 2024, Djidjev, 2024, Artag et al., 10 Mar 2026).
Emerging extensions include Multi-Tasking Quantum Annealing (MTQA), which combines PQA’s parallel embedding with independent parameterization for each instance, maintaining coherence and preventing spectral-gap shrinkage across heterogeneous workloads (Artag et al., 10 Mar 2026).
7. Comparative Empirical Results and Scalability
Empirical studies across hardware and problem classes highlight PQA’s scalability and effectiveness:
- On D-Wave 2000Q (Chimera) and Advantage (Pegasus) hardware, multiple size-20 cliques can be embedded in parallel (Pelofske et al., 2021).
- Time-to-solution reductions over sequential annealing are observed, with the magnitude depending on problem structure and hardware (Pelofske et al., 2021).
- For Maximum Clique, the hybrid combination of DBK decomposition and PQA enables optimal solutions for dense 120-node instances that exceed the native hardware limitation, with a speed-up over classical Fast Maximum Clique solvers in select regimes (Pelofske et al., 2022).
- Medical image classification experiments show PQA-based QBMs attaining comparable accuracy and AUC to classical neural networks, reaching convergence in 10 epochs with a 69.65% reduction in QPU sampling time (Schuman et al., 18 Jul 2025).
- RBM outperforms standard QA in both normalized energy and ground-state probability for both small and large native graphs, with up to 20 percentage point improvement in the latter at hard densities (Djidjev, 2024).
References
- (Bishwas et al., 2024) Investigation into the Potential of Parallel Quantum Annealing for Simultaneous Optimization of Multiple Problems: A Comprehensive Study.
- (Pelofske et al., 2021) Parallel Quantum Annealing.
- (Pelofske et al., 2022) Solving Larger Maximum Clique Problems Using Parallel Quantum Annealing.
- (Artag et al., 10 Mar 2026) Multi-tasking through quantum annealing.
- (Djidjev, 2024) Replication-based quantum annealing error mitigation.
- (Schuman et al., 18 Jul 2025) Quantum Boltzmann Machines using Parallel Annealing for Medical Image Classification.