
Distributed Variational Quantum Eigensolvers

Updated 9 November 2025

Distributed Variational Quantum Eigensolvers (DVQE) are hybrid quantum-classical algorithms that partition state preparation, ansatz circuits, and cost function evaluation across multiple QPUs.
They employ strategies like symmetry splitting and mutual-information clustering to reduce qubit requirements and circuit depth while preserving computational accuracy.
Empirical benchmarks show that DVQE reaches near-chemical accuracy and optimizes efficiently on the Hubbard model and QUBO problems, improving scalability.

Distributed Variational Quantum Eigensolvers (DVQE) are a family of hybrid quantum-classical algorithms designed to overcome the hardware bottlenecks of near-term quantum computers. By partitioning the state preparation, ansatz circuit, and cost function evaluation between multiple quantum processing units (QPUs) or between classical and quantum subsystems, DVQE enables efficient quantum simulations and combinatorial optimization at problem scales inaccessible to monolithic architectures. The DVQE paradigm encompasses circuit-level distribution across networked QPUs, as well as algorithmic splitting of the Hilbert space based on symmetry or entanglement clustering. This article surveys the foundational models, algorithmic components, resource trade-offs, and empirical benchmarks underpinning DVQE as established in recent literature.

1. Mathematical Foundation and Partitioned Cost Function

At the core of DVQE lies a formal decomposition of the variational energy to enable hybrid or distributed evaluation. For a general fermionic Hamiltonian,

$H = \sum_{\mu\mu'} t_{\mu\mu'} c^\dagger_\mu c_{\mu'} + \sum_{\mu\nu\mu'\nu'} v_{\mu\nu\mu'\nu'} c^\dagger_\mu c^\dagger_\nu c_{\nu'} c_{\mu'},$

the set of modes $M$ is split into disjoint subsets $A$ (classical or QPU$_1$) and $B$ (quantum or QPU$_2$), producing $H = H_A + H_B + H_{AB}$, where the inter-block term $H_{AB}$ encodes coupling across the partition.

Trial states are written in the block basis,

$|\Psi\rangle = \sum_{n_A} \alpha_{n_A} C_{n_A}^\dagger |\Psi_{n_A}\rangle_B,$

so that the variational energy takes the form

$E[\alpha, \beta] = \sum_{n_A, n'_A} \alpha_{n'_A}^* \alpha_{n_A} \langle n'_A| H_A |n_A\rangle \langle\Psi_{n'_A}|\Psi_{n_A}\rangle + \sum_{n_A} |\alpha_{n_A}|^2 \langle\Psi_{n_A}| H_B | \Psi_{n_A} \rangle + \mbox{cross terms}.$

Terms involving only $A$ are evaluated classically, while those involving $B$ (and $H_{AB}$) are estimated via quantum subroutines. This hybrid cost function generalizes to Ising-type Hamiltonians for QUBO optimization, where

$H = \sum_{i<j} J_{ij} Z_i Z_j + \sum_{i} h_i Z_i,$

and each term is assigned to the logical QPU(s) hosting the corresponding qubits (Stenger et al., 2021, Hasanzadeh et al., 24 Aug 2025).
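The term-to-QPU assignment for the Ising Hamiltonian can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the function names (`split_ising_terms`, `term_value`, `ising_energy`), the dictionary layout, and the two-QPU `owner` map are all assumptions made for the example. Terms whose qubits live on a single QPU are stored as local; terms that straddle two QPUs are collected separately, since they are the ones that need cross-QPU handling.

```python
from itertools import product

def split_ising_terms(J, h, owner):
    """Split Ising terms into per-QPU local terms and cross-QPU terms.

    J: {(i, j): J_ij} with i < j, h: {i: h_i}, owner: {qubit: qpu_index}.
    All names and the data layout are hypothetical, chosen for illustration.
    """
    local, cross = {}, []
    for i, hi in h.items():
        local.setdefault(owner[i], []).append(((i,), hi))
    for (i, j), Jij in J.items():
        if owner[i] == owner[j]:
            local.setdefault(owner[i], []).append(((i, j), Jij))
        else:
            cross.append(((i, j), Jij))
    return local, cross

def term_value(qubits, coeff, z):
    """Value of coeff * prod_q z[q] for a spin configuration z with entries +1/-1."""
    value = coeff
    for q in qubits:
        value *= z[q]
    return value

def ising_energy(J, h, z):
    """Reference energy of the unpartitioned Hamiltonian."""
    return (sum(Jij * z[i] * z[j] for (i, j), Jij in J.items())
            + sum(hi * z[i] for i, hi in h.items()))

# Toy instance: qubits 0, 1 on QPU 0 and qubits 2, 3 on QPU 1.
J = {(0, 1): 1.0, (1, 2): -0.5, (2, 3): 2.0}
h = {0: 0.1, 3: -0.3}
owner = {0: 0, 1: 0, 2: 1, 3: 1}

local, cross = split_ising_terms(J, h, owner)

# Check that local + cross contributions reproduce the full energy everywhere.
max_mismatch = 0.0
for z in product((1, -1), repeat=4):
    pieces = sum(term_value(q, c, z) for terms in local.values() for q, c in terms)
    pieces += sum(term_value(q, c, z) for q, c in cross)
    max_mismatch = max(max_mismatch, abs(pieces - ising_energy(J, h, z)))

print(sorted(local), cross, max_mismatch)
```

In this toy instance only the $Z_1 Z_2$ term crosses the partition; every other term can be estimated from measurements on a single QPU.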

2. Partitioning Strategies: Symmetry, Clustering, and Hardware Decomposition

DVQE implementations leverage problem structure and hardware topology for partitioning:

Symmetry splitting (e.g., Hubbard model): Degrees of freedom such as spin ($S_z$), particle number, or momentum are fixed in one block (e.g., $N_\uparrow$ on $A$), reducing the quantum workload. For example, for $L$ Hubbard sites at half-filling, $|B_{\text{configs}}| = \binom{L}{N_\uparrow}$; only the $B$ sector (e.g., spin-down) is handled by the quantum device, reducing qubit requirements from $2L$ to roughly $L+1$ (Stenger et al., 2021).
Mutual-information clustering (ClusterVQE): Qubits are grouped into clusters that are strongly entangled internally but only weakly entangled with the rest, by minimizing the inter-cluster quantum mutual information. This yields a partition into clusters $C_i$ of size $n_i$; the global Hamiltonian is mapped to a “dressed” form allowing each cluster to be simulated by a separate device, with inter-cluster coupling handled algebraically (Zhang et al., 2021).
Distributed QPU architectures: The qubit register is split across modules (e.g., $N/2$ qubits per module for $M=2$ modules), with cross-module entanglement enacted infrequently through a limited set of interconnect gates (e.g., remote $ZZ(\phi)$ gates). Primitive operations such as “TeleGate” extend the permitted two-qubit gate set to cross-QPU CNOTs using Bell pairs and classical control (Hasanzadeh et al., 24 Aug 2025, Khait et al., 2023, Hasanzadeh et al., 5 Nov 2025).
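The clustering criterion above, minimizing inter-cluster mutual information, can be illustrated with a brute-force search over balanced bipartitions. This is only a sketch under stated assumptions: the mutual-information matrix `mi` is made-up toy data, the function names are hypothetical, and exhaustive enumeration is a stand-in that is practical only for a handful of qubits. It is not the procedure used in ClusterVQE.

```python
from itertools import combinations

def inter_cluster_mi(mi, cluster_a, cluster_b):
    """Total pairwise mutual information crossing the cut between two clusters."""
    return sum(mi[i][j] for i in cluster_a for j in cluster_b)

def best_balanced_bipartition(mi):
    """Enumerate balanced bipartitions and return (cut, cluster_a, cluster_b).

    Qubit 0 is pinned to cluster_a so each bipartition is visited once.
    Assumes an even number of qubits and a symmetric matrix mi.
    """
    n = len(mi)
    best = None
    for rest in combinations(range(1, n), n // 2 - 1):
        a = (0,) + rest
        b = tuple(q for q in range(n) if q not in a)
        cut = inter_cluster_mi(mi, a, b)
        if best is None or cut < best[0]:
            best = (cut, a, b)
    return best

# Toy symmetric mutual-information matrix (illustrative values only):
# qubits 0-1 and 2-3 are strongly correlated, all other pairs weakly.
mi = [
    [0.00, 0.90, 0.05, 0.01],
    [0.90, 0.00, 0.02, 0.03],
    [0.05, 0.02, 0.00, 0.80],
    [0.01, 0.03, 0.80, 0.00],
]

cut, cluster_a, cluster_b = best_balanced_bipartition(mi)
print(cluster_a, cluster_b, round(cut, 3))
```

For realistic register sizes the number of bipartitions grows combinatorially, so a heuristic graph-partitioning method would replace the exhaustive loop.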

3. Distributed Ansatz Construction and Measurement

Ansatz circuits in distributed VQE are tailored to partitioning:

Blockwise ansatz (symmetry splitting): For each classical configuration label $n_A$, a parameterized quantum circuit $U(\theta_{n_A})$ prepares $|\Phi_{n_A}(\theta_{n_A})\rangle \approx |\Psi_{n_A}\rangle_B$ on the quantum register (Stenger et al., 2021). Hardware-efficient circuits with single-qubit rotations and nearest-neighbor CNOT ladders are typical.
Cluster-local ansatz: Each cluster $C_i$ hosts a variational subcircuit $U_{c_i}(\theta_{c_i})$. Only intra-cluster entanglers are physically realized, while inter-cluster entanglers are “commuted out” and absorbed into the dressed cluster Hamiltonian (Zhang et al., 2021).
Distributed cross-QPU entanglement: Distributed ansatz circuits are constructed such that all local gates act on individual QPUs, while inter-QPU CNOT or $ZZ$ gates use a TeleGate or remote-operator protocol (Hasanzadeh et al., 24 Aug 2025, Khait et al., 2023). With ideal gates, the distributed circuit prepares exactly the same quantum state as the monolithic circuit (up to idle ancillas).

Measurement routines are similarly modular:

Operators are measured locally where possible; cross-partition observables require joint measurement protocols, e.g. ancilla-based overlap estimation or classical post-processing of simultaneous measurement outcomes.
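The classical post-processing route for a cross-partition observable can be sketched for a $Z_i Z_j$ term whose two qubits sit on different QPUs. The sketch assumes that both QPUs are read out in the computational basis on the same shot of a shared distributed state, and that outcomes are paired by shot index; the function name `cross_zz_expectation` and the bitstring layout are hypothetical.

```python
def cross_zz_expectation(shots_a, shots_b, qubit_a, qubit_b):
    """Estimate <Z_a Z_b> from paired computational-basis outcomes.

    shots_a[k] and shots_b[k] are the bitstrings read out on QPU A and QPU B
    in the same shot k. Bit '0' maps to eigenvalue +1 and bit '1' to -1.
    """
    if len(shots_a) != len(shots_b):
        raise ValueError("outcomes must be paired shot by shot")
    total = 0
    for bits_a, bits_b in zip(shots_a, shots_b):
        z_a = 1 - 2 * int(bits_a[qubit_a])
        z_b = 1 - 2 * int(bits_b[qubit_b])
        total += z_a * z_b
    return total / len(shots_a)

# Toy paired outcomes (illustrative data, not measured results).
shots_a = ["00", "11", "01", "10"]
shots_b = ["0", "1", "1", "0"]

uncorrelated = cross_zz_expectation(shots_a, shots_b, qubit_a=0, qubit_b=0)
correlated = cross_zz_expectation(shots_a, shots_b, qubit_a=1, qubit_b=0)
print(uncorrelated, correlated)
```

Multiplying the per-shot eigenvalues before averaging is what distinguishes this from combining two independently averaged local expectation values, which would lose the correlation.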

4. Classical Optimization and Metaheuristic Initialization

The optimization loop of DVQE leverages both standard and advanced initializers and optimizers:

Gradient evaluation uses parameter-shift rules or finite-difference methods, with gradient calculations distributed according to the physical partition. For ClusterVQE, a gradient formula exploits the structure of the dressed Hamiltonian and does not require ancillary qubits (Zhang et al., 2021).
Metaheuristic parameter initialization (e.g., Black-Hole, Gray Wolf, Artificial Bee Colony) precedes ADAM-based gradient updates, improving convergence speed and reducing the likelihood of becoming trapped in local minima. Warm-started ADAM typically lowers the total iteration count by more than 50% relative to random initialization (Hasanzadeh et al., 24 Aug 2025, Hasanzadeh et al., 5 Nov 2025).
Hybrid safeguards for stability: “Accept-if-better” rules enforce monotonic decrease of the local augmented Lagrangian in hybrid ADMM-DVQE settings: quantum updates are only adopted if the new state does not increase the objective, stabilizing convergence even in the presence of quantum noise (Hasanzadeh et al., 5 Nov 2025).
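Two of the ingredients above, the parameter-shift gradient and the accept-if-better safeguard, can be combined in a small single-parameter sketch. The assumptions are significant: the cost is the closed-form toy function $E(\theta) = \cos\theta$ (the expectation of $Z$ after $R_Y(\theta)$ acts on $|0\rangle$) standing in for a QPU estimate, the update is plain gradient descent rather than ADAM, and the guard is applied to the energy rather than to an ADMM augmented Lagrangian. All function names are hypothetical.

```python
import math

def energy(theta):
    """Toy cost: <Z> after RY(theta) on |0>, which equals cos(theta)."""
    return math.cos(theta)

def parameter_shift_grad(f, theta, shift=math.pi / 2):
    """Parameter-shift rule for a gate generated by a Pauli operator."""
    return 0.5 * (f(theta + shift) - f(theta - shift))

def minimize_accept_if_better(f, theta, lr=0.4, steps=50):
    """Gradient descent that adopts a step only if the cost does not increase."""
    best = f(theta)
    for _ in range(steps):
        candidate = theta - lr * parameter_shift_grad(f, theta)
        value = f(candidate)
        if value <= best:
            theta, best = candidate, value   # accept the update
        else:
            lr *= 0.5                        # reject and shrink the step
    return theta, best

grad_at_start = parameter_shift_grad(energy, 0.3)
theta_opt, e_opt = minimize_accept_if_better(energy, theta=0.3)
print(grad_at_start, theta_opt, e_opt)
```

For this cost the parameter-shift estimate equals the analytic derivative $-\sin\theta$, and the iteration approaches the minimum $E = -1$ at $\theta = \pi$. On hardware each call to `energy` would be a noisy estimate, which is where rejecting non-improving steps matters.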

5. Resource Scaling, Overhead, and Scalability

Resource requirements and scaling for DVQE exhibit several key features:

Approach	Qubits per register	Circuit depth per block	Quantum evaluations per iteration
Monolithic VQE	$2L$ (Hubbard)	$O(D \cdot 2L)$	$O(L^4)$
Symmetry-split DVQE	$L+1$ (with ancilla)	$O(DL)$	$|B_{\text{configs}}| \cdot O(L)$
ClusterVQE	$n_i$ per QPU	$O(D n_i)$	$O(\text{\#clusters} \cdot \text{\#shots})$
Distributed QUBO-DVQE	$n/m + 1$ per QPU	$O(d\,n/m)$	$O(\text{\#batches} \cdot \text{\#terms})$

Reduction in quantum width: Partitioning substantially reduces the number of physical qubits per device, approaching half for large $L$; e.g., $L+1=5$ for the 4-site Hubbard model (vs. $2L=8$).
Measurement overhead: The classical post-processing cost (e.g., effective Hamiltonian assembly, eigenproblems) and the quantum measurement count increase linearly with the number of block configurations or clusters.
Distributed execution: DVQE scales horizontally with the number of QPUs, allowing execution of larger problems as more modules become available. The TeleGate communication cost grows with the number of cross-QPU gates but remains subdominant if the gate count remains small (e.g., $n_i=3$ remote entanglers sufficing for near-monolithic performance) (Khait et al., 2023).
Batching for hardness balance: QUBO instances are partitioned into block-diagonal batches with balanced hardness, preventing performance bottlenecks (Hasanzadeh et al., 5 Nov 2025).
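The qubit counts and configuration counts quoted for the symmetry-split Hubbard case follow from simple arithmetic, sketched below. The function name and dictionary keys are hypothetical; the formulas ($2L$, $L+1$, and $\binom{L}{N_\uparrow}$) are the ones stated in the text and table above.

```python
from math import comb

def symmetry_split_resources(num_sites, n_up):
    """Qubit and configuration counts for an L-site Hubbard model.

    Uses the counts quoted in the text: 2L qubits monolithically,
    L + 1 (including one ancilla) after symmetry splitting, and
    binom(L, N_up) classical block configurations.
    """
    return {
        "monolithic_qubits": 2 * num_sites,
        "split_qubits": num_sites + 1,
        "block_configs": comb(num_sites, n_up),
    }

# 4-site model at half-filling with two spin-up electrons.
res = symmetry_split_resources(num_sites=4, n_up=2)
print(res)
```

The trade-off is visible in the numbers: the register shrinks from 8 to 5 qubits, while the number of block configurations to evaluate per iteration is 6 and grows combinatorially with $L$.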

6. Empirical Results and Applications

Numerical evidence across models and implementations confirms the effectiveness of DVQE:

Hubbard model (4-site, half-filling): DVQE with $Q=4$ qubits and $|B_{\text{configs}}|=6$ achieves ground-state energy errors $\lesssim 10^{-3}t$ for $U/t>2$, orders of magnitude better than mean-field ($\gtrsim 0.5 t$) (Stenger et al., 2021).
Molecular electronic structure: ClusterVQE matched or exceeded ADAPT-VQE and iQCC in accuracy, realizing energies within chemical accuracy on noisy simulators and hardware, with circuit-depth and qubit-count reductions (Zhang et al., 2021).
QUBO benchmarks: Distributed DVQE (Python package: raiselab) achieved a state fidelity of $1$ relative to both monolithic and exact results for up to 10 qubits, with identical energy convergence and histogram modes hitting known minima (Hasanzadeh et al., 24 Aug 2025).
Unit commitment (UC): In hybrid ADMM-DVQE, block-diagonal QUBOs of up to 360 qubits (processed in batches) solved on distributed simulators matched brute-force solutions, converged faster (40% lower runtime), and enabled horizontal scaling to larger instances ($N=50$, $T=24$) (Hasanzadeh et al., 5 Nov 2025).
Two-module architectures: Benchmarks on TFIM, XYZ, and Heisenberg chains with $N=12$ and $n_i\leq3$ remote gates per shot achieved relative energy errors down to $10^{-4}$, comparable to all-to-all monolithic devices (Khait et al., 2023).

7. Generalizations, Limitations, and Future Directions

The DVQE paradigm is directly extensible:

General Hamiltonians: Any fermionic or spin Hamiltonian with exploitable symmetry structure or modular entanglement can be split into classical/quantum, cluster, or multi-QPU partitions, e.g., lattice models (Heisenberg, $t$-$J$) and quantum chemistry (active space splitting) (Stenger et al., 2021, Zhang et al., 2021).
Increasing QPU count: Horizontal scaling allows, in principle, progressively larger instances to be tackled as hardware modules proliferate.
Open challenges: Hardware realizations of TeleGate protocols rely on high-fidelity entanglement and minimal classical latency; error-mitigation strategies are required for practical devices. The overhead of metaheuristic initialization grows with ansatz parameter size but remains subexponential compared to problem dimension (Hasanzadeh et al., 24 Aug 2025, Hasanzadeh et al., 5 Nov 2025).
Applications: Modular and distributed DVQE is applicable as a plug-in solver for QUBO optimization in operational research (e.g., energy scheduling), quantum chemistry, and lattice simulations with realistic hardware constraints.

In conclusion, Distributed Variational Quantum Eigensolvers provide a unified and scalable approach to quantum simulation and optimization on both current and emerging distributed architectures, delivering reductions in quantum resource requirements at polynomial (and controllable) increases in classical and quantum measurement overhead. This framework facilitates incremental progress toward quantum advantage as networked modular quantum devices become available.