
Generative Quantum Eigensolver (GQE)

Updated 8 January 2026
  • Generative Quantum Eigensolver (GQE) is a hybrid method that employs transformer-based generative models to autonomously design quantum circuits for accurate ground-state energy estimation.
  • It reframes circuit synthesis as an autoregressive discrete-sequence generation problem, enabling effective transfer learning and robust scalability across various quantum systems.
  • Advanced training protocols combining reinforcement learning and preference optimization deliver significant resource savings and faster convergence compared to traditional VQE methods.

The Generative Quantum Eigensolver (GQE) is a hybrid quantum-classical framework that deploys deep generative models—especially transformer-based neural networks—to autonomously synthesize quantum circuits optimized for ground-state energy estimation and related tasks. Unlike traditional VQE, which directly optimizes continuous circuit parameters, GQE frames circuit construction as an autoregressive discrete-sequence generation problem, enabling the application of transfer learning, preference-based objective functions, and scale-out generalization over families of quantum systems. Recent extensions include context-aware conditional models, advanced loss functions, and operator representations inspired by computational chemistry.

1. Conceptual Foundations and Architectural Overview

GQE replaces variational parameter optimization in VQE with the training of a classical generative model that outputs sequences of quantum operators or gate tokens from a predetermined operator pool. Formally, given an operator pool $G = \{U_j\}_{j=1,\dots,L}$, a quantum circuit is encoded as a sequence $\vec{j} = (j_1, \dots, j_N)$, and the corresponding physical unitary is $U_N(\vec{j}) = U_{j_N}\cdots U_{j_1}$ (Nakaji et al., 2024). The generative model $p_\theta(\vec{j})$ is implemented as a transformer decoder, sampling the next token $j_k$ conditioned on the previously generated sequence.

Energy evaluation is performed by applying the sampled circuit to a reference state $|\phi_\mathrm{ini}\rangle$ and measuring the expectation value of the system Hamiltonian $H$:

$$E(\vec{j}) = \langle\phi_\mathrm{ini}|U_N(\vec{j})^\dagger H U_N(\vec{j})|\phi_\mathrm{ini}\rangle.$$

Model parameters $\theta$ are updated through reinforcement or preference-based learning protocols, using the energy as a reward signal.
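The sample-and-evaluate loop described above can be sketched numerically. Everything below (the single-qubit operator pool, the Pauli-Z Hamiltonian, and the uniform sampler standing in for the trained transformer $p_\theta$) is an illustrative assumption, not taken from the cited papers:

```python
# Toy GQE loop: sample token sequences, build U_N(j), score by energy.
import numpy as np

rng = np.random.default_rng(0)

# Illustrative single-qubit operator pool G = {U_j}.
H_gate = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
RX = np.array([[np.cos(0.3), -1j * np.sin(0.3)],
               [-1j * np.sin(0.3), np.cos(0.3)]])
pool = [np.eye(2, dtype=complex), H_gate, X, RX]

# Toy Hamiltonian (Pauli-Z) and reference state |phi_ini> = |0>.
H = np.array([[1, 0], [0, -1]], dtype=complex)
phi_ini = np.array([1, 0], dtype=complex)

def circuit_unitary(seq):
    """U_N(j) = U_{j_N} ... U_{j_1}: left-multiply pool elements in order."""
    U = np.eye(2, dtype=complex)
    for j in seq:
        U = pool[j] @ U
    return U

def energy(seq):
    """E(j) = <phi_ini| U_N(j)^dagger H U_N(j) |phi_ini>."""
    psi = circuit_unitary(seq) @ phi_ini
    return float(np.real(psi.conj() @ H @ psi))

def sample_sequence(n_ops=3):
    """Stand-in for the transformer: uniform autoregressive token sampling."""
    return [int(rng.integers(len(pool))) for _ in range(n_ops)]

samples = [sample_sequence() for _ in range(32)]
best = min(samples, key=energy)
print(best, energy(best))  # low-energy sequences flip |0> toward |1>
```

In the full method the uniform sampler is replaced by the trained model, and the sampled energies feed back into the loss as described in Section 3.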

By sidestepping continuous-parameter optimization, this framework avoids the barren-plateau issues intrinsic to deep parametrized ansätze and enables more expressive, transferable circuit constructions.

2. Operator Encoding and Transfer Learning

A central innovation is a discrete, text-based representation of quantum operators suited to generative modeling and transfer learning. For quantum chemistry applications (e.g., the UCCSD ansatz), each single or double excitation is mapped onto a canonical token, such as $S(p,q)$ for single and $D(p,q,r,s)$ for double excitations, mirroring the SMILES notation for molecular structures (Yin et al., 24 Sep 2025). For instance, the H₂ molecule is encoded as:

$$s_{\mathrm{H_2}} = \left[\, S(0,1),\ D(0,1,2,3),\ S(0,1) \,\right],$$

where the circuit comprises tokenized excitations in a specified order.

Text similarity metrics (cosine and normalized Levenshtein) are deployed to quantify operator overlap between source and target molecular systems, guiding the transfer of learned weights between models. The pipeline involves:

  1. Pretraining GQE on a source molecule to learn $\theta_\mathrm{src}$.
  2. Mapping the source operator pool to the target via maximal similarity.
  3. Weight transfer and resizing in the transformer’s embedding/output layers.
  4. Optionally, fine-tuning on target Hamiltonian evaluations with $L_2$ regularization to prevent catastrophic forgetting.

This methodology yields substantial computational savings when molecular systems share operator vocabularies or structural motifs (Yin et al., 24 Sep 2025).
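The similarity step in the transfer pipeline can be sketched with a normalized Levenshtein distance over operator-token sequences. The token strings and the target sequence below are illustrative, not taken from the cited paper:

```python
# Normalized Levenshtein similarity between operator-token sequences,
# as used to guide source-to-target weight transfer.

def levenshtein(a, b):
    """Edit distance between two token sequences (rolling-row DP)."""
    dp = list(range(len(b) + 1))
    for i, ta in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, tb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (ta != tb))  # substitution
    return dp[-1]

def normalized_similarity(a, b):
    """1 - dist/max(len): 1.0 means identical token sequences."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

# Token encodings in the S(p,q)/D(p,q,r,s) style described above;
# s_tgt is a made-up target sequence for illustration.
s_h2 = ["S(0,1)", "D(0,1,2,3)", "S(0,1)"]
s_tgt = ["S(0,1)", "D(0,1,2,3)", "S(2,3)", "D(0,1,4,5)"]

print(normalized_similarity(s_h2, s_tgt))  # → 0.5 (2 edits over 4 tokens)
```

High similarity between source and target vocabularies signals that transferring the pretrained embedding/output weights is likely to pay off.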

3. Objective Functions and Training Protocols

GQE utilizes a spectrum of loss functions derived from reinforcement learning and structured preference optimization:

  • REINFORCE-style Monte Carlo updates minimize the expected energy across the model’s output distribution:

$$\nabla_\theta E(\theta) \simeq \frac{1}{N}\sum_{k=1}^N \left( E(s^{(k)}) - b \right) \nabla_\theta \log p_\theta(s^{(k)}),$$

where $b$ is a baseline for variance reduction (Yin et al., 24 Sep 2025).

  • Logit-matching losses align transformer logits to circuit energies, often via exponential weighting to induce a Gibbs-like sampling focus on low energies (Nakaji et al., 2024).
  • Direct Preference Optimization (DPO) and its persistent variant (P-DPO) enforce pairwise energy orderings and avoid vanishing gradients, with P-DPO providing a tunable lower bound on gradient weights for continual learning of high-quality circuit samples. The P-DPO loss is:

$$\mathcal{L}_{\text{P-DPO}} = - \mathbb{E}_{(\vec{j}_w, \vec{j}_l) \sim \mathcal{B}} \left[ \alpha\,z + (1-\alpha)\,\log\sigma(z) \right],$$

where $z$ encodes preference margins, $\sigma$ is the logistic sigmoid, and $\alpha$ interpolates between margin and log-sigmoid objectives (Nakamura et al., 10 Sep 2025).

Hybrid online/offline learning strategies incorporate persistent replay buffers to stabilize convergence and accelerate the discovery of low-energy circuits.
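The two training signals above can be written out directly; the sample energies, margins $z$, and $\alpha$ values below are illustrative placeholders:

```python
# REINFORCE weights with a mean-energy baseline, and the P-DPO loss.
import numpy as np

def reinforce_weights(energies):
    """Per-sample weights (E(s^k) - b), with b the batch-mean baseline.
    Each weight multiplies grad log p_theta(s^k) in the gradient estimate."""
    e = np.asarray(energies, dtype=float)
    return e - e.mean()

def p_dpo_loss(z, alpha):
    """P-DPO: -[ alpha*z + (1-alpha)*log sigma(z) ], averaged over pairs.
    alpha > 0 keeps a nonzero gradient floor even when sigma(z) saturates,
    so well-separated winner/loser pairs keep contributing."""
    z = np.asarray(z, dtype=float)
    log_sigmoid = -np.logaddexp(0.0, -z)  # log sigma(z), numerically stable
    return float(np.mean(-(alpha * z + (1.0 - alpha) * log_sigmoid)))

energies = [-1.10, -0.85, -1.31, -0.42]
print(reinforce_weights(energies))  # centered energies; they sum to zero

# alpha = 0 recovers the standard DPO log-sigmoid loss.
print(p_dpo_loss([0.5, 2.0], alpha=0.0))
print(p_dpo_loss([0.5, 2.0], alpha=0.3))
```

Minimizing expected energy then amounts to reinforcing sequences with negative weight (below-baseline energy) and suppressing the rest.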

4. Conditional and Context-Aware Models

Recent advances generalize GQE to conditional-generation tasks:

  • Conditional-GQE injects problem-specific context vectors (e.g., combinatorial optimization coefficients $J_{ij}, h_i$ or molecular features) via encoder–decoder transformer architectures (Minami et al., 28 Jan 2025).
  • The encoder processes instance features into latent context vectors, attended by the decoder at every generation step.
  • Mixture-of-Experts (MoE) layers and curriculum training facilitate scalability to larger qubit counts, specializing subcomponents of the model for different problem sizes.

Inference on entirely unseen problem instances or Hamiltonians is supported natively by the conditional architecture, with empirical success rates of ≈99% for ground-state bitstring identification on new graphs up to 10 qubits (Minami et al., 28 Jan 2025).
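The conditioning mechanism can be illustrated with a toy sketch: the linear map below stands in for the encoder–decoder transformer, and the instance features, vocabulary size, and shapes are all assumptions for illustration:

```python
# Toy context conditioning: pack instance coefficients (J_ij, h_i) into a
# context vector and make next-token sampling depend on it.
import numpy as np

rng = np.random.default_rng(1)

def encode_instance(J, h):
    """Flatten problem coefficients (upper-triangular J_ij, then h_i)."""
    J = np.asarray(J, dtype=float)
    iu = np.triu_indices(J.shape[0], k=1)
    return np.concatenate([J[iu], np.asarray(h, dtype=float)])

def sample_token(context, W):
    """Softmax-sample one gate token from context-dependent logits."""
    logits = W @ context
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return int(rng.choice(len(p), p=p))

# Two-qubit toy Ising instance: coupling matrix J and local fields h.
J = [[0.0, 1.5], [1.5, 0.0]]
h = [0.2, -0.4]
context = encode_instance(J, h)          # length 3: one coupling + two fields
W = rng.normal(size=(4, context.size))   # 4-token vocabulary, untrained

print(context, sample_token(context, W))
```

In the actual architecture the encoder produces latent context vectors that the decoder attends to at every generation step; this sketch only shows the interface, with the whole "model" collapsed into one matrix.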

5. Krylov Subspace Generation and Classical Sampling

Generative models have been extended to Krylov-subspace quantum diagonalization (KQD/SKQD). Instead of executing numerous quantum circuits for each Hamiltonian/subspace, a classical generative model $p_\theta(m \mid H, t)$ learns the measurement statistics, producing synthetic bitstring samples for arbitrary subspace dimensions and new Hamiltonians (Lee et al., 22 Dec 2025). The inference workflow:

  1. Train on (bitstring, Hamiltonian, time) tuples.
  2. Sample measurement outcomes conditionally to reconstruct overlap and Hamiltonian matrices required for solving:

$$H\,c = \lambda\,S\,c,$$

yielding ground-state energy approximations.
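The generalized eigenproblem in step 2 can be solved directly with SciPy; the 2×2 subspace matrices $H$ and $S$ below are made-up illustrative values, not data from the cited work:

```python
# Solve H c = lambda S c for a toy Krylov subspace.
import numpy as np
from scipy.linalg import eigh

# Toy projected Hamiltonian and overlap matrices (Hermitian; S positive definite).
H = np.array([[-1.0, 0.2],
              [ 0.2, 0.5]])
S = np.array([[ 1.0, 0.1],
              [ 0.1, 1.0]])

# Generalized eigenproblem: the smallest eigenvalue approximates the
# ground-state energy within the sampled subspace.
evals, evecs = eigh(H, S)
print(evals[0])  # ground-state energy estimate
```

Because both $H$ and $S$ are assembled from (synthetic) measurement samples, this final diagonalization runs entirely on classical hardware.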

Transformer and linear-time Mamba networks are both viable backbones for such models, with empirical validation on large-scale simulation and hardware datasets (IBM Q 20-qubit). Performance is competitive with direct quantum sampling, with Transformer models occasionally surpassing quantum hardware in error metrics at greater subspace depths (Lee et al., 22 Dec 2025).

6. Computational Performance and Resource Savings

GQE and its variants deliver order-of-magnitude reductions in quantum circuit evaluations, training time, and convergence speed compared to baselines:

  • SMILES-inspired GQE transfer achieves ∼100× faster inference (30 s vs. 3600 s for the naive baseline), with an RMSE of 0.025 Ha versus 0.020 Ha for H₂→LiH transfers (Yin et al., 24 Sep 2025).
  • Conditional-GQE solves Max-Cut on 10-node IonQ hardware in a single shot, outperforming QAOA, which requires more than 100 runs (Minami et al., 28 Jan 2025).
  • Krylov-generative models eliminate the need for repeat quantum experiments for new problem instances, enabling sample generation and energy estimation entirely on classical hardware (Lee et al., 22 Dec 2025).
  • Persistent-DPO and hybrid replay buffers facilitate faster and more robust convergence to target energies, reducing variance and dependence on hyperparameters (Nakamura et al., 10 Sep 2025).

7. Limitations, Extensions, and Future Directions

Expressivity is bounded by the operator pool and circuit depth. For highly dissimilar molecular systems or Hamiltonians, advanced transfer metrics and selective fine-tuning may be required. Deep subspace extrapolation can amplify generative biases if the classical model diverges from quantum statistics, so sufficient coverage in the training data is crucial. Integration of physics-informed regularization and error-mitigated quantum data is a potential avenue for cleaner generative representations (Lee et al., 22 Dec 2025).

Extensions include:

  • Excited-state targeting via multi-context conditional generation.
  • Hybrid approaches combining generative circuit construction with continuous VQE parameter refinement.
  • Generalization to PDE solvers, molecular dynamics, and classical simulation tasks via encoder adaptation.

Persistent margin-based preference losses (P-DPO) and transfer frameworks inspired by SMILES open a pathway to scalable, robust, and resource-efficient quantum simulation across broad problem domains.


Selected Quantitative Results

Scenario                   Baseline RMSE (Ha)   GQE/Transfer RMSE (Ha)   Resource Savings
H₂→LiH (SMILES-GQE)        0.020                0.025                    ~100× faster convergence
H₂→BeH₂ (SMILES-GQE)       0.035                0.038                    ~100× faster convergence
15-qubit Heisenberg        2.48–1.06            2.85–0.92 (Mamba)        Classical-only inference
Max-Cut (10-node, IonQ)    n/a                  99% success              1 shot vs. >100 (QAOA)
