Sequential Bayesian Optimization for QAOA Tuning

Updated 8 May 2026

The paper presents a sequential Bayesian optimization framework that uses probabilistic surrogates to efficiently tune QAOA parameters on noisy quantum devices.
It leverages Gaussian processes and adaptive acquisition functions to navigate a highly non-convex, noisy optimization landscape with fewer circuit evaluations.
The methodology demonstrates improved convergence speed and significant resource reduction compared to conventional optimizers, even under realistic noise conditions.

Sequential Bayesian optimization for QAOA parameter tuning denotes a family of hybrid optimization strategies leveraging probabilistic surrogates to efficiently search the highly non-convex, noisy landscape defined by the Quantum Approximate Optimization Algorithm (QAOA) objective. These methods are designed to minimize the quantum-classical evaluation overhead on near-term, noisy intermediate-scale quantum (NISQ) devices by sequentially selecting QAOA circuit parameters whose evaluation promises the largest expected information gain or improvement, according to a Bayesian model updated after each quantum measurement. Recent theoretical and experimental work has established both the efficiency and robustness of this approach, with precise scaling results under reasonable circuit and noise models, as well as practical enhancements that address hardware limitations and stochasticity (Song et al., 2023, Cheng et al., 2023, Tibaldi et al., 2022, Zhang et al., 30 Mar 2026).

1. Mathematical Formulation of the QAOA Landscape

For the canonical MaxCut problem, QAOA is specified by a depth-$p$ variational ansatz on $n$ qubits, parameterized by $\boldsymbol\theta = (\gamma_1, \ldots, \gamma_p, \beta_1, \ldots, \beta_p) \in \mathbb{R}^{2p}$. The objective function is typically either the expectation value

$f(\boldsymbol\gamma, \boldsymbol\beta) = \Big\langle +^{\otimes n}\Big|\,U(\boldsymbol\gamma, \boldsymbol\beta)^\dagger\,H_P\,U(\boldsymbol\gamma, \boldsymbol\beta)\,\Big|+^{\otimes n}\Big\rangle$

where $U(\boldsymbol\gamma, \boldsymbol\beta)$ is the parametrized unitary and $H_P$ the problem Hamiltonian, or an alternative figure of merit defined over measurement outcomes, such as the "mode-based" (maximum-probability) cut value $C(z_{\text{mode}})$ of the bitstring $z_{\text{mode}}$ with the highest observed count (Zhang et al., 30 Mar 2026). The induced optimization landscape is generically non-convex with exponentially many local optima for $p \geq 1$.
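The two figures of merit above can be illustrated with a small exact simulation. The sketch below is not taken from the cited papers: it assumes $H_P$ is the diagonal cut operator (so larger values are better), the conventions $e^{-i\gamma H_P}$ and $e^{-i\beta \sum_q X_q}$ for the two layers, and a plain NumPy statevector in place of a device. The function names `cut_values`, `qaoa_probabilities`, `expected_cut`, and `mode_cut` are hypothetical.

```python
import numpy as np

def cut_values(n, edges):
    """Cut value C(z) of every n-bit string z, indexed by the integer z."""
    z = np.arange(2 ** n)
    bits = (z[:, None] >> np.arange(n)) & 1      # bits[z, q] = bit q of z
    c = np.zeros(2 ** n)
    for i, j in edges:
        c += bits[:, i] != bits[:, j]            # edge (i, j) is cut when the bits differ
    return c

def qaoa_probabilities(gammas, betas, n, edges):
    """Exact output distribution of depth-p QAOA, with p = len(gammas)."""
    c = cut_values(n, edges)
    psi = np.full(2 ** n, 2.0 ** (-n / 2), dtype=complex)    # uniform |+> state
    for gamma, beta in zip(gammas, betas):
        psi = np.exp(-1j * gamma * c) * psi      # exp(-i gamma H_P), diagonal (assumed convention)
        rx = np.array([[np.cos(beta), -1j * np.sin(beta)],
                       [-1j * np.sin(beta), np.cos(beta)]])  # exp(-i beta X)
        psi = psi.reshape([2] * n)
        for q in range(n):                       # the same rotation acts on every qubit
            psi = np.moveaxis(np.tensordot(rx, psi, axes=([1], [q])), 0, q)
        psi = psi.reshape(-1)
    probs = np.abs(psi) ** 2
    return probs / probs.sum(), c

def expected_cut(gammas, betas, n, edges):
    """Expectation-value objective f(gamma, beta)."""
    probs, c = qaoa_probabilities(gammas, betas, n, edges)
    return float(probs @ c)

def mode_cut(gammas, betas, n, edges, shots=256, seed=0):
    """Mode-based objective: cut value of the most frequently sampled bitstring."""
    probs, c = qaoa_probabilities(gammas, betas, n, edges)
    counts = np.random.default_rng(seed).multinomial(shots, probs)
    return float(c[np.argmax(counts)])
```

With all angles set to zero the state stays uniform, so `expected_cut` returns half the number of edges; for a single edge at depth one, this convention reproduces the closed form $\tfrac12 + \tfrac12 \sin\gamma\,\sin 4\beta$.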

2. Bayesian Optimization Framework

Sequential Bayesian optimization treats the QAOA objective $f(\boldsymbol\theta)$ as an unknown function to be inferred via a probabilistic surrogate updated after each parameter evaluation. The standard surrogate is a Gaussian process (GP):

$f(\boldsymbol\theta) \sim \mathcal{GP}(\mu(\boldsymbol\theta), k(\boldsymbol\theta, \boldsymbol\theta'))$

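where the prior mean $\mu(\boldsymbol\theta)$ is usually zero, and the kernel $k(\boldsymbol\theta, \boldsymbol\theta')$ is chosen based on the smoothness of the landscape, common choices being the Matérn family or squared-exponential kernels (Song et al., 2023, Tibaldi et al., 2022). With a zero prior mean, the GP posterior for $f(\boldsymbol\theta)$ after $t$ observations $\mathcal{D}_t = \{(\boldsymbol\theta_i, y_i)\}_{i=1}^{t}$ is given by

$\mu_t(\boldsymbol\theta) = \mathbf{k}_t(\boldsymbol\theta)^\top (K_t + \sigma^2 I)^{-1} \mathbf{y}_t, \qquad \sigma_t^2(\boldsymbol\theta) = k(\boldsymbol\theta, \boldsymbol\theta) - \mathbf{k}_t(\boldsymbol\theta)^\top (K_t + \sigma^2 I)^{-1} \mathbf{k}_t(\boldsymbol\theta)$

with $[K_t]_{ij} = k(\boldsymbol\theta_i, \boldsymbol\theta_j)$, $[\mathbf{k}_t(\boldsymbol\theta)]_i = k(\boldsymbol\theta_i, \boldsymbol\theta)$, $\mathbf{y}_t = (y_1, \ldots, y_t)^\top$, and the noise variance $\sigma^2$ accounting for shot and device noise (Song et al., 2023, Cheng et al., 2023). Alternative surrogates such as tree-structured Parzen estimators (TPE) are also employed for non-Gaussian or discrete-valued objective settings (Zhang et al., 30 Mar 2026).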
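As a minimal sketch of the surrogate update, the standard zero-mean GP regression posterior (mean $\mathbf{k}^\top (K + \sigma^2 I)^{-1}\mathbf{y}$ and variance $k(\boldsymbol\theta,\boldsymbol\theta) - \mathbf{k}^\top (K + \sigma^2 I)^{-1}\mathbf{k}$) can be written in a few lines of NumPy. The Matérn-5/2 kernel, the fixed hyperparameters, and the names `matern52` and `gp_posterior` are assumptions made for illustration; the cited works re-learn hyperparameters rather than fixing them.

```python
import numpy as np

def matern52(A, B, length_scale=1.0, variance=1.0):
    """Matern-5/2 kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    s = np.sqrt(5.0 * np.maximum(d2, 0.0)) / length_scale
    return variance * (1.0 + s + s ** 2 / 3.0) * np.exp(-s)

def gp_posterior(X, y, X_new, noise_var=1e-2, kernel=matern52):
    """Posterior mean and latent variance of a zero-mean GP at the rows of X_new."""
    K = kernel(X, X) + noise_var * np.eye(len(X))        # K_t + sigma^2 I
    L = np.linalg.cholesky(K)                            # stable alternative to an explicit inverse
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K_t + sigma^2 I)^{-1} y
    K_s = kernel(X, X_new)                               # column j holds k_t(theta_j)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(kernel(X_new, X_new)) - (v ** 2).sum(axis=0)
    return mean, np.maximum(var, 0.0)                    # clip tiny negative round-off
```

Far from all observed parameters the posterior reverts to the prior (mean zero, variance equal to the kernel variance), which is what drives exploration in the acquisition step.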

At each iteration, a new candidate point is selected by maximizing an acquisition function such as the upper confidence bound (UCB), expected improvement (EI), or a “ratio-of-good-to-bad” density in TPE:

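$\alpha_{\mathrm{UCB}}(\boldsymbol\theta) = \mu_t(\boldsymbol\theta) + \kappa\,\sigma_t(\boldsymbol\theta)$

$\alpha_{\mathrm{EI}}(\boldsymbol\theta) = \mathbb{E}\big[\max\{0,\; f(\boldsymbol\theta) - f^{+}\}\big]$

where $\mu_t$ and $\sigma_t$ are the posterior mean and standard deviation of the surrogate, $\kappa > 0$ sets the exploration weight, the expectation is taken under the posterior, and $f^{+}$ is the incumbent best value.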
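For a Gaussian posterior both acquisition rules have closed forms. The sketch below assumes a maximization convention and uses the textbook expressions: UCB as posterior mean plus a multiple of the posterior standard deviation, and EI as $(\mu - f^{+})\,\Phi(z) + \sigma\,\phi(z)$ with $z = (\mu - f^{+})/\sigma$. The names `ucb` and `expected_improvement` and the default `kappa=2.0` are illustrative choices, not values from the cited papers.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])

def ucb(mean, std, kappa=2.0):
    """Upper confidence bound: posterior mean plus kappa posterior standard deviations."""
    return np.asarray(mean, float) + kappa * np.asarray(std, float)

def expected_improvement(mean, std, best):
    """Closed-form EI over the incumbent `best` for a Gaussian posterior (maximization)."""
    mean = np.atleast_1d(np.asarray(mean, float))
    std = np.atleast_1d(np.asarray(std, float))
    gain = mean - best
    ei = np.maximum(gain, 0.0)                 # exact value wherever std == 0
    pos = std > 0
    z = gain[pos] / std[pos]
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    ei[pos] = gain[pos] * cdf + std[pos] * pdf
    return ei
```

EI is never negative and vanishes only where the posterior is certain that a point cannot beat the incumbent, whereas UCB trades exploration against exploitation through the single weight `kappa`.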

3. Structural Properties and Theoretical Guarantees

Efficient Bayesian optimization for QAOA in high-dimensional spaces relies on structural circuit assumptions:

Local 1-design Slices (Noiseless): If either sub-block of each QAOA layer forms a local 1-design, the objective exhibits bounded Lipschitz continuity and partial derivative variance, guaranteeing trainability for shallow circuits (Song et al., 2023).
Local Pauli Channels (Noisy): For circuits with per-gate local Pauli noise, the optimization landscape admits a Lipschitz constant that decays exponentially in the circuit depth and the noise parameter, which can in fact facilitate optimization by smoothing out local minima.

The main scaling theorems specify that, for noiseless QAOA, a circuit depth of […] suffices for efficient ([…]) Bayesian optimization convergence, while for noisy QAOA a depth of […] applies under Pauli noise in the range […] (Song et al., 2023).

4. Algorithmic Variants and Implementation

Several algorithmic refinements augment standard sequential Bayesian optimization:

Double Adaptive-Region Bayesian Optimization (DARBO): Fits a local GP surrogate within an adaptive trust region (TR) and manages a secondary adaptive search region (SR) to escape local optima. Posterior hyperparameters are re-learned after each quantum evaluation, and region switches modulate the balance between global and local search (Cheng et al., 2023). This "TR ∩ SR" approach focuses the search near the current best point while still permitting domain-wide escape.
Batch and Asynchronous Protocols: To mitigate slow quantum device throughput, batch proposals (via penalized acquisition or Kriging-believer) and asynchronous scheduling are employed (Tibaldi et al., 2022).
Measurement and Noise Handling: The GP noise term is scaled according to […], and classical error mitigation strategies (layout benchmarking, readout calibration, zero-noise extrapolation) are integrated in practical experimental loops (Cheng et al., 2023).
Mode-Based Bayesian Optimization: Rather than optimizing for expected energy, the parameter search can target the "mode" solution quality (i.e. cut value of the most probable bitstring), with the surrogate built around this empirical objective and adaptive-shot allocation determined by statistical confidence and normalized variance thresholds (Zhang et al., 30 Mar 2026).

All main variants follow the same sequential loop: propose a point by maximizing the acquisition function, evaluate it on the quantum device with a limited number of shots, append the result to the dataset, update the posterior and the acquisition function, and stop on convergence or when the sample budget is exhausted.
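The loop can be sketched end to end under simplifying assumptions: a squared-exponential kernel with fixed hyperparameters, UCB maximized over random candidates instead of a gradient-based inner optimizer, a fixed evaluation budget as the only stopping rule, and a noisy closed-form objective standing in for the device. The toy objective is the depth-one expectation for a single-edge MaxCut instance, $\tfrac12 + \tfrac12 \sin\gamma \sin 4\beta$; all function names and default values are hypothetical.

```python
import numpy as np

def rbf(A, B, ell=0.5):
    """Squared-exponential kernel with unit prior variance."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def posterior(X, y, cand, noise_var):
    """Zero-mean GP posterior mean and standard deviation at the candidate points."""
    K = rbf(X, X) + noise_var * np.eye(len(X))
    Ks = rbf(X, cand)
    sol = np.linalg.solve(K, np.column_stack([y, Ks]))   # solves for y and all candidates at once
    mean = Ks.T @ sol[:, 0]
    var = 1.0 - np.einsum("ij,ij->j", Ks, sol[:, 1:])
    return mean, np.sqrt(np.maximum(var, 0.0))

def bayes_opt(evaluate, bounds, n_init=5, n_iter=25, noise_var=1e-2,
              kappa=2.0, n_cand=2000, seed=0):
    """Sequential BO: propose by UCB, evaluate, augment the data, refit, repeat."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, float).T
    X = rng.uniform(lo, hi, size=(n_init, len(lo)))      # initial random design
    y = np.array([evaluate(x) for x in X])
    for _ in range(n_iter):                              # budget-based stopping only
        cand = rng.uniform(lo, hi, size=(n_cand, len(lo)))
        mean, std = posterior(X, y - y.mean(), cand, noise_var)
        x_next = cand[np.argmax(mean + kappa * std)]     # acquisition maximization
        X = np.vstack([X, x_next])                       # dataset augmentation
        y = np.append(y, evaluate(x_next))               # one (noisy) circuit evaluation
    best = int(np.argmax(y))
    return X[best], y[best], X, y

def single_edge_expectation(theta):
    """Noise-free p = 1 objective for one edge; theta = (gamma, beta)."""
    return 0.5 + 0.5 * np.sin(theta[0]) * np.sin(4.0 * theta[1])
```

A usage sketch: wrap `single_edge_expectation` with additive Gaussian noise to mimic shot noise and call `bayes_opt(noisy_objective, [(0, np.pi), (0, np.pi / 2)])`. The returned point is the best observed one, which under noise need not coincide with the maximizer of the posterior mean.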

5. Practical Recommendations and Performance

Empirical and analytical studies provide the following practical guidelines:

Circuit Depth and Sample Complexity: For […]–[…] qubits, keeping the depth within […]–[…] allows polynomial scaling in the number of Bayesian iterations. The iteration count $T$ needed to achieve $\epsilon$-error scales as […] (Song et al., 2023).
Measurement Efficiency: A few hundred shots per circuit evaluation suffice to suppress measurement variance below […], and explicit modeling of shot noise in the GP surrogate prevents trust-region collapse (Cheng et al., 2023).
Noise as a Feature: Moderate levels of physically realistic Pauli noise can increase the effective depth range for efficient optimization by smoothing spurious landscape structure (Song et al., 2023).
Resource Reduction: Switching to mode-based objectives combined with adaptive shot allocation can reduce the quantum sampling budget by […]–[…] compared to fixed-expectation schemes at fixed discrete-solution accuracy, with stability to depolarizing noise up to […] (Zhang et al., 30 Mar 2026).
Surrogate Selection: The Matérn kernel is robust to nonconvexity, with the smoothness parameter set according to estimated differentiability of the landscape (Song et al., 2023, Cheng et al., 2023). TPE offers competitive performance for black-box mode objectives (Zhang et al., 30 Mar 2026).
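The measurement-efficiency guideline rests on a basic fact about sample means: with $S$ independent shots, the variance of the estimated expectation is $\mathrm{Var}[C(z)]/S$. The sketch below computes that quantity from an exact distribution or from observed counts, which is one plausible way to set the GP noise term. It covers shot noise only, ignores device noise, and uses the hypothetical names `shot_noise_variance` and `shot_noise_variance_from_counts`.

```python
import numpy as np

def shot_noise_variance(probs, values, shots):
    """Variance of the sample-mean estimate of E[C] from `shots` i.i.d. measurements."""
    probs = np.asarray(probs, float)
    values = np.asarray(values, float)
    mean = probs @ values
    single_shot_var = probs @ (values - mean) ** 2
    return single_shot_var / shots

def shot_noise_variance_from_counts(counts, values):
    """Same quantity estimated from observed counts, using the unbiased sample variance."""
    counts = np.asarray(counts, float)
    values = np.asarray(values, float)
    shots = counts.sum()
    freq = counts / shots
    mean = freq @ values
    sample_var = (freq @ (values - mean) ** 2) * shots / (shots - 1.0)
    return sample_var / shots
```

For a uniform distribution over the four bitstrings of a single-edge instance (cut values 0, 1, 1, 0), the single-shot variance is 0.25, so 400 shots give a standard error of 0.025 on the estimated expectation.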

Table: Scaling Bounds from Trainability Analysis (Song et al., 2023)

Noise Model	Maximum Efficient Depth $p$	Iterations $T$ to $\epsilon$-error
Noiseless	[…]	[…]
Local Pauli noise	[…]	[…]

6. Comparative Performance and Experimental Results

Computational experiments on structured MaxCut instances with up to […] and […] vertices show that Bayesian optimization (BO) approaches require […]–[…] fewer circuit calls than conventional optimizers (Adam, COBYLA, SPSA) to reach a given approximation ratio (Cheng et al., 2023, Tibaldi et al., 2022). DARBO consistently yields smaller approximation gaps, lower run-to-run variance, and improved noise robustness. DARBO augmented with quantum error mitigation (QEM) achieves up to […] of the ideal improvement in superconducting-qubit tests at […], with the probability of sampling the optimal solution increasing from a baseline of […] to […] after optimization (Cheng et al., 2023).

Mode-based Bayesian optimization with adaptive shots achieves indistinguishable or superior mode accuracy with […]–[…] fewer total shots, preserving Pareto efficiency even in the presence of depolarizing noise (Zhang et al., 30 Mar 2026). Standard BO with batch and asynchronous updates maintains its advantage under low-throughput or high-noise constraints (Tibaldi et al., 2022).

7. Outlook and Limitations

Sequential Bayesian optimization offers a principled, scalable route to tuning QAOA instances on NISQ devices, provided the circuit depth stays within the regimes for which polynomial iteration counts are guaranteed and the surrogate design matches the landscape structure. The presence of moderate noise can improve performance by limiting the proliferation of local minima. For large […] or highly complex objective landscapes, kernel tuning, surrogate selection, and adaptive region methods become essential. Open challenges include extending trainability guarantees to arbitrary noise models, improving batch acquisition for massively parallel hardware, and integrating automated quantum error mitigation (Song et al., 2023, Cheng et al., 2023, Tibaldi et al., 2022, Zhang et al., 30 Mar 2026).

References (4)

Song et al. (2023). Trainability Analysis of Quantum Optimization Algorithms from a Bayesian Lens.

Cheng et al. (2023). Quantum approximate optimization via learning-based adaptive optimization.

Tibaldi et al. (2022). Bayesian Optimization for QAOA.

Zhang et al. (2026). Resource-efficient quantum approximate optimization algorithm via Bayesian optimization and maximum-probability evaluation.
