Barren Plateaus in Quantum Algorithms

Updated 25 June 2026

Barren plateaus are defined as the exponential suppression of cost-function gradients in parameterized quantum circuits, resulting in nearly flat optimization landscapes.
They arise from factors like circuit expressibility, destructive interference, entanglement, and noise, with diagnostics such as the cancellation ratio quantifying their effects.
Mitigation strategies include advanced initialization, tailored ansatz designs, and noise regularization, yet they only delay the exponential scaling issue without fully eliminating it.

A barren plateau (BP) in variational quantum algorithms (VQAs) is characterized by the exponential suppression of the variance of cost-function gradients with respect to the number of qubits or circuit depth. As the system size increases, the vanishing gradients render the optimization landscape almost featureless, making the training of parameterized quantum circuits (PQCs) intractable in practice. The BP phenomenon poses a critical challenge to scalable quantum machine learning, quantum simulation, and hybrid quantum–classical optimization workflows; it features prominently in the field’s theoretical discourse and influences ansatz design, initialization protocols, and mitigation strategies.

1. Formal Definition and Mechanisms

Barren plateaus are defined mathematically by the property that, for a cost function $C(\boldsymbol{\theta}) = \langle 0|U^\dagger(\boldsymbol{\theta})\,\mathcal{O}\,U(\boldsymbol{\theta})|0\rangle$ associated with an $n$-qubit PQC $U(\boldsymbol{\theta})$ and observable $\mathcal{O}$, the variance over parameter randomizations satisfies

$\operatorname{Var}[\partial_k C] \le F(n), \quad F(n)\in O(b^{-n}) \ \text{for some } b > 1.$

Typically, the mean gradient vanishes, $\mathbb{E}[\partial_k C] = 0$, so by Chebyshev's inequality the probability that $|\partial_k C|$ exceeds any fixed threshold is exponentially suppressed with increasing $n$ (or circuit depth $L$). This vanishing variance also extends to higher-order derivatives such as the Hessian and all mixed partials of the cost, which undermines gradient- and Hessian-based optimization methods in the BP regime (Cerezo et al., 2020).
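For concreteness, the Chebyshev step can be written out as follows. The threshold $\epsilon > 0$ is a symbol introduced here for illustration; it does not appear in the cited works' notation as quoted above.

```latex
\[
\Pr\bigl(|\partial_k C| \ge \epsilon\bigr)
\;\le\; \frac{\operatorname{Var}[\partial_k C]}{\epsilon^{2}}
\;\le\; \frac{F(n)}{\epsilon^{2}},
\qquad \text{using } \mathbb{E}[\partial_k C] = 0 .
\]
```

Because $F(n)$ decays exponentially in $n$, the right-hand side does too for any fixed $\epsilon$.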

The mechanistic origin of BP in most expressive PQCs is closely tied to the emergence of approximate unitary $2$-designs in the circuit ensemble. In randomly initialized, sufficiently deep PQCs, the distribution of unitaries quickly approaches the Haar measure, so that expectation values and their derivatives “concentrate” (concentration of measure) around zero exponentially with $n$. This effect is also modulated by the entanglement properties of the state, the locality of the observable, the Lie algebra generated by the gates, and noise in the quantum hardware (Larocca et al., 2024, Ragone et al., 2023).
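The concentration effect can be checked numerically in the idealized Haar limit. The sketch below is illustrative only: it samples Haar-random states directly instead of simulating a PQC, and it tracks the expectation value of $Z$ on the first qubit, not a gradient. For a traceless observable with $\mathcal{O}^2 = I$, the Haar average gives a variance of $1/(2^n + 1)$. All function names are hypothetical.

```python
import numpy as np


def haar_random_state(n_qubits, rng):
    """Sample a Haar-random pure state as a normalized complex Gaussian vector."""
    dim = 2 ** n_qubits
    vec = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return vec / np.linalg.norm(vec)


def z0_expectation(state):
    """<Z> on the first qubit: the first half of the amplitudes has that qubit in |0>."""
    probs = np.abs(state) ** 2
    half = probs.size // 2
    return float(probs[:half].sum() - probs[half:].sum())


def sampled_variance(n_qubits, n_samples=4000, seed=0):
    """Monte Carlo variance of <Z_0> over Haar-random states."""
    rng = np.random.default_rng(seed)
    values = [z0_expectation(haar_random_state(n_qubits, rng)) for _ in range(n_samples)]
    return float(np.var(values))


def exact_variance(n_qubits):
    """Haar average for a traceless observable with O^2 = I: 1 / (2^n + 1)."""
    return 1.0 / (2 ** n_qubits + 1)


if __name__ == "__main__":
    # Sampled and exact variances should shrink together as n grows.
    for n in (2, 4, 6, 8):
        print(n, sampled_variance(n), exact_variance(n))
```

The sampled variance should track the exact value and roughly halve with every added qubit, which is the exponential concentration described above.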

2. Destructive Interference and Diagnostic Frameworks

Recent advances reframe the BP phenomenon through the lens of destructive interference among termwise gradient contributions. For Hamiltonians decomposed as $H = \sum_j H_j$, and for gradient components $g_j = \partial_k \langle H_j \rangle$, the net gradient $g = \sum_j g_j$ can be made exponentially small not because each $g_j$ is small individually, but due to random sign cancellations across terms ("random-sign cancellation regime"). Quantitative diagnostics include:

Cancellation ratio
Effective term count
Interference-quality measure

An ensemble in the random-sign cancellation regime (termwise signs $\pm 1$ occurring with equal probability) typically shows strong cancellation, with the mean gradient norm vanishing exponentially with system size. Crucially, this framework distinguishes between suppression caused by cancellation and suppression arising from the pre-cancellation activity scale itself (Kang, 2 May 2026).
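A toy calculation shows how random signs alone can suppress a sum of order-one terms. The definitions below are assumptions made for illustration and are not taken from the cited work: the cancellation ratio is taken as $|\sum_j g_j| / \sum_j |g_j|$, and the effective term count as the participation ratio $(\sum_j |g_j|)^2 / \sum_j g_j^2$. Function names are hypothetical.

```python
import numpy as np


def cancellation_ratio(terms):
    """Assumed definition: |sum_j g_j| / sum_j |g_j| (1 = no cancellation, near 0 = strong)."""
    terms = np.asarray(terms, dtype=float)
    return float(abs(terms.sum()) / np.abs(terms).sum())


def effective_term_count(terms):
    """Assumed definition: (sum_j |g_j|)^2 / sum_j g_j^2; equals M for M equal-magnitude terms."""
    terms = np.asarray(terms, dtype=float)
    return float(np.abs(terms).sum() ** 2 / np.square(terms).sum())


def mean_random_sign_ratio(n_terms, n_trials=2000, seed=0):
    """Average cancellation ratio for unit-magnitude terms with independent +/-1 signs."""
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n_trials, n_terms))
    return float(np.mean(np.abs(signs.sum(axis=1)) / n_terms))


if __name__ == "__main__":
    # For M random-sign terms the ratio is expected to scale like sqrt(2 / (pi * M)).
    for m in (25, 100, 400):
        print(m, mean_random_sign_ratio(m), np.sqrt(2 / (np.pi * m)))
```

Under these assumed definitions, each individual term has magnitude one, yet the ratio shrinks roughly like $1/\sqrt{M}$ as the number of terms $M$ grows. This separates suppression by cancellation from suppression by small individual terms.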

3. Origins: Expressibility, Locality, Entanglement, and Noise

The emergence of BPs in VQAs can be attributed to several interrelated factors:

Circuit expressibility: Hardware-efficient, unstructured ansatzes with depth $L \in O(\mathrm{poly}(n))$ typically generate large dynamical Lie algebras ($\dim \mathfrak{g}$ exponential in $n$) and quickly approach a 2-design, enforcing exponential suppression of gradient variance (Ragone et al., 2023, Larocca et al., 2024).
Observable and cost locality: Global observables ($\mathcal{O}$ acting nontrivially on all $n$ qubits) inevitably cause BPs at any circuit depth. In contrast, strictly local costs can avoid BPs if the circuit is sufficiently shallow ($L \in O(\log n)$) (Gelman, 2024, Cunningham et al., 2024).
State entanglement: Volume-law entangled input states can push even local observables into BP regimes, because the reduced states on small subsystems are already close to maximally mixed (Ragone et al., 2023, Sack et al., 2022).
Noise: Unital noise channels (e.g., depolarizing) induce deterministic flattening ("noise-induced BPs"), driving the state toward the maximally mixed state so that expectation values approach their maximally mixed values and gradients vanish exponentially in depth (Sannia et al., 2023, Zapusek et al., 2 Jul 2025).
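The role of cost locality can be illustrated with a standard warm-up example from the literature on cost-function-dependent barren plateaus: a product of single-qubit rotations with no entanglement at all. The sketch below assumes a global cost $C_G = 1 - \prod_j \cos^2(\theta_j/2)$ and a local cost $C_L = 1 - \frac{1}{n}\sum_j \cos^2(\theta_j/2)$, with angles drawn uniformly from $[-\pi, \pi)$. For these, the gradient variances are $\frac{1}{8}\left(\frac{3}{8}\right)^{n-1}$ and $\frac{1}{8n^2}$ respectively. Function names are hypothetical.

```python
import numpy as np


def global_gradient(thetas):
    """d/d(theta_0) of C_G = 1 - prod_j cos^2(theta_j / 2)."""
    rest = np.prod(np.cos(thetas[..., 1:] / 2) ** 2, axis=-1)
    return 0.5 * np.sin(thetas[..., 0]) * rest


def local_gradient(thetas):
    """d/d(theta_0) of C_L = 1 - (1/n) sum_j cos^2(theta_j / 2)."""
    n_qubits = thetas.shape[-1]
    return 0.5 * np.sin(thetas[..., 0]) / n_qubits


def gradient_variances(n_qubits, n_samples=200_000, seed=0):
    """Monte Carlo gradient variances (global, local) over uniform random angles."""
    rng = np.random.default_rng(seed)
    thetas = rng.uniform(-np.pi, np.pi, size=(n_samples, n_qubits))
    return float(np.var(global_gradient(thetas))), float(np.var(local_gradient(thetas)))


def exact_variances(n_qubits):
    """Closed forms: (1/8)(3/8)^(n-1) for the global cost, 1/(8 n^2) for the local cost."""
    return (1 / 8) * (3 / 8) ** (n_qubits - 1), 1 / (8 * n_qubits ** 2)


if __name__ == "__main__":
    for n in (2, 4, 6, 8):
        print(n, gradient_variances(n), exact_variances(n))
```

Even in this depth-one circuit the global cost has exponentially vanishing gradient variance, while the local cost decays only polynomially. This matches the qualitative distinction drawn in the list above.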

4. Consequences for Optimization and Trainability

In the BP regime, the number of measurement shots or function evaluations required to resolve gradients against statistical noise scales exponentially with the number of qubits. This affects all derivative-based strategies, including higher-order methods: not only the gradients but also the Hessians and all higher derivatives are exponentially small, which creates a fundamental obstruction to efficient training (Cerezo et al., 2020).
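A back-of-the-envelope estimate makes the shot cost concrete. The sketch below rests on simplifying assumptions that are not from the source: the gradient variance saturates the bound as $b^{-n}$, the single-shot standard deviation is 1, and a gradient counts as resolved once the standard error falls to 10% of its typical magnitude. Function and parameter names are hypothetical.

```python
import math


def typical_gradient(n_qubits, base=2.0):
    """Typical gradient magnitude sqrt(Var) under the assumed scaling Var = base^(-n)."""
    return base ** (-n_qubits / 2)


def shots_to_resolve(gradient_scale, relative_error=0.1, single_shot_std=1.0):
    """Shots N at which single_shot_std / sqrt(N) drops to relative_error * gradient_scale."""
    target = relative_error * gradient_scale
    return math.ceil((single_shot_std / target) ** 2)


if __name__ == "__main__":
    # The shot count grows like base^n: every extra qubit doubles it for base = 2.
    for n in (10, 20, 30, 40):
        print(n, shots_to_resolve(typical_gradient(n)))
```

Under these assumptions the required shot count grows as $b^{n}$, so it is the relative precision, not a fixed absolute precision, that becomes exponentially expensive.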

Attempts to escape BPs by exploiting Hessian structure, higher-order derivatives, or stochastic perturbations that "jump" off the plateau have been shown analytically not to circumvent the exponential cost (Cerezo et al., 2020, Anschuetz et al., 2022). Even initialization or ansatz selection strategies that boost initial gradient norms can merely shift, but not fundamentally remove, the exponential scaling (Kulshrestha et al., 16 Jun 2026, Kashif et al., 2024, Li et al., 19 Mar 2026).

Moreover, even shallow circuits that avoid BPs have been shown to have loss landscapes that are “swamped with traps”: an overwhelming number of poor local minima, from which most initializations fail to escape. In the BP regime, exponentially many trivial local minima persist, each optimally solving only a few cost-function terms while leaving the rest flat (Nemkov et al., 2024, Anschuetz et al., 2022). Non-vanishing gradients alone are therefore not sufficient for trainability.

5. Mitigation Strategies and Their Limitations

A variety of methods have been developed to delay or sidestep barren plateaus. These can be organized into several categories (Cunningham et al., 2024, Gelman, 2024):

Strategy Category	Representative Methods	Principle
Initialization	Small-angle (narrow range) init, classical-network seeding, LLM-driven bootstrapping, first-moment engineering (Kashif et al., 2024, Friedrich et al., 2022, Zhuang et al., 17 Feb 2025, Kulshrestha et al., 16 Jun 2026)	Avoid early 2-design formation; steer toward high-gradient regions
Optimization Procedure	Layerwise/blockwise growth, staged/rolling training, classical shadows for entropy monitoring (Nádori et al., 2024, Sack et al., 2022)	Keep circuit expressibility limited during optimization
Model/Architecture	Problem-aligned ansatz (e.g., HVA/QAOA), symmetry-restricted, local-depth circuits (Kang, 2 May 2026, Cunningham et al., 2024, Ragone et al., 2023)	Align generator structure with physics; reduce effective Lie algebra dimension
Regularization/Noise	Engineered dissipation (Markovian loss), entanglement penalties, reset protocols (Sannia et al., 2023, Zapusek et al., 2 Jul 2025)	Locally inject purity, break up expressibility, penalize volume-law entanglement
Measurement/Hybrid	Post-selection, classical-quantum hybrids, intermediate measurement (Cunningham et al., 2024)	Conditioning on structured subspaces

While these approaches can delay the onset of BPs or partially restore trainability in regimes of moderate $n$, none are universal. In particular, mitigation strategies that lift the average variance may bias the optimizer toward different “trainable pockets” of the landscape, and exponentially many first-moment-distinct initializations exist, yielding non-equivalent minima (Kulshrestha et al., 16 Jun 2026). For problems requiring global exploration (e.g., learning unknown scrambling unitaries), BPs remain unavoidable, representing a fundamental resource barrier (Holmes et al., 2020).

Dimension is a further aggravating factor: for qudit PQCs, the gradient variance decays exponentially in the number of qudits at a rate that grows with the local dimension $d$, which amplifies the plateau problem (Friedrich et al., 2024).

6. Unified Theoretical Characterization

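A unified theoretical framework ties the sources of BPs to the dynamical Lie algebra $\mathfrak{g}$ generated by the circuit’s gate set (Ragone et al., 2023). In the simplest case of a simple Lie algebra $\mathfrak{g}$ and a sufficiently deep circuit, the variance of the loss is given by

$\operatorname{Var}_{\boldsymbol{\theta}}[C(\boldsymbol{\theta})] = \dfrac{P_{\mathfrak{g}}(\rho)\,P_{\mathfrak{g}}(\mathcal{O})}{\dim(\mathfrak{g})},$

where $\rho = |0\rangle\langle 0|$ is the input state and $P_{\mathfrak{g}}(A)$ is the $\mathfrak{g}$-purity of an operator $A$. Gradient suppression arises due to (i) large $\dim(\mathfrak{g})$ (expressivity), (ii) small $P_{\mathfrak{g}}(\rho)$ (input-state entanglement), (iii) small $P_{\mathfrak{g}}(\mathcal{O})$ (observable locality/non-alignment), or (iv) noise-induced purity loss.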

This framework accommodates and extends all known origins of BPs, and provides an exact predictive theory for whether a given architecture, cost, and initial state will admit trainable gradients (Ragone et al., 2023).
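Since the size of the dynamical Lie algebra plays a central role in this picture, it can be instructive to compute its dimension for small gate sets. The following is a minimal sketch, assuming every generator is a single Pauli string. In that case each commutator is either zero or proportional to another Pauli string, so closing the set under products of anticommuting pairs yields a basis of the algebra. The generator sets and all function names are illustrative choices, not taken from the cited works.

```python
_BITS = {"I": (0, 0), "X": (1, 0), "Y": (1, 1), "Z": (0, 1)}
_CHARS = {bits: char for char, bits in _BITS.items()}


def anticommute(p, q):
    """Two Pauli strings anticommute iff they clash on an odd number of sites."""
    clashes = sum(a != "I" and b != "I" and a != b for a, b in zip(p, q))
    return clashes % 2 == 1


def product(p, q):
    """Pauli string proportional to p*q (phase dropped), via XOR of (x, z) bits."""
    return "".join(
        _CHARS[(_BITS[a][0] ^ _BITS[b][0], _BITS[a][1] ^ _BITS[b][1])]
        for a, b in zip(p, q)
    )


def dla_dimension(generators):
    """Dimension of the Lie algebra generated by Pauli strings under commutation."""
    algebra = set(generators)
    frontier = list(algebra)
    while frontier:
        p = frontier.pop()
        for q in list(algebra):
            if anticommute(p, q):  # nonzero commutator, proportional to p*q
                r = product(p, q)
                if r not in algebra:
                    algebra.add(r)
                    frontier.append(r)
    return len(algebra)


def single(n, i, char):
    """Pauli `char` on site i of an n-qubit register."""
    return "I" * i + char + "I" * (n - i - 1)


def ising_generators(n):
    """X on every site plus ZZ on neighbouring sites of an open chain."""
    return [single(n, i, "X") for i in range(n)] + [
        "I" * i + "ZZ" + "I" * (n - i - 2) for i in range(n - 1)
    ]


def universal_generators(n):
    """The Ising set extended by single-site Z, which makes the gate set universal."""
    return ising_generators(n) + [single(n, i, "Z") for i in range(n)]


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, dla_dimension(ising_generators(n)), dla_dimension(universal_generators(n)))
```

For the Ising-type set the dimension grows polynomially, as $n(2n-1)$. Adding single-site $Z$ generators makes it jump to $4^n - 1$, the exponentially large regime associated with high expressivity.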

7. Open Problems and Future Directions

Key open problems include:

Formalizing the interplay between destructive interference and expressibility/Lie-algebraic mechanisms in more general Hamiltonians, cost functions, and physical architectures (Kang, 2 May 2026, Li et al., 19 Mar 2026).
Systematically designing ansatz architectures that balance expressibility against trainability, possibly guided by Lie algebra size, $\mathfrak{g}$-purity, and first-moment diagnostics.
Rigorous development of initialization strategies that selectively break average-case concentration without introducing exponential ambiguity in the optimal pocket (Kulshrestha et al., 16 Jun 2026).
Quantifying the relationship between the absence of BPs and classical simulability, the effect of hardware-specific non-unital noise, and the interplay of BPs with local minimum “traps” (Larocca et al., 2024, Nemkov et al., 2024).
Connecting classical shadow-based entropy-monitoring and real-time regularization with hardware-realizable error-mitigation techniques (Sack et al., 2022, Sannia et al., 2023, Zapusek et al., 2 Jul 2025).
Generalizing strategies to the qudit setting, high-dimensional architectures, and hybrid quantum-classical models (Friedrich et al., 2024).