Quantum Neural Tangent Kernel-UCB
- QNTK-UCB is a kernel-based online learning algorithm that uses quantum circuits to efficiently tackle sequential decision-making with sublinear regret.
- The method leverages a static Quantum Neural Tangent Kernel, benefiting from rapid eigenvalue decay for implicit regularization and improved parameter efficiency.
- Empirical results show that QNTK-UCB outperforms classical baselines in sample complexity and exploration efficiency on quantum-native decision tasks.
The Quantum Neural Tangent Kernel-Upper Confidence Bound (QNTK-UCB) is a kernel-based online learning algorithm designed for sequential decision-making in bandit and Bayesian optimization settings, leveraging parameterized quantum circuits. QNTK-UCB harnesses the properties of the Quantum Neural Tangent Kernel (QNTK), a quantum analogue of the classical neural tangent kernel, to achieve improved parameter efficiency, stability, and inductive bias compared to both classical and standard quantum kernel methods. Its theoretical and empirical properties allow for sublinear regret in contextual bandits, with explicit advantages in sample complexity and effective dimensionality in regimes natural to quantum hardware (Shirai et al., 2021, Huang et al., 6 Jan 2026).
1. Definition and Construction of the Quantum Neural Tangent Kernel
The QNTK is derived by considering a parameterized quantum circuit (QNN) acting on $n$ qubits, with trainable parameters $\theta \in \mathbb{R}^P$ and classical input $x$, preparing the state $|\psi(x;\theta)\rangle$. The model computes

$$f(x;\theta) = \frac{1}{\sqrt{m}}\,\langle\psi(x;\theta)|\,O\,|\psi(x;\theta)\rangle,$$

where $O = \sum_{j=1}^{m} O_j$ is a sum of local observables, each traceless with eigenvalues $\pm 1$, and $1/\sqrt{m}$ provides normalization. Training is initialized at a random parameter vector $\theta_0$.

The QNTK arises from a first-order ("tangent") expansion about this initialization. Defining the quantum feature map

$$\varphi(x) = \nabla_\theta f(x;\theta_0) \in \mathbb{R}^P,$$

the empirical kernel is

$$\hat{K}(x, x') = \langle\varphi(x), \varphi(x')\rangle = \nabla_\theta f(x;\theta_0)^\top \nabla_\theta f(x';\theta_0).$$

As $P \to \infty$, $\hat{K}$ concentrates around a deterministic analytic kernel $K$ (Shirai et al., 2021, Huang et al., 6 Jan 2026).
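The gradient-feature construction can be sketched numerically on a toy circuit. The 2-qubit, 2-parameter ansatz below (RY data encoding, a trainable RY layer, a CNOT entangler, and Z observables) is an illustrative assumption, not the paper's circuit; the kernel is the inner product of parameter-shift gradients at a random initialization.

```python
import numpy as np

# Toy 2-qubit, 2-parameter circuit simulated with NumPy (illustrative only).
I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]])

def state(x, theta):
    """|psi(x; theta)>: RY(x) data encoding, trainable RY layer, CNOT entangler."""
    psi = np.zeros(4)
    psi[0] = 1.0
    psi = np.kron(ry(x), ry(x)) @ psi
    psi = np.kron(ry(theta[0]), ry(theta[1])) @ psi
    return CNOT @ psi

# Normalized sum of local traceless observables (Z on each qubit).
O = (np.kron(Z, I2) + np.kron(I2, Z)) / np.sqrt(2)

def f(x, theta):
    psi = state(x, theta)
    return psi @ O @ psi

def grad_f(x, theta):
    """Exact gradient via the parameter-shift rule for RY gates."""
    g = np.zeros(len(theta))
    for j in range(len(theta)):
        tp, tm = theta.copy(), theta.copy()
        tp[j] += np.pi / 2
        tm[j] -= np.pi / 2
        g[j] = 0.5 * (f(x, tp) - f(x, tm))
    return g

def qntk(x1, x2, theta0):
    """Empirical QNTK: inner product of gradient features at initialization."""
    return grad_f(x1, theta0) @ grad_f(x2, theta0)

rng = np.random.default_rng(0)
theta0 = rng.uniform(0.0, 2.0 * np.pi, size=2)
```

In the lazy regime this kernel, evaluated once at the frozen initialization, is all that downstream inference consumes; the circuit is never trained.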
2. Lazy Training and Kernel Concentration Regime
In the overparameterized ("lazy") regime, the parameter updates during training remain small: $\theta_t$ stays within a vanishing neighborhood of $\theta_0$ for all steps $t \le T$ once $P$ is sufficiently large. This regime justifies the use of the tangent kernel: the kernel computed at initialization remains stable, ensuring that subsequent optimization can be replaced by static kernel methods (Shirai et al., 2021).

QNTK concentration is induced by selecting the parameter count $P$ sufficiently large, which guarantees convergence of the empirical kernel to its expectation in spectral norm. This is substantially more parameter-efficient than the classical NTK, where the network width required for comparable guarantees is far larger (Huang et al., 6 Jan 2026).
3. QNTK-UCB Algorithmic Framework
QNTK-UCB leverages the static QNTK as a kernel for kernelized bandit/RL inference. At each round $t$ (having observed rewards $r_s$ for $s < t$), it maintains the Gram matrix $K_{t-1} = [K(x_i, x_j)]_{i,j < t}$ and the reward vector $y_{t-1} = (r_1, \ldots, r_{t-1})^\top$.

The ridge-regression estimate is $\mu_t(x) = k_t(x)^\top (K_{t-1} + \lambda I)^{-1} y_{t-1}$, where $k_t(x) = (K(x, x_1), \ldots, K(x, x_{t-1}))^\top$. For each action $x$, the UCB is given by

$$\mathrm{UCB}_t(x) = \mu_t(x) + \beta_t\, \sigma_t(x), \qquad \sigma_t^2(x) = K(x,x) - k_t(x)^\top (K_{t-1} + \lambda I)^{-1} k_t(x),$$

where the exploration parameter $\beta_t$ is set via martingale concentration bounds, with $\sigma$ and $B$ controlling the sub-Gaussian noise level and the RKHS norm of the reward, respectively. The action maximizing $\mathrm{UCB}_t$ is selected, the reward is observed, and the statistics are updated. The full procedure mirrors a kernelized linear UCB, but in the QNTK feature space (Huang et al., 6 Jan 2026).
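One decision round of such a kernelized UCB step can be sketched as follows; the RBF kernel is a stand-in for the QNTK, and the values of `beta`, `lam`, and the toy history are illustrative assumptions.

```python
import numpy as np

# Generic kernelized UCB round: ridge mean plus exploration bonus per action.
def rbf(x1, x2, ell=0.5):
    return np.exp(-(x1 - x2) ** 2 / (2 * ell ** 2))

def ucb_scores(actions, X_hist, y_hist, beta=1.0, lam=1.0, kernel=rbf):
    """Upper confidence bound for each candidate action given past data."""
    n = len(X_hist)
    K = np.array([[kernel(a, b) for b in X_hist] for a in X_hist])  # Gram matrix
    A_inv = np.linalg.inv(K + lam * np.eye(n))
    scores = []
    for x in actions:
        k_x = np.array([kernel(x, b) for b in X_hist])
        mu = k_x @ A_inv @ y_hist                # ridge-regression estimate
        var = kernel(x, x) - k_x @ A_inv @ k_x   # posterior variance
        scores.append(mu + beta * np.sqrt(max(var, 0.0)))
    return np.array(scores)

# Toy history: observed rewards increase toward x = 0.9.
X_hist = np.array([0.1, 0.5, 0.9])
y_hist = np.array([0.05, 0.4, 0.9])
actions = np.linspace(0, 1, 11)
best = actions[np.argmax(ucb_scores(actions, X_hist, y_hist))]
```

In QNTK-UCB, `rbf` would be replaced by the (static) QNTK evaluated via circuit runs, with everything else unchanged.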
4. Regret Analysis and Parameter Scaling
Under standard boundedness and realizability assumptions (the reward function lies in the RKHS of the QNTK), regret is bounded by

$$R_T = \tilde{O}\!\left(\sqrt{T\,\tilde{d}_T}\right),$$

where the quantum effective dimension is

$$\tilde{d}_T = \log\det\!\left(I + \lambda^{-1} K_T\right),$$

with $K_T$ the limiting QNTK Gram matrix. This effective dimension is controlled by the spectral decay of the QNTK eigenvalues $\mu_1 \ge \mu_2 \ge \cdots$: sharper decay, which is common for the QNTK compared to classical kernels, yields a smaller $\tilde{d}_T$ and thus lower regret (Huang et al., 6 Jan 2026).

In contrast, classical NeuralUCB requires a far larger network width to maintain NTK concentration, resulting in significantly higher parameter complexity (Huang et al., 6 Jan 2026). Thus, QNTK-UCB realizes a parameter-efficient regime.
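How spectral decay drives the bound can be checked numerically. The snippet below assumes the common definition $\tilde d = \log\det(I + \lambda^{-1}K) = \sum_i \log(1 + \mu_i/\lambda)$ and compares two synthetic spectra of equal trace: a geometric decay (QNTK-like) against a polynomial decay (classical-kernel-like).

```python
import numpy as np

# Effective dimension from kernel eigenvalues (synthetic spectra, equal trace).
def effective_dimension(eigvals, lam=0.01):
    # log det(I + K / lam) = sum_i log(1 + mu_i / lam)
    return np.sum(np.log1p(np.asarray(eigvals) / lam))

T = 200
i = np.arange(1, T + 1, dtype=float)
fast = 2.0 ** -i            # geometric decay (QNTK-like)
slow = 1.0 / i              # polynomial decay (classical-kernel-like)
slow *= fast.sum() / slow.sum()  # match traces so only decay shape differs

d_fast = effective_dimension(fast)
d_slow = effective_dimension(slow)
```

With traces matched, the geometrically decaying spectrum yields a markedly smaller effective dimension, which is exactly the mechanism behind the lower regret claimed above.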
5. Spectral Properties and Implicit Regularization
QNTK's eigenvalues typically exhibit a more rapid decay than classical NTKs or RBF kernels, reflecting a strong spectral bias. This phenomenon, related to the barren plateau effect (gradient concentration near zero for deep quantum circuits), results in lower effective dimension and smaller information gain, reducing the exploration cost in bandit tasks (Shirai et al., 2021, Huang et al., 6 Jan 2026).
While uniform eigenvalue shrinkage would impair representational power, the QNTK often concentrates its spectral mass on a low-rank subspace matched to quantum-native reward functions. For shallow entangling circuits, most of the variance is explained by a small number of leading eigenvalues. This sharp concentration acts as an implicit regularizer, reducing noise propagation and enabling efficient exploration without loss of alignment to the relevant signal subspace (Huang et al., 6 Jan 2026).
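The low-rank concentration described above can be quantified as the fraction of spectral mass captured by the top-$k$ eigenvalues; the spectra below are synthetic stand-ins for a fast-decaying QNTK spectrum versus a slowly decaying classical one.

```python
import numpy as np

# Fraction of total spectral mass in the k largest eigenvalues.
def top_k_mass(eigvals, k):
    s = np.sort(np.asarray(eigvals))[::-1]
    return s[:k].sum() / s.sum()

i = np.arange(1, 101, dtype=float)
fast = 2.0 ** -i   # geometric decay: top-5 capture ~97% of the trace
slow = 1.0 / i     # harmonic decay: top-5 capture well under half
```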
6. Practical Implementation and Evaluation
Key practical insights include:
- Kernel evaluation cost: each gradient feature $\nabla_\theta f(x;\theta_0)$ requires $O(P)$ circuit evaluations via the parameter-shift rule (two shifted evaluations per parameter); Gram-matrix assembly then costs $O(n^2 P)$ classical inner products for $n$ data points.
- Parameter scaling: for circuits built from two-qubit entanglers (layer count $L$, qubit count $n$), the feature dimension equals the parameter count $P$, which scales efficiently up to the thresholds required for kernel concentration.
- Noise robustness: since parameter training is not performed on device, QNTK-UCB isolates inference from device-induced noise; any noisy gradient estimation only affects the GP posterior variance $\sigma_t^2(x)$, which is handled by Bayesian updates (Shirai et al., 2021).
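The cost accounting above can be made concrete. The single-qubit parameter-shift example below is exact for RY-type gates, and `gram_cost` is an illustrative back-of-envelope model (one cached gradient feature per data point), not the paper's accounting.

```python
import numpy as np

def f(theta):
    """<psi(theta)| Z |psi(theta)> for RY(theta)|0>; equals cos(theta)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return c * c - s * s

def parameter_shift_grad(theta):
    """Exact derivative from two shifted circuit evaluations per parameter."""
    return 0.5 * (f(theta + np.pi / 2) - f(theta - np.pi / 2))

def gram_cost(n_points, n_params):
    """Circuit runs and classical work to assemble an n x n Gram matrix,
    assuming gradient features are computed once per point and cached."""
    circuit_evals = 2 * n_params * n_points     # two shifted runs per parameter
    classical_ops = n_points ** 2 * n_params    # O(n^2 P) feature inner products
    return circuit_evals, classical_ops
```

Because the features are cached, device time grows linearly in $n$ while only the cheap classical inner products grow quadratically.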
Empirical assessments validate these properties:
- On synthetic Gaussian-quantile classification bandits, QNTK-UCB achieves lower regret than NeuralUCB, NTK-UCB, and RBF-UCB at equal parameter counts, particularly in low-data and overparameterized regimes.
- On quantum-native tasks, such as variational quantum eigensolver (VQE) recommender bandits, QNTK-UCB outperforms classical baselines, capturing Hilbert-space correlations inaccessible to classical kernels.
- Empirically, QNTK’s effective dimension saturates or decreases as more rounds are observed, while classical NTK effective dimensions diverge, confirming superior capacity control (Huang et al., 6 Jan 2026).
7. Quantum Advantage, Open Directions, and Limitations
QNTK-UCB realizes quantum advantage by matching quantum-native inductive biases with effective parameter counts and leveraging spectral bias to reduce required exploration. Freezing the QNN at initialization bypasses barren-plateau training pathologies, enabling scalable and stable inference with provable regret guarantees in online settings (Huang et al., 6 Jan 2026).
Open problems and future directions include:
- Characterizing function classes and quantum-circuit tasks for which QNTK-UCB exceeds any classical kernel.
- Exploring hybrid quantum–classical architectures that allow limited trainability for tasks with greater reward function complexity.
- Investigating deeper and more non-local circuits to amplify quantum advantage, balanced by kernel concentration trade-offs.
- Formalizing the precise relationship between circuit architecture, spectral bias, and achievable effective dimension.
QNTK-UCB establishes a blueprint for exploiting quantum inductive bias in online learning, substantially lowering sample and parameter complexity in low-data and quantum-native regimes (Shirai et al., 2021, Huang et al., 6 Jan 2026).