
Bernoulli Micro-Kernel for Quantum PDE Sampling

Updated 23 November 2025
  • Bernoulli micro-kernel is a quantum computational primitive that performs explicit stencil node updates in finite-difference PDE solvers using shallow, single-qubit circuits.
  • It leverages constant-resource Monte Carlo sampling to estimate convex-sum stencil updates, ensuring unbiased estimators with error convergence of O(1/√M).
  • Empirical evaluations on simulators and NISQ devices demonstrate its scalability, lower bias, and improved accuracy compared to deeper, entangling alternatives.

The Bernoulli micro-kernel is a quantum computational primitive designed to perform the explicit stencil node updates that arise in finite-difference solvers for partial differential equations (PDEs). It serves as a localized, constant-resource Monte Carlo subroutine, implementable with shallow single-qubit quantum circuits, for accelerating the sampling of convex-sum stencil updates. Its cost in qubits and circuit depth does not scale with the problem size, which makes it suitable for orchestrated, massively parallel application over computational grids. It is one realization of the broader QPU micro-kernel concept, in which a quantum processor (QPU) acts as a sampling accelerator invoked by a classical host that maintains the outer iteration structure (Markidis et al., 16 Nov 2025).

1. Stencil Computation Framework and the Micro-Kernel Paradigm

Explicit finite-difference PDE solvers, such as the Forward-Time Centered-Space (FTCS) method for the 1D heat equation, update the value at each spatial node $i$ at time $n+1$ according to a convex combination of neighbor values at time $n$:

$$u_i^{n+1} = w_L\,u_{i-1}^n + w_C\,u_i^n + w_R\,u_{i+1}^n, \qquad w_L + w_C + w_R = 1, \quad w_b \ge 0.$$

In the QPU micro-kernel framework, the classical host iterates over time steps $n$ and nodes $i$, invoking the quantum micro-kernel only to obtain unbiased Monte Carlo estimates of the stencil update for each node. This approach offloads the local convex-sum operation to the quantum device, leaving the global time loop and grid iteration on the classical host (Markidis et al., 16 Nov 2025).
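As a classical point of reference, the convex-sum update can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the weights $w_L = w_R = r$, $w_C = 1 - 2r$ with $r = \alpha\,\Delta t/\Delta x^2$ are the standard FTCS choice for the heat equation, which the text above does not spell out.

```python
import numpy as np

def ftcs_weights(r):
    """Return (w_L, w_C, w_R) for the 1D heat equation.

    Assumption: r = alpha * dt / dx**2, the usual FTCS diffusion number.
    The weights are nonnegative (a convex combination) only for r <= 1/2.
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("r must lie in [0, 0.5] for nonnegative weights")
    return r, 1.0 - 2.0 * r, r

def ftcs_step(u, r):
    """Advance interior nodes by one explicit step; boundary values stay fixed."""
    w_l, w_c, w_r = ftcs_weights(r)
    u_next = u.copy()
    u_next[1:-1] = w_l * u[:-2] + w_c * u[1:-1] + w_r * u[2:]
    return u_next

u0 = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
u1 = ftcs_step(u0, r=0.25)
print(u1)
```

In the micro-kernel framework, it is the right-hand side of the interior update that would be replaced by a sampled estimate, one node at a time.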

2. Bernoulli Micro-Kernel Circuit Structure

The Bernoulli micro-kernel operates via single-qubit circuits for each stencil branch $b\in\{L,C,R\}$:

  • The qubit is initialized in $|0\rangle$.
  • A single-qubit $R_y$ rotation is applied with angle $\theta(u_b) = 2\arcsin\sqrt{u_b'}$, where $u_b'$ is the affine-normalized neighbor value mapped to $[0,1]$.
  • The qubit is measured in the computational basis, yielding a Bernoulli sample with $\Pr(\text{outcome}=1)=u_b'$.
  • This process is repeated $M_b$ times per branch to obtain an empirical mean $\hat{u}_b$.

No entanglement or multi-qubit operations are involved; each branch is executed independently (Markidis et al., 16 Nov 2025).
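The branch circuit is simple enough to emulate classically, which helps check the angle encoding. The sketch below is only an emulation under that assumption (no quantum SDK is used, and the function names are hypothetical): since $R_y(\theta)|0\rangle = \cos(\theta/2)|0\rangle + \sin(\theta/2)|1\rangle$, the probability of measuring 1 is $\sin^2(\theta/2)$, which equals $u_b'$ for the angle given above.

```python
import numpy as np

def ry_angle(u_norm):
    """Rotation angle theta = 2 * arcsin(sqrt(u')) for a value u' in [0, 1]."""
    return 2.0 * np.arcsin(np.sqrt(u_norm))

def prob_one(theta):
    """Probability of outcome 1 after R_y(theta) acts on |0>."""
    return np.sin(theta / 2.0) ** 2

def sample_branch(u_norm, shots, rng):
    """Classically emulate `shots` measurements of one branch circuit."""
    p = prob_one(ry_angle(u_norm))
    p = min(max(p, 0.0), 1.0)  # guard against floating-point overshoot
    return rng.binomial(1, p, size=shots)  # array of 0/1 outcomes

rng = np.random.default_rng(0)
outcomes = sample_branch(0.3, shots=1000, rng=rng)
u_hat = outcomes.mean()  # empirical mean, an estimate of u' = 0.3
```

On hardware, each call to `sample_branch` would correspond to running the one-gate circuit for the requested number of shots.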

3. Data Encoding and Shot Allocation

Neighbor values $u_b$, originally in $[u_{\min}, u_{\max}]$, are linearly normalized to $u_b'$ in $[0,1]$:

$$u_b' = \frac{u_b - u_{\min}}{u_{\max}-u_{\min}} \in [0,1].$$

Given a per-node shot budget MM, shots are allocated proportionally to the stencil weights:

$$M_b = \lfloor w_b\, M \rceil, \quad \sum_b M_b = M,$$

where $\lfloor\cdot\rceil$ denotes rounding to the nearest integer. Each branch executes its $R_y$ circuit $M_b$ times, enabling shot-based statistical estimation that respects the convex weights (Markidis et al., 16 Nov 2025).
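A small sketch of the encoding and allocation steps follows. The source does not say how rounding is reconciled with the constraint $\sum_b M_b = M$; the largest-remainder rule used here is an assumption chosen for illustration, and the function names are hypothetical.

```python
def normalize(u, u_min, u_max):
    """Affine map from [u_min, u_max] to [0, 1]."""
    if u_max <= u_min:
        raise ValueError("u_max must exceed u_min")
    return (u - u_min) / (u_max - u_min)

def allocate_shots(weights, total_shots):
    """Split total_shots across branches in proportion to the weights.

    Assumption: leftover shots after flooring go to the branches with the
    largest fractional parts, so the counts always sum to total_shots.
    """
    raw = [w * total_shots for w in weights]
    shots = [int(x) for x in raw]  # floor, since all entries are nonnegative
    leftover = total_shots - sum(shots)
    by_remainder = sorted(range(len(raw)),
                          key=lambda b: raw[b] - shots[b],
                          reverse=True)
    for b in by_remainder[:leftover]:
        shots[b] += 1
    return shots

print(allocate_shots((0.25, 0.5, 0.25), 4000))  # [1000, 2000, 1000]
print(normalize(0.5, -1.0, 1.0))                # 0.75
```

When $w_b M$ is an integer for every branch, as in the first call, the allocation is exact and no rounding correction is needed.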

4. Estimator Construction and Statistical Properties

Let $X_b^{(s)}\in\{0,1\}$ be the outcome of the $s$-th measurement for branch $b$. The convex-sum estimator for $u_i^{n+1}$ is

$$\hat{u}_i^{n+1} = \frac{1}{M}\sum_{b\in\{L,C,R\}} \sum_{s=1}^{M_b} X_b^{(s)} = \sum_{b\in\{L,C,R\}} \frac{M_b}{M}\, \hat{u}_b,$$

where $\hat{u}_b = \frac{1}{M_b}\sum_{s} X_b^{(s)}$.
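The pooled form of the estimator, in which all 0/1 outcomes are summed and divided by $M$, can be emulated as follows. This is a sketch under the assumption that each branch is an ideal Bernoulli source; the function name and the example values are hypothetical.

```python
import numpy as np

def convex_sum_estimate(u_norm, shots, rng):
    """Pool the 1-outcomes of all branches and divide by the total shots M."""
    total = sum(shots)
    ones = 0
    for p, m_b in zip(u_norm, shots):
        # Number of 1-outcomes among m_b ideal Bernoulli(p) measurements.
        ones += rng.binomial(m_b, p)
    return ones / total

rng = np.random.default_rng(1)
u_norm = (0.2, 0.6, 0.4)       # normalized neighbor values for L, C, R
shots = (1000, 2000, 1000)     # weights (0.25, 0.5, 0.25) with M = 4000
estimate = convex_sum_estimate(u_norm, shots, rng)
exact = 0.25 * 0.2 + 0.5 * 0.6 + 0.25 * 0.4   # 0.45
```

Because the estimate lives in normalized units, a physical value would be recovered with $u_{\min} + (u_{\max}-u_{\min})\,\hat{u}_i^{n+1}$, which is valid because the weights sum to one.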

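The estimator is unbiased for the normalized stencil update, up to the integer rounding of the shot counts:

$$\mathbb{E}[\hat{u}_i^{n+1}] = \sum_b \frac{M_b}{M}\, u_b' \approx \sum_b w_b\, u_b',$$

with equality whenever $M_b = w_b M$ holds exactly,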

and its variance is

$$\mathrm{Var}[\hat{u}_i^{n+1}] = \frac{1}{M}\sum_b w_b\, u_b'(1-u_b') \leq \frac{1}{4M}.$$

The standard error vanishes as $\mathcal{O}(1/\sqrt{M})$ (Markidis et al., 16 Nov 2025).
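For readers who want the intermediate steps, the variance bound can be derived as below. The derivation assumes independent shots and exact proportional allocation, $M_b = w_b M$; both are assumptions of this sketch rather than statements quoted from the source.

```latex
\begin{aligned}
\mathrm{Var}\!\left[\hat{u}_i^{n+1}\right]
  &= \frac{1}{M^2}\sum_{b}\sum_{s=1}^{M_b}\mathrm{Var}\!\left[X_b^{(s)}\right]
   = \frac{1}{M^2}\sum_{b} M_b\, u_b'(1-u_b') \\
  &= \frac{1}{M}\sum_{b}\frac{M_b}{M}\, u_b'(1-u_b')
   = \frac{1}{M}\sum_{b} w_b\, u_b'(1-u_b') \\
  &\le \frac{1}{M}\sum_{b} w_b \cdot \frac{1}{4}
   = \frac{1}{4M},
\end{aligned}
```

The last line uses $u_b'(1-u_b') \le 1/4$ on $[0,1]$ together with $\sum_b w_b = 1$.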

5. Resource Requirements and Scaling

The Bernoulli micro-kernel achieves resource independence from grid size:

  • Qubit count per branch: 1 qubit.
  • Circuit depth per shot: one $R_y$ gate plus measurement.
  • No entanglement between qubits; no increase in circuit complexity with additional grid points.

This constancy makes micro-kernels amenable to large-scale grid parallelization, with classical orchestration handling all node and branch-level iteration (Markidis et al., 16 Nov 2025).
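The division of labor between host and micro-kernel can be sketched as a classical loop that calls a sampling routine once per interior node. Everything here is illustrative: the micro-kernel is emulated with ideal Bernoulli draws rather than a QPU, the normalization range is taken from the initial condition (an assumption), boundary values are held fixed, and the function names are hypothetical.

```python
import numpy as np

def micro_kernel(u_norm, shots, rng):
    """Emulated per-node kernel: pooled Bernoulli estimate of the convex sum."""
    ones = sum(rng.binomial(m_b, p) for p, m_b in zip(u_norm, shots))
    return ones / sum(shots)

def host_solve(u0, shots, steps, rng):
    """Classical host: owns the time loop and the sweep over grid nodes."""
    u = u0.astype(float)
    u_min, u_max = float(u.min()), float(u.max())
    span = u_max - u_min
    if span == 0.0:
        return u  # constant field: every convex update leaves it unchanged
    for _ in range(steps):
        u_norm = np.clip((u - u_min) / span, 0.0, 1.0)
        u_next = u.copy()
        for i in range(1, len(u) - 1):
            # One kernel call per node; its cost is independent of len(u).
            est = micro_kernel(u_norm[i - 1:i + 2], shots, rng)
            u_next[i] = u_min + span * est  # map back to physical units
        u = u_next
    return u

rng = np.random.default_rng(0)
u0 = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
u_final = host_solve(u0, shots=(1000, 2000, 1000), steps=10, rng=rng)
```

The inner loop over nodes has no data dependence within a time step, which is what makes the per-node calls candidates for parallel dispatch.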

6. Error Behavior and Convergence

The standard error for each branch's mean estimator is

$$\mathrm{SE}(\hat{u}_b) = \sqrt{\frac{u_b'(1-u_b')}{M_b}} \leq \frac{1}{2\sqrt{M_b}},$$

and propagates through the convex-sum to

$$\mathrm{SE}(\hat{u}_i^{n+1}) \leq \frac{1}{2\sqrt{M}}.$$

Empirical studies using noiseless simulators confirm the $O(1/\sqrt{M})$ convergence: doubling $M$ reduces the estimator noise by a factor of approximately $\sqrt{2}$ (Markidis et al., 16 Nov 2025).
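The scaling can be checked with a quick classical experiment. The sketch below is not a reproduction of the cited benchmarks; it emulates ideal Bernoulli branches with hypothetical inputs and compares the root-mean-square error at $M$ and $2M$.

```python
import numpy as np

def rms_error(u_norm, weights, total_shots, trials, rng):
    """RMS error of the convex-sum estimator over many independent trials."""
    shots = [int(round(w * total_shots)) for w in weights]
    exact = sum(w * p for w, p in zip(weights, u_norm))
    # For each branch, draw the count of 1-outcomes for every trial at once.
    ones = sum(rng.binomial(m_b, p, size=trials)
               for p, m_b in zip(u_norm, shots))
    estimates = ones / sum(shots)
    return float(np.sqrt(np.mean((estimates - exact) ** 2)))

rng = np.random.default_rng(42)
weights = (0.25, 0.5, 0.25)
u_norm = (0.2, 0.6, 0.4)
err_m = rms_error(u_norm, weights, 400, trials=20000, rng=rng)
err_2m = rms_error(u_norm, weights, 800, trials=20000, rng=rng)
ratio = err_m / err_2m  # expected to be close to sqrt(2)
```

For these inputs the observed error should also sit below the worst-case bound $1/(2\sqrt{M})$, since none of the normalized values equals $1/2$ in every branch.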

7. Empirical Evaluation on Simulators and Quantum Hardware

Benchmarks were conducted for both the heat and viscous Burgers' equations:

| Hardware | Circuit depth | Gates | Errors ($L_\infty$, $L_2$) | Per-node wall time |
|---|---|---|---|---|
| Simulator | 1 | 1 × $R_y$ | $O(M^{-1/2})$ | Not reported |
| IBM Brisbane | 3 | 1 × $R_y$, 1 × X | 0.0848, 0.0368 (raw); 0.0756, 0.0378 (mitigated) | ≈ 4.7 s ($M=4000$) |

On IBM Brisbane, the Bernoulli micro-kernel with $M=4000$ shots per node achieved $L_\infty=0.0848$, $L_2=0.0368$ without readout mitigation and $L_\infty=0.0756$, $L_2=0.0378$ after applying single-qubit readout calibration. Circuit depth after transpilation was 3, with no two-qubit gates, and per-node wall time was approximately 4.7 s. In contrast, the branching micro-kernel exhibited higher error and deeper, more resource-intensive circuits (Markidis et al., 16 Nov 2025).

The results demonstrate that on present-day NISQ devices, the shallow, single-qubit Bernoulli micro-kernel consistently yields lower bias and higher accuracy relative to deeper, entangling alternatives, which are more susceptible to device noise (Markidis et al., 16 Nov 2025).
