Bernoulli Micro-Kernel for Quantum PDE Sampling
- Bernoulli micro-kernel is a quantum computational primitive that performs explicit stencil node updates in finite-difference PDE solvers using shallow, single-qubit circuits.
- It uses constant-resource Monte Carlo sampling to estimate convex-sum stencil updates, yielding unbiased estimators whose standard error decreases as O(1/√M) in the per-node shot count M.
- Empirical evaluations on simulators and NISQ devices demonstrate its scalability, lower bias, and improved accuracy compared to deeper, entangling alternatives.
The Bernoulli micro-kernel is a quantum computational primitive designed to perform explicit stencil node updates arising in finite-difference solvers for Partial Differential Equations (PDEs). In this context, it serves as a localized, constant-resource Monte Carlo subroutine—implementable via shallow, single-qubit quantum circuits—for accelerating the sampling of convex-sum stencil updates. Its resource cost in qubits and circuit depth does not scale with the problem size, rendering it suitable for orchestrated, massively parallel applications over computational grids. The Bernoulli micro-kernel is a realization of the broader QPU micro-kernel concept, wherein a quantum processor (QPU) acts as a sampling accelerator, invoked by a classical host that maintains the outer iteration structure (Markidis et al., 16 Nov 2025).
1. Stencil Computation Framework and the Micro-Kernel Paradigm
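Explicit finite-difference PDE solvers, such as the Forward-Time Centered-Space (FTCS) method for the 1D heat equation, update the value at each spatial node $i$ at time step $n+1$ as a convex combination of neighbor values at time step $n$:

$$u_i^{n+1} = \sum_k w_k\, u_{i+k}^{n}, \qquad w_k \ge 0, \quad \sum_k w_k = 1.$$

For the heat equation $u_t = \alpha\, u_{xx}$, FTCS gives $w_{-1} = w_{+1} = r$ and $w_0 = 1 - 2r$ with $r = \alpha\,\Delta t/\Delta x^2$. The weights are nonnegative, and the sum is therefore convex, whenever $r \le 1/2$.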
In the QPU micro-kernel framework, the classical host iterates over time steps $n$ and nodes $i$. It invokes the quantum micro-kernel only to obtain an unbiased Monte Carlo estimate of the stencil update $u_i^{n+1}$ for each node. This approach offloads the local convex-sum operation to the quantum device and leaves the global time loop and grid iteration on the classical host (Markidis et al., 16 Nov 2025).
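For reference, the FTCS weights and a purely classical stencil step take only a few lines of Python. This minimal sketch assumes a uniform 1D grid with fixed (Dirichlet) boundary values. The names `ftcs_weights` and `ftcs_step` are hypothetical and only provide the deterministic baseline that the micro-kernel estimates.

```python
def ftcs_weights(alpha, dt, dx):
    """Return the FTCS weights (w_-1, w_0, w_+1) for u_t = alpha * u_xx."""
    r = alpha * dt / dx**2
    # The micro-kernel needs a convex combination, i.e. nonnegative weights.
    if not 0.0 <= r <= 0.5:
        raise ValueError(f"r = {r} is outside [0, 1/2]; weights are not convex")
    return (r, 1.0 - 2.0 * r, r)


def ftcs_step(u, weights):
    """One explicit step on interior nodes; boundary values stay fixed (assumption)."""
    w_left, w_center, w_right = weights
    new_u = list(u)
    for i in range(1, len(u) - 1):
        new_u[i] = w_left * u[i - 1] + w_center * u[i] + w_right * u[i + 1]
    return new_u


print(ftcs_step([0.0, 0.0, 1.0, 0.0, 0.0], ftcs_weights(1.0, 0.25, 1.0)))
```

The `ValueError` enforces the condition $r \le 1/2$. Without it, the stencil is no longer a convex sum of values in $[0,1]$ and cannot be read as a mixture of Bernoulli probabilities.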
2. Bernoulli Micro-Kernel Circuit Structure
The Bernoulli micro-kernel evaluates each stencil branch $k$ with its own single-qubit circuit:
- The qubit is initialized in $|0\rangle$.
- A single-qubit rotation with angle $\theta_k = 2\arcsin\sqrt{p_k}$ is applied, where $p_k \in [0,1]$ is the affine-normalized value of the neighbor $u_{i+k}^n$.
- The qubit is measured in the computational basis, yielding a Bernoulli sample $b \in \{0,1\}$ with $\Pr(b = 1) = \sin^2(\theta_k/2) = p_k$.
- This process is repeated $M_k$ times per branch to obtain an empirical mean $\hat p_k$.
No entanglement or multi-qubit operations are involved; each branch is executed independently (Markidis et al., 16 Nov 2025).
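The measurement statistics of this circuit can be emulated classically. Starting from $|0\rangle$, a $Y$-rotation $R_y(\theta)$ leaves amplitude $\sin(\theta/2)$ on $|1\rangle$, so $\theta_k = 2\arcsin\sqrt{p_k}$ reproduces $\Pr(b=1) = p_k$. The sketch below assumes $R_y$ as the rotation and draws shots with Python's `random` module instead of running on a quantum device. The names `encoding_angle`, `prob_one`, and `bernoulli_branch` are hypothetical.

```python
import math
import random


def encoding_angle(p):
    """Rotation angle theta with sin^2(theta / 2) = p, for p in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return 2.0 * math.asin(math.sqrt(p))


def prob_one(theta):
    """Born-rule probability of reading 1 after R_y(theta) acts on |0>."""
    # R_y(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>
    return math.sin(theta / 2.0) ** 2


def bernoulli_branch(p, shots, rng):
    """Classically emulate `shots` runs of the one-qubit circuit; return the mean of b."""
    if shots <= 0:
        raise ValueError("shots must be positive")
    q = prob_one(encoding_angle(p))
    ones = sum(1 for _ in range(shots) if rng.random() < q)
    return ones / shots


print(bernoulli_branch(0.3, 4000, random.Random(0)))  # close to 0.3
```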
3. Data Encoding and Shot Allocation
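Neighbor values in the physical range $[u_{\min}, u_{\max}]$ are linearly normalized to $p_k \in [0,1]$:

$$p_k = \frac{u_{i+k}^{n} - u_{\min}}{u_{\max} - u_{\min}}.$$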
Given a per-node shot budget $M$, shots are allocated proportionally to the stencil weights:

$$M_k \approx w_k\, M.$$

Each branch executes its circuit $M_k$ times, so the shot budget mirrors the convex weights of the stencil (Markidis et al., 16 Nov 2025).
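Normalization and the proportional shot split are ordinary classical bookkeeping on the host. The sketch below is illustrative. The source states only that $M_k$ is proportional to $w_k$, so the largest-remainder rounding used here, which makes the counts sum exactly to $M$, is an assumption. The helper names `normalize` and `allocate_shots` are hypothetical.

```python
import math


def normalize(values, u_min, u_max):
    """Affinely map values from [u_min, u_max] into [0, 1]."""
    span = u_max - u_min
    if span <= 0:
        raise ValueError("u_max must exceed u_min")
    return [(v - u_min) / span for v in values]


def allocate_shots(weights, total_shots):
    """Split total_shots across branches in proportion to the weights.

    Largest-remainder rounding (an assumption) keeps sum(counts) == total_shots.
    """
    raw = [w * total_shots for w in weights]
    counts = [math.floor(x) for x in raw]
    leftover = total_shots - sum(counts)
    # Give the remaining shots to the branches with the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda k: raw[k] - counts[k], reverse=True)
    for k in order[:leftover]:
        counts[k] += 1
    return counts


print(normalize([2.0, 3.0, 4.0], 2.0, 4.0))   # [0.0, 0.5, 1.0]
print(allocate_shots([0.25, 0.5, 0.25], 4000))  # [1000, 2000, 1000]
```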
4. Estimator Construction and Statistical Properties
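Let $b_k^{(j)} \in \{0,1\}$ be the outcome of the $j$-th measurement for branch $k$. The convex-sum estimator for the normalized update $\tilde u_i^{n+1} = \sum_k w_k\, p_k$ is

$$\widehat{\tilde u}_i^{n+1} = \sum_k w_k\, \hat p_k,$$

where $\hat p_k = \frac{1}{M_k}\sum_{j=1}^{M_k} b_k^{(j)}$.

The estimator is unbiased:

$$\mathbb{E}\big[\widehat{\tilde u}_i^{n+1}\big] = \sum_k w_k\, \mathbb{E}[\hat p_k] = \sum_k w_k\, p_k = \tilde u_i^{n+1}.$$

Because the branches are sampled independently, its variance is

$$\operatorname{Var}\big[\widehat{\tilde u}_i^{n+1}\big] = \sum_k w_k^2\, \frac{p_k(1-p_k)}{M_k}.$$

With proportional allocation $M_k \approx w_k M$, the variance reduces to $\frac{1}{M}\sum_k w_k\, p_k(1-p_k) \le \frac{1}{4M}$, so the standard error vanishes as $O(1/\sqrt{M})$. Since $\sum_k w_k = 1$, inverting the affine normalization maps this estimate back to the physical update $u_i^{n+1}$ (Markidis et al., 16 Nov 2025).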
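The estimator and its variance formula translate directly into code. This sketch assumes the raw 0/1 outcomes of each branch are available as Python lists. The helper names `convex_estimate` and `estimator_variance` are hypothetical. With proportional allocation and $p_k = 1/2$ in every branch, the variance hits its worst case $1/(4M)$, and the usage lines at the end exercise that case.

```python
def convex_estimate(weights, samples):
    """Convex-sum estimator: sum_k w_k * mean(samples[k]), samples[k] a list of 0/1."""
    total = 0.0
    for w, bits in zip(weights, samples):
        if not bits:
            if w != 0:
                raise ValueError("a branch with nonzero weight received no shots")
            continue  # zero-weight branches contribute nothing
        total += w * sum(bits) / len(bits)
    return total


def estimator_variance(weights, probs, shots):
    """Var = sum_k w_k^2 * p_k * (1 - p_k) / M_k for independent branches."""
    return sum(w * w * p * (1.0 - p) / m
               for w, p, m in zip(weights, probs, shots) if m > 0)


w = (0.25, 0.5, 0.25)
print(convex_estimate(w, [[1, 0], [1, 1], [0, 0]]))  # 0.625
# Worst case p_k = 1/2 with M = 4000 split as (1000, 2000, 1000): equals 1/(4M) up to rounding.
print(estimator_variance(w, (0.5, 0.5, 0.5), (1000, 2000, 1000)))
```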
5. Resource Requirements and Scaling
The Bernoulli micro-kernel achieves resource independence from grid size:
- Qubit count per branch: 1 qubit.
- Circuit depth per shot: one gate plus measurement.
- No entanglement between qubits; no increase in circuit complexity with additional grid points.
This constancy makes micro-kernels amenable to large-scale grid parallelization, with classical orchestration handling all node and branch-level iteration (Markidis et al., 16 Nov 2025).
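The division of labor can be seen in a small sketch. A classical host loops over grid nodes and calls an emulated micro-kernel once per node. Whatever the grid length, each call touches only one (emulated) qubit per branch; only the number of calls grows. The names `micro_kernel` and `host_step`, the per-step global min/max normalization, the rounding of $w_k M$, and the fixed boundary values are all assumptions made for illustration.

```python
import math
import random


def micro_kernel(probs, weights, total_shots, rng):
    """Emulated Bernoulli micro-kernel (hypothetical): estimate sum_k w_k * p_k.

    Each branch stands for an independent one-qubit circuit; its measurement
    statistics are emulated classically with Pr(1) = sin^2(theta / 2) = p_k.
    """
    estimate = 0.0
    for w, p in zip(weights, probs):
        shots = round(w * total_shots)  # proportional allocation; rounding rule assumed
        if shots == 0:
            continue  # this sketch skips a branch rounded to zero shots
        p = min(1.0, max(0.0, p))  # guard against round-off just outside [0, 1]
        theta = 2.0 * math.asin(math.sqrt(p))
        q = math.sin(theta / 2.0) ** 2
        ones = sum(1 for _ in range(shots) if rng.random() < q)
        estimate += w * ones / shots
    return estimate


def host_step(u, weights, total_shots, rng):
    """Classical host: one explicit time step, one micro-kernel call per interior node.

    Boundary values are held fixed (Dirichlet), an assumption of this sketch.
    """
    u_min, u_max = min(u), max(u)
    span = (u_max - u_min) or 1.0  # avoid dividing by zero on a constant field
    new_u = list(u)
    for i in range(1, len(u) - 1):
        probs = [(u[i + k] - u_min) / span for k in (-1, 0, 1)]
        p_hat = micro_kernel(probs, weights, total_shots, rng)
        # Inverse affine map; valid because the weights sum to one.
        new_u[i] = u_min + span * p_hat
    return new_u


u = [0.0, 0.5, 1.0, 0.5, 0.0]
# Interior values land near the exact FTCS results 0.5, 0.75, 0.5.
print(host_step(u, (0.25, 0.5, 0.25), 4000, random.Random(1)))
```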
6. Error Behavior and Convergence
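The standard error of each branch's mean estimator is

$$\operatorname{SE}(\hat p_k) = \sqrt{\frac{p_k(1-p_k)}{M_k}},$$

and it propagates through the convex sum to

$$\operatorname{SE}\big(\widehat{\tilde u}_i^{n+1}\big) = \sqrt{\sum_k w_k^2\, \frac{p_k(1-p_k)}{M_k}}.$$

Empirical studies using noiseless simulators confirm this convergence: doubling $M$ reduces the estimator noise by a factor of approximately $\sqrt{2}$ (Markidis et al., 16 Nov 2025).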
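The $\sqrt{2}$ rule can be checked numerically. The sketch compares the analytic standard error at $M$ and $2M$, then estimates the same ratio empirically by repeating the emulated estimator many times. It uses classically emulated shots and illustrative values of $p_k$. It is not a reproduction of the cited simulator study, and the helper names are hypothetical.

```python
import math
import random
import statistics


def standard_error(weights, probs, total_shots):
    """Analytic SE of the convex-sum estimator with M_k = w_k * M shots per branch."""
    var = sum(w * w * p * (1.0 - p) / (w * total_shots)
              for w, p in zip(weights, probs) if w > 0)
    return math.sqrt(var)


def empirical_spread(weights, probs, total_shots, repeats, rng):
    """Sample std. deviation of the estimator over independent repetitions.

    Shots are emulated classically as Bernoulli draws with Pr(1) = p_k.
    """
    estimates = []
    for _ in range(repeats):
        est = 0.0
        for w, p in zip(weights, probs):
            m = round(w * total_shots)
            if m:
                est += w * sum(rng.random() < p for _ in range(m)) / m
        estimates.append(est)
    return statistics.stdev(estimates)


weights, probs = (0.25, 0.5, 0.25), (0.3, 0.6, 0.9)
print(standard_error(weights, probs, 1000) / standard_error(weights, probs, 2000))  # ≈ 1.414
rng = random.Random(42)
# Close to 1.414, up to sampling noise in the two spread estimates.
print(empirical_spread(weights, probs, 500, 500, rng)
      / empirical_spread(weights, probs, 1000, 500, rng))
```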
7. Empirical Evaluation on Simulators and Quantum Hardware
Benchmarks were conducted for both the heat and the viscous Burgers’ equations:
| Hardware | Circuit Depth | Gates | Errors | Per-Node Wall Time |
|---|---|---|---|---|
| Simulator | 1 | 1 × single-qubit rotation | Not reported | |
| IBM Brisbane | 3 | 1 × …, 1 × X | 0.0848, 0.0368 (raw); 0.0756, 0.0378 (mitigated) | ≈ 4.7 s (M = 4000) |
On IBM Brisbane, the Bernoulli micro-kernel with $M = 4000$ shots per node achieved errors of 0.0848 and 0.0368 without readout mitigation, and 0.0756 and 0.0378 after single-qubit readout calibration. Circuit depth after transpilation was 3, with no two-qubit gates, and per-node wall time was approximately 4.7 s. In contrast, the branching micro-kernel showed higher error and required deeper, more resource-intensive circuits (Markidis et al., 16 Nov 2025).
The results demonstrate that on present-day NISQ devices, the shallow, single-qubit Bernoulli micro-kernel consistently yields lower bias and higher accuracy relative to deeper, entangling alternatives, which are more susceptible to device noise (Markidis et al., 16 Nov 2025).
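The single-qubit readout calibration mentioned above is not detailed in the source. One common approach is sketched here as an assumption, not as the authors' procedure. Preparing $|0\rangle$ and $|1\rangle$ gives estimates of the two assignment-error rates. The measured frequency of reading 1 is then $p_{\text{meas}} = e_{01} + p\,(1 - e_{01} - e_{10})$, and inverting this $2\times 2$ assignment relation recovers $p$. The function names and the calibration counts in the usage lines are hypothetical.

```python
def calibrate(counts_prep0, counts_prep1):
    """Estimate readout error rates from calibration runs.

    counts_prepX = (n0, n1): outcome counts observed after preparing |X>.
    Returns (e01, e10) = (Pr[read 1 | prepared 0], Pr[read 0 | prepared 1]).
    """
    n0, n1 = counts_prep0
    m0, m1 = counts_prep1
    return n1 / (n0 + n1), m0 / (m0 + m1)


def mitigate(p_measured, e01, e10):
    """Invert p_meas = e01 + p * (1 - e01 - e10) for p, clipped to [0, 1]."""
    denom = 1.0 - e01 - e10
    if denom <= 0:
        raise ValueError("readout too noisy to invert")
    p = (p_measured - e01) / denom
    return min(1.0, max(0.0, p))


e01, e10 = calibrate((980, 20), (50, 950))  # illustrative calibration counts
print(mitigate(0.299, e01, e10))  # ≈ 0.3
```

The final clip matters in practice: with finite shots, the linear inversion can return values slightly outside $[0,1]$, and these are not valid Bernoulli probabilities.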