Langevin Sampler: Scalable Quantum Tomography
- The paper introduces a Langevin sampler that leverages Burer–Monteiro factorization to reduce dimensionality and enforce the Hermitian PSD structure in quantum tomography.
- It employs a spectral Student–t prior to promote low-rank solutions, while the Burer–Monteiro parameterization incorporates the physical constraints directly.
- The unadjusted Langevin algorithm achieves computational efficiency with PAC–Bayesian risk bounds matching state-of-the-art rates and improved mixing over traditional MCMC methods.
A Langevin sampler for quantum tomography is a Bayesian computational approach that leverages the Burer–Monteiro factorization to efficiently estimate quantum states from measurement data, with explicit low-rank structure and scalability guarantees. The method operates by parameterizing a Hermitian positive semidefinite (PSD) density matrix via a product of a complex matrix and its conjugate transpose, imposing physical constraints directly in the parameter space. This enables the construction of a posterior distribution restricted to matrices of known or bounded rank and, through the use of a spectral Student–t prior, promotes solutions of even lower rank when the true rank is unknown. The posterior is explored via an unadjusted Langevin algorithm (ULA), with rigorous PAC–Bayesian risk bounds that match state-of-the-art rates, and the algorithm achieves substantial computational savings compared to conventional Markov chain Monte Carlo (MCMC) techniques when the target density is low rank (Adel et al., 13 Jan 2026).
1. Parameterization via Burer–Monteiro Factorization
Quantum tomographic inference seeks a density matrix $\rho \in \C^{d \times d}$ satisfying $\rho = \rho^\dagger$, $\rho \succeq 0$, and $\tr(\rho) = 1$. If $\rho$ is known (or assumed) to have rank $r$, it is parameterized as $\rho = Z Z^\dagger$ with $Z \in \C^{d \times r}$. The unit-trace condition translates to $\|Z\|_F = 1$, so $Z$ resides on the complex hypersphere.
Measurement data is modeled by $A$ observables, each with $S$ possible outcomes. The empirical frequencies $\hat p_{a,s}$ for observable $a$ and outcome $s$ are related to the Born prediction $p_{a,s}(Z) = \tr(\mathcal P_s^a Z Z^\dagger)$, where $\mathcal P_s^a$ denote the POVM elements. A pseudo-likelihood corresponds to a sum-of-squares loss:
$L(Z) = \sum_{a=1}^A \sum_{s=1}^S [\hat p_{a,s} - \tr(\mathcal P_s^a Z Z^\dagger)]^2$
Bayesian inference proceeds with the posterior $\pi(Z) \propto \exp(-\lambda L(Z))\, p(Z)$, leading to the potential $U(Z) = \lambda L(Z) - \log p(Z)$.
This factorization reduces the ambient parameter space from $d^2$ to $dr$ complex dimensions and automatically maintains the Hermitian PSD structure.
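To make the parameterization concrete, here is a minimal NumPy sketch that builds $\rho = ZZ^\dagger$ from a unit-norm factor and evaluates the sum-of-squares loss. The `povm` and `freqs` containers are hypothetical placeholders for whichever measurement model is in use, not the paper's data structures.

```python
import numpy as np

def density_from_factor(Z):
    """Map a factor Z in C^{d x r} with ||Z||_F = 1 to rho = Z Z^dagger."""
    return Z @ Z.conj().T   # Hermitian PSD by construction; tr(rho) = ||Z||_F^2

def sos_loss(Z, povm, freqs):
    """Sum-of-squares pseudo-likelihood L(Z) over observables a and outcomes s."""
    rho = density_from_factor(Z)
    loss = 0.0
    for a in range(len(povm)):
        for s in range(len(povm[a])):
            born = np.real(np.trace(povm[a][s] @ rho))  # Born probability tr(P_s^a rho)
            loss += (freqs[a][s] - born) ** 2
    return loss

# Example: d = 4, r = 2, random unit-norm factor
d, r = 4, 2
rng = np.random.default_rng(0)
Z = rng.standard_normal((d, r)) + 1j * rng.standard_normal((d, r))
Z /= np.linalg.norm(Z)              # enforce ||Z||_F = 1, i.e. tr(rho) = 1
rho = density_from_factor(Z)
assert np.isclose(np.trace(rho).real, 1.0)
```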
2. Low-Rank–Promoting Spectral Prior
When the rank of $\rho$ is unknown but an upper bound $r$ is available, a spectral Student–t prior is used to promote low-rank solutions. Up to normalization, with scale $\theta > 0$ and an exponent $\alpha > 0$ fixed by the analysis, it takes the form
$p(Z) \propto \det\bigl(\theta^2 I_r + Z^\dagger Z\bigr)^{-\alpha} = \prod_{j=1}^{r} \bigl(\theta^2 + \sigma_j^2(Z)\bigr)^{-\alpha}$
The prior decomposes as a product over the singular values $\sigma_j(Z)$ of $Z$: for small $\theta$ it concentrates near $\sigma_j = 0$ while its heavy tails still admit a few large singular values, thus favoring low-effective-rank $Z$. The gradient of $\log p(Z)$ is available in closed form, facilitating efficient implementation:
$\nabla_Z \log p(Z) = -\alpha\, Z \bigl(\theta^2 I_r + Z^\dagger Z\bigr)^{-1}$
(in the Wirtinger/conjugate-derivative convention).
This prior is a complex generalization of that studied by Dalalyan (2020) for promoting low-rank matrix estimation.
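A minimal sketch of the closed-form prior gradient under the generic-exponent form written above (the exponent `alpha` is this sketch's assumption, not necessarily the paper's exact choice). Only an $r \times r$ system is solved, which is where the Sherman–Morrison–Woodbury saving discussed in Section 6 appears.

```python
import numpy as np

def log_prior_grad(Z, theta, alpha):
    """Wirtinger gradient of log p(Z) for p(Z) ∝ det(theta^2 I_r + Z^† Z)^{-alpha}.

    Only an r x r system is solved, so the cost is O(d r^2 + r^3)
    rather than the O(d^3) of a naive d x d formulation.
    """
    r = Z.shape[1]
    M = theta**2 * np.eye(r) + Z.conj().T @ Z   # r x r; eigenvalues theta^2 + sigma_j^2
    # Solve M X^† = Z^† (M is Hermitian), giving X = Z @ inv(M)
    return -alpha * np.linalg.solve(M, Z.conj().T).conj().T
```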
3. Langevin Sampler: Stochastic Dynamics and Discretization
The posterior on $Z$ is sampled via complex Langevin dynamics governed by the SDE
$dZ_t = -\nabla_Z U(Z_t)\, dt + \sqrt{2}\, dB_t,$
where $B_t$ is Brownian motion in $\C^{d \times r}$ and $U(Z) = \lambda L(Z) - \log p(Z)$ is the potential defined above.
Discretization through the unadjusted Langevin algorithm (ULA) with step size $\eta$ yields
$Z_{k+1} = Z_k - \eta\, \nabla_Z U(Z_k) + \sqrt{2\eta}\, \Xi_k,$
with $\Xi_k$ i.i.d. standard complex Gaussian matrices.
To maintain the trace constraint $\|Z\|_F = 1$, two strategies are used: (1) projected Langevin, normalizing $Z_{k+1}$ after every step, or (2) unconstrained iteration with trace normalization applied only to the final estimator. Empirically, per-step drift in $\|Z_k\|_F$ is negligible, so the latter is often preferable for computational simplicity.
4. Implementation Workflow
The following high-level pseudocode summarizes the Langevin sampling scheme for quantum tomography:
```
Input: r, η, λ, θ, N, B, initial Z₀ ∈ ℂ^{d×r}, ‖Z₀‖_F = 1
for k = 0 to N−1:
    G = ∇_Z [ λ L(Z_k) − log p(Z_k) ]
    Ξ ∼ 𝒩(0, I_{dr})                  # i.i.d. complex Gaussian noise
    Z_{k+1} = Z_k − η G + √(2η) Ξ
    # Optionally: Z_{k+1} ← Z_{k+1} / ‖Z_{k+1}‖_F
end for
ρ̄ = (1/(N−B)) ∑_{k=B}^{N−1} Z_k Z_k^†
return ρ̂ = ρ̄ / tr(ρ̄)
```
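Below is a minimal, self-contained NumPy translation of this pseudocode, using the generic-exponent prior form assumed in Section 2 (`alpha` is an assumption of this sketch, not the paper's exact setting). Here `povm` is taken as a flat list of POVM matrices with matching empirical frequencies `freqs`.

```python
import numpy as np

def ula_tomography(povm, freqs, d, r, eta=1e-4, lam=100.0, theta=0.1,
                   alpha=1.0, n_iter=5000, burn_in=1000, seed=0,
                   project=False):
    """Unadjusted Langevin sampler over factors Z with rho = Z Z^dagger (sketch)."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((d, r)) + 1j * rng.standard_normal((d, r))
    Z /= np.linalg.norm(Z)                         # start with ||Z||_F = 1
    rho_bar = np.zeros((d, d), dtype=complex)

    def grad_U(Z):
        rho = Z @ Z.conj().T
        G = np.zeros_like(Z)
        # Data term: Wirtinger gradient of lam * sum_i (p_hat_i - tr(P_i rho))^2
        for P, p_hat in zip(povm, freqs):
            resid = p_hat - np.real(np.trace(P @ rho))
            G -= 2.0 * lam * resid * (P @ Z)       # d/dZ* tr(P Z Z^†) = P Z
        # Prior term: -grad log p with p(Z) ∝ det(theta^2 I_r + Z^† Z)^{-alpha}
        M = theta**2 * np.eye(r) + Z.conj().T @ Z  # r x r (Woodbury reduction)
        G += alpha * np.linalg.solve(M, Z.conj().T).conj().T
        return G

    for k in range(n_iter):
        # Complex Gaussian increment: real and imag parts standard normal
        # (identifying C^{d x r} with R^{2dr})
        xi = rng.standard_normal((d, r)) + 1j * rng.standard_normal((d, r))
        Z = Z - eta * grad_U(Z) + np.sqrt(2.0 * eta) * xi
        if project:                                # strategy (1): per-step projection
            Z /= np.linalg.norm(Z)
        if k >= burn_in:
            rho_bar += Z @ Z.conj().T

    rho_bar /= (n_iter - burn_in)
    return rho_bar / np.real(np.trace(rho_bar))    # strategy (2): final normalization
```

Setting `project=True` gives the projected variant (strategy 1); the default defers normalization to the final estimator (strategy 2), as discussed in Section 3.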
Key hyperparameter choices:
- λ controls the data-prior trade-off; for complete measurements, λ ≈ m/2 or 3m/8 is recommended.
- θ tunes the rank penalty; small θ strongly penalizes rank, large θ recovers a nearly uniform prior.
- η (step size) is chosen empirically for stability.
- N (number of iterations) and B (burn-in) are set to ensure convergence and posterior mixing.
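For concreteness, a hypothetical invocation of the `ula_tomography` sketch above; all numbers are illustrative placeholders rather than the paper's settings, and `povm`/`freqs` are assumed prepared as in the earlier loss example.

```python
# Illustrative settings only; povm and freqs as in the earlier sketches.
d, r = 16, 2                          # n = 4 qubits, assumed rank bound r = 2
m = 200                               # assumed repetitions per measurement setting
rho_hat = ula_tomography(povm, freqs, d=d, r=r,
                         eta=1e-4,    # step size, tuned empirically for stability
                         lam=m / 2,   # data-prior trade-off, lambda ≈ m/2
                         theta=0.05,  # small theta => stronger rank penalty
                         n_iter=20_000, burn_in=5_000)
```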
5. PAC–Bayesian Risk Guarantees
In the full Pauli measurement setting with $N_{\text{tot}}$ samples and a true rank-$r$ density $\rho^0 = Y^0 Y^{0\dagger}$ with $Y^0 \in \C^{d \times r}$, a PAC–Bayesian Frobenius risk bound holds. For any $\epsilon \in (0,1)$ and any comparison factor $\bar Y$ of rank at most $r$, it holds with probability at least $1-\epsilon$ that
\begin{align*}
\|\hat\rho-\rho^0\|_F^2 &\leq \frac{3}{N_{\text{tot}}}\Bigl(3^{3n/4}\,2^{(n+6)/4}\bigl(r+\sqrt{r}\,\|\bar Y\|_F\bigr)+2r/m+1\Bigr) \\
&\qquad + \frac{8\cdot 3^n}{2^n N_{\text{tot}}}\Bigl(\log(2/\epsilon)+2p\,\bigl(2^{n+1}+r+2\bigr)\log\bigl(1+\|\bar Y\|_2/\theta\bigr)\Bigr).
\end{align*}
The analysis leverages exponential moment inequalities for the empirical squared error, KL-divergence bounds for shifted priors, and spectral properties of the measurement operators. The leading term (up to log factors) matches the minimax rate $3^n \operatorname{rank}(\rho^0)/N_{\text{tot}}$ known from the literature (Mai & Alquier 2017, Mai 2021).
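For orientation, instantiating the leading term: for $n = 5$ qubits ($d = 2^5 = 32$) and a rank-2 target, the rate scales as $3^5 \cdot 2 / N_{\text{tot}} = 486/N_{\text{tot}}$, so roughly $N_{\text{tot}} \approx 5 \times 10^4$ samples drive the squared Frobenius error to order $10^{-2}$ (up to constants and log factors).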
6. Computational Complexity and Empirical Performance
Each Langevin iteration involves:
- Data drift term: evaluating $\nabla_Z L$ requires one pass over all $AS$ measurement outcomes, each involving a product of a POVM element with $Z$ (for complete Pauli measurements, $A = 3^n$ settings with $S = 2^n$ outcomes each).
- Prior gradient: the $d \times d$ system implicit in $\nabla_Z \log p$ is reduced via Sherman–Morrison–Woodbury to an $r \times r$ solve, costing $O(dr^2 + r^3)$ for $r \ll d$.
- Noise sampling: $O(dr)$ for the complex Gaussian increment.
Total per-step cost: the sum of these terms, dominated by the data drift term in the low-rank setting with a moderate number of measurement settings.
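As a worked figure under the costs just listed: with $d = 1024$ ($n = 10$ qubits) and $r = 3$, the prior gradient costs on the order of $dr^2 + r^3 \approx 9.2 \times 10^3$ operations and the noise draw $O(dr) \approx 3.1 \times 10^3$, whereas any full-rank update must touch all $d^2 \approx 1.0 \times 10^6$ entries of $\rho$.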
Overall, runtime scales with the number of iterations times this per-step cost. Empirical benchmarks indicate:
- Scalability: With small $r$, each Langevin update is vastly faster than in full-rank MCMC schemes.
- Mixing: The chain mixes within a modest number of steps, substantially fewer than Metropolis–Hastings methods require for comparable accuracy.
- Estimation accuracy: The final estimator achieves Frobenius norm errors competitive with, or superior to, existing Bayesian quantum tomography algorithms.
7. Extensions and Practical Considerations
Potential refinements include step-size (η) annealing or adaptive temperature control for improved mixing. Metropolis-adjusted Langevin (MALA) or Riemannian variants may strengthen theoretical convergence guarantees. The method applies directly to process tomography (Choi matrix estimation) and can address incomplete measurement regimes. The reduction from a $d^2$- to a $dr$-dimensional parameter space, inherent PSD constraint enforcement, and the low-rank–favoring prior make the Langevin sampler a scalable and theoretically sound approach for large-scale quantum tomography with explicit risk guarantees (Adel et al., 13 Jan 2026).