Mixed-State Quantum Denoising Diffusion Probabilistic Model (2411.17608v2)
Abstract: Generative quantum machine learning has gained significant attention for its ability to produce quantum states with desired distributions. Among various quantum generative models, quantum denoising diffusion probabilistic models (QuDDPMs) [Phys. Rev. Lett. 132, 100602 (2024)] provide a promising approach with stepwise learning that resolves the training issues. However, the requirement of high-fidelity scrambling unitaries in QuDDPM poses a challenge in near-term implementation. We propose the *mixed-state quantum denoising diffusion probabilistic model* (MSQuDDPM) to eliminate the need for scrambling unitaries. Our approach focuses on adapting the quantum noise channels to the model architecture, which integrates depolarizing noise channels in the forward diffusion process and parameterized quantum circuits with projective measurements in the backward denoising steps. We also introduce several techniques to improve MSQuDDPM, including a cosine-exponent schedule of noise interpolation, the use of single-qubit random ancilla, and superfidelity-based cost functions to enhance the convergence. We evaluate MSQuDDPM on quantum ensemble generation tasks, demonstrating its successful performance.
Summary
- The paper introduces the Mixed-State Quantum Denoising Diffusion Probabilistic Model (MSQuDDPM) for generating ensembles of mixed quantum states, designed for practical implementation on near-term hardware.
- MSQuDDPM utilizes depolarizing channels in the forward diffusion process and a VQA-based backward process trained sequentially with superfidelity cost functions to denoise states.
- Key features include a cosine-exponent noise schedule and superfidelity-based cost functions, which improve convergence and enable the model to generate complex distributions such as TFIM ground states.
The Mixed-State Quantum Denoising Diffusion Probabilistic Model (MSQuDDPM) extends the prior Quantum Denoising Diffusion Probabilistic Model (QuDDPM) [Phys. Rev. Lett. 132, 100602 (2024)] to the generation of ensembles comprising both pure and mixed quantum states. It concurrently addresses practical implementation challenges on near-term quantum hardware by eliminating the high-fidelity scrambling unitaries required in the original QuDDPM framework (2411.17608).
Forward Diffusion Process: Noise Injection via Depolarizing Channels
The forward process in MSQuDDPM systematically transforms an initial ensemble of quantum states $\{\rho_0^{(i)}\}_{i=1}^{M}$, sampled from an unknown data distribution $p_{\mathrm{data}}(\rho)$, into a predefined noise distribution, typically the maximally mixed state $I/d$, over $T$ discrete steps. Unlike the original QuDDPM, which employed scrambling unitaries, MSQuDDPM iteratively applies $n$-qubit depolarizing quantum channels $\Phi_t$.
The state transformation at step t+1 is governed by:
$$\rho_{t+1}^{(i)} = \Phi_{t+1}\big(\rho_t^{(i)}\big) = \big(1 - q_{t+1}^{(i)}\big)\,\rho_t^{(i)} + q_{t+1}^{(i)}\,\frac{I}{d}$$
Here, $\rho_t^{(i)}$ is the $i$-th state in the ensemble at step $t$, $q_{t+1}^{(i)}$ is the depolarizing probability (noise parameter) for that step, $d = 2^n$ is the Hilbert space dimension for $n$ qubits, and $I/d$ represents the maximally mixed state. This mechanism progressively introduces isotropic noise, driving the state towards $I/d$ as $t \to T$.
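As a concrete sketch, the forward diffusion above can be simulated directly on density matrices. This is a minimal NumPy illustration of the channel and its iteration, not the paper's implementation:

```python
import numpy as np

def depolarize(rho: np.ndarray, q: float) -> np.ndarray:
    """One application of the n-qubit depolarizing channel Phi_t."""
    d = rho.shape[0]
    return (1 - q) * rho + q * np.eye(d) / d

def forward_diffusion(rho0: np.ndarray, qs: list) -> list:
    """Apply the channel for noise parameters q_1..q_T; return the whole trajectory."""
    traj = [rho0]
    for q in qs:
        traj.append(depolarize(traj[-1], q))
    return traj

# Example: a single-qubit pure state |0><0| diffused over 3 uniform steps.
rho0 = np.array([[1.0, 0.0], [0.0, 0.0]])
traj = forward_diffusion(rho0, [0.5, 0.5, 0.5])
# Purity Tr(rho^2) decreases monotonically toward 1/d = 0.5.
purities = [np.trace(r @ r).real for r in traj]
```

Because $I/d$ is a fixed point of the channel, after $T$ steps the state is $\prod_t (1-q_t)\,\rho_0$ plus the complementary weight on $I/d$, which is why the purity decays monotonically.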
A critical component is the noise schedule, which dictates the values of qt for t=1,…,T. Standard schedules like linear or cosine variations, common in classical diffusion models, were found to cause overly rapid convergence to the maximally mixed state in multi-qubit quantum systems. This premature loss of information hinders the learning capacity of the subsequent backward denoising process. To mitigate this, MSQuDDPM introduces the cosine-exponent schedule:
$$q_t = \left(1 - \frac{\alpha_t}{\alpha_{t-1}}\right)^{k}$$
where $\alpha_t = f(t)/f(0)$ with $f(t) = \cos^2\!\left(\frac{t/T + \epsilon}{1 + \epsilon} \cdot \frac{\pi}{2}\right)$. The exponent $k$ (e.g., $k=1$ for cosine, $k=2$ for cosine square) and a small offset $\epsilon$ control the noise injection rate. The cosine square schedule ($k=2$) was demonstrated to better preserve state information (e.g., as measured by purity) during the forward diffusion, especially in multi-qubit scenarios such as learning phases of the Transverse-Field Ising Model (TFIM), leading to improved performance in the backward generation phase (2411.17608).
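The schedule can be sketched in a few lines. The offset value $\epsilon = 0.008$ below is the common default from classical cosine schedules, assumed here purely for illustration:

```python
import numpy as np

def cosine_exponent_schedule(T: int, k: float = 2.0, eps: float = 0.008) -> np.ndarray:
    """q_t = (1 - alpha_t/alpha_{t-1})^k with alpha_t = f(t)/f(0),
    f(t) = cos^2( (t/T + eps)/(1 + eps) * pi/2 )."""
    t = np.arange(T + 1)
    f = np.cos((t / T + eps) / (1 + eps) * np.pi / 2) ** 2
    alpha = f / f[0]
    # Ratios use alpha_0 .. alpha_{T-1} in the denominator, all strictly positive.
    q = (1 - alpha[1:] / alpha[:-1]) ** k
    return q  # q_1 .. q_T

q_cos = cosine_exponent_schedule(10, k=1.0)     # plain cosine
q_cos2 = cosine_exponent_schedule(10, k=2.0)    # cosine square: gentler early noise
```

Since each base $1 - \alpha_t/\alpha_{t-1}$ lies in $[0,1]$, raising it to $k=2$ uniformly shrinks the per-step noise, which is exactly the slower information loss the paper exploits.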
Backward Denoising Process: VQA-Based Generation
The backward process reverses the diffusion, starting from an ensemble of maximally mixed states $\{\tilde\rho_T^{(i)}\}_{i=1}^{M}$ (often initialized as $M$ copies of $I/d$) and iteratively applying learned denoising operations to generate an ensemble $\{\tilde\rho_0^{(i)}\}_{i=1}^{M}$ intended to approximate the original data distribution $p_{\mathrm{data}}(\rho)$. This is implemented using a Variational Quantum Algorithm (VQA) structure.
At each backward step $t$ (transitioning from $\tilde\rho_t$ to $\tilde\rho_{t-1}$), a Parameterized Quantum Circuit (PQC), denoted $\tilde U_t(\theta_t)$, is applied. The PQC acts on the $n$-qubit state $\tilde\rho_t^{(i)}$ from the previous step, potentially augmented with $n_a$ ancillary qubits initialized in a state $|\tilde\varphi_t\rangle$. A typical PQC architecture follows a hardware-efficient ansatz, comprising $L$ layers of parameterized single-qubit rotations (e.g., RX, RY) interleaved with fixed entangling gates (e.g., CNOT or CZ, often applied between adjacent qubits).
Following the unitary evolution $\tilde U_t(\theta_t)$, projective measurements are performed on the $n_a$ ancillary qubits, typically in the computational (Z) basis. The post-measurement state of the primary $n$ qubits, averaged over measurement outcomes (the outcomes are often simply discarded, which is equivalent to tracing out the ancillae), constitutes the denoised state $\tilde\rho_{t-1}^{(i)}$.
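The sequence "attach ancillae, apply the circuit, measure the ancillae in the Z basis and discard the outcomes" can be emulated on density matrices as a partial trace. A NumPy sketch follows; the SWAP unitary in the example is illustrative only, not the paper's ansatz:

```python
import numpy as np

def backward_step(rho: np.ndarray, ancilla: np.ndarray, U: np.ndarray) -> np.ndarray:
    """One backward step: attach the ancilla density matrix, apply the circuit
    unitary U, then measure the ancillae in the Z basis and discard outcomes.
    Averaging post-measurement states over discarded outcomes equals tracing
    out the ancilla register."""
    d_s, d_a = rho.shape[0], ancilla.shape[0]   # system 2^n, ancilla 2^{n_a}
    joint = U @ np.kron(rho, ancilla) @ U.conj().T
    return np.einsum('iaja->ij', joint.reshape(d_s, d_a, d_s, d_a))

# Example: with U = SWAP, the fresh |0> ancilla is swapped into the system,
# so even a maximally mixed input comes out as the pure state |0><0|.
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)
ket0 = np.array([[1.0, 0.0], [0.0, 0.0]], dtype=complex)
out = backward_step(np.eye(2) / 2, ket0, SWAP)
```

The SWAP toy also shows why ancillae are essential here: a unitary alone cannot increase the purity of the system state, but discarding measured ancillae can.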
A key implementation detail is the sequential training strategy. Instead of optimizing all parameters {θt}t=1T simultaneously, the parameters for each step are trained sequentially in reverse order. Specifically, to learn the parameters θm+1 for the transition from t=m+1 to t=m, the cost function evaluates the discrepancy between the generated ensemble {ρ~m} and the corresponding target ensemble {ρm} obtained from the forward diffusion process. Once θm+1 is optimized, these parameters are fixed, and the training proceeds to optimize θm. This stepwise approach significantly reduces the number of active parameters at each training stage, mitigating issues like barren plateaus and facilitating convergence.
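The sequential training order can be illustrated with a deliberately simplified toy: a single qubit, a bare RY rotation standing in for the full ancilla-assisted PQC, and SciPy's scalar optimizer. All of these simplifications are assumptions for illustration of the training schedule only:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def superfidelity(rho, sigma):
    o = np.trace(rho @ sigma).real
    pr, ps = np.trace(rho @ rho).real, np.trace(sigma @ sigma).real
    return o + np.sqrt(max(0.0, (1 - pr) * (1 - ps)))

# Forward-process targets rho_0 .. rho_T for a toy single-qubit trajectory.
T = 3
targets = [np.array([[1.0, 0.0], [0.0, 0.0]])]
for q in [0.3] * T:
    targets.append((1 - q) * targets[-1] + q * np.eye(2) / 2)

# Backward: start from the most-diffused state and train one step at a time,
# in reverse order, freezing each step's parameter before moving on.
state = targets[T].copy()
thetas = []
for t in reversed(range(T)):   # learn the parameter for the step t+1 -> t
    def cost(theta, tgt=targets[t], rho=state):
        U = ry(theta)
        return 1 - superfidelity(U @ rho @ U.conj().T, tgt)
    res = minimize_scalar(cost, bounds=(-np.pi, np.pi), method='bounded')
    thetas.append(res.x)       # parameter frozen; advance the backward state
    U = ry(res.x)
    state = U @ state @ U.conj().T
```

Note that only one scalar parameter is active per optimization call; this is the mechanism that keeps the effective parameter count small at every training stage.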
To potentially enhance the diversity of generated states, especially when all backward trajectories start from identical maximally mixed states, the use of a single-qubit random ancilla is proposed. Instead of initializing all ancillae to $|0\rangle^{\otimes n_a}$, one ancilla is prepared in a Haar-random pure state $|\phi_{\mathrm{Haar}}\rangle$, while the rest remain in $|0\rangle$. This introduces stochasticity early in the backward process (near $t = T$), potentially improving the model's ability to capture complex distributions. Numerical results suggest this can yield comparable or superior performance, sometimes with reduced resource requirements compared to using only $|0\rangle$ ancillae (2411.17608).
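Sampling a Haar-random single-qubit state is simple: normalize a complex Gaussian vector. A sketch of the ancilla-register preparation just described (function names are illustrative, not from the paper):

```python
import numpy as np

def haar_random_qubit(rng=None) -> np.ndarray:
    """Sample a Haar-random single-qubit pure state |phi> by normalizing
    a vector of i.i.d. complex Gaussians (unitary invariance of the Gaussian
    measure makes the result Haar-distributed)."""
    rng = np.random.default_rng(rng)
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    return v / np.linalg.norm(v)

def ancilla_register(n_a: int, rng=None) -> np.ndarray:
    """|phi_Haar> on one ancilla, |0> on the remaining n_a - 1 ancillae."""
    state = haar_random_qubit(rng)
    for _ in range(n_a - 1):
        state = np.kron(state, np.array([1.0, 0.0]))
    return state

phi = ancilla_register(3, rng=0)   # 3-qubit register, one random qubit
```

Only a single qubit's worth of randomness is injected per step, which keeps the overhead minimal relative to the full $n_a$-qubit register.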
Cost Functions for Mixed-State Ensemble Comparison
Training the PQCs requires quantifying the similarity between the generated mixed-state ensemble {ρ~m} and the target forward-process ensemble {ρm}. Standard fidelity is often intractable for mixed states due to the requirement of full Quantum State Tomography (QST), which scales exponentially with the number of qubits.
MSQuDDPM circumvents this by employing cost functions based on superfidelity, defined as:
$$G(\rho,\sigma) = \mathrm{Tr}(\rho\sigma) + \sqrt{\big(1 - \mathrm{Tr}(\rho^2)\big)\big(1 - \mathrm{Tr}(\sigma^2)\big)}$$
Superfidelity serves as a computationally accessible upper bound on the standard (Uhlmann) fidelity $F(\rho,\sigma) = \big(\mathrm{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}\big)^2$ and relies on estimating only the purities $\mathrm{Tr}(\rho^2)$, $\mathrm{Tr}(\sigma^2)$ and the overlap $\mathrm{Tr}(\rho\sigma)$. These quantities can often be estimated more efficiently than performing full QST, for example using randomized measurement techniques or the SWAP test for overlap.
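Given full density matrices (e.g., in classical simulation), superfidelity reduces to three trace evaluations. A minimal sketch:

```python
import numpy as np

def superfidelity(rho: np.ndarray, sigma: np.ndarray) -> float:
    """G(rho, sigma) = Tr(rho sigma) + sqrt((1 - Tr rho^2)(1 - Tr sigma^2)).
    The max() guards against tiny negative values from floating-point error."""
    overlap = np.trace(rho @ sigma).real
    p_rho = np.trace(rho @ rho).real
    p_sigma = np.trace(sigma @ sigma).real
    return overlap + np.sqrt(max(0.0, (1 - p_rho) * (1 - p_sigma)))

# Sanity checks: G(rho, rho) = 1 for any state, since
# Tr(rho^2) + (1 - Tr(rho^2)) = 1.
rho = np.diag([0.75, 0.25])
g_self = superfidelity(rho, rho)                      # 1.0
g_pure_mixed = superfidelity(np.diag([1.0, 0.0]),     # pure |0><0| ...
                             np.eye(2) / 2)           # ... vs I/2: overlap term only
```

Note that for a pure state the square-root term vanishes, so $G$ collapses to the plain overlap $\mathrm{Tr}(\rho\sigma)$.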
For ensemble comparison, the concept is extended to mean superfidelity:
$$G(\{\rho_1\},\{\rho_2\}) = \mathbb{E}_{\rho\sim\{\rho_1\},\,\sigma\sim\{\rho_2\}}\big[G(\rho,\sigma)\big]$$
Two specific loss functions leveraging mean superfidelity are utilized for optimizing the PQC parameters θt:
- Maximum Mean Discrepancy (MMD):
$$D_{\mathrm{MMD}}(\{\tilde\rho_m\},\{\rho_m\}) = G(\{\tilde\rho_m\},\{\tilde\rho_m\}) + G(\{\rho_m\},\{\rho_m\}) - 2\,G(\{\tilde\rho_m\},\{\rho_m\})$$
Minimizing DMMD drives the distributions of the generated and target ensembles towards each other in the feature space implicitly defined by the superfidelity kernel.
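The MMD loss with the superfidelity kernel can be computed directly when the ensembles are available as density matrices in simulation. An illustrative sketch:

```python
import numpy as np

def superfidelity(rho, sigma):
    o = np.trace(rho @ sigma).real
    pr, ps = np.trace(rho @ rho).real, np.trace(sigma @ sigma).real
    return o + np.sqrt(max(0.0, (1 - pr) * (1 - ps)))

def mean_superfidelity(ens_a, ens_b):
    """Mean of the pairwise superfidelity kernel over two ensembles."""
    return float(np.mean([[superfidelity(r, s) for s in ens_b] for r in ens_a]))

def mmd_loss(gen, target):
    """D_MMD = G(gen, gen) + G(target, target) - 2 G(gen, target)."""
    return (mean_superfidelity(gen, gen)
            + mean_superfidelity(target, target)
            - 2 * mean_superfidelity(gen, target))

ket0, ket1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
mix = np.eye(2) / 2
d_same = mmd_loss([ket0, mix], [ket0, mix])   # identical ensembles -> 0
d_diff = mmd_loss([ket0], [ket1])             # orthogonal pure states -> large
```

Identical ensembles give zero loss by construction, since the cross term exactly cancels the two self terms.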
- Wasserstein Distance: This metric is based on optimal transport theory, where the cost of matching a state $\rho_i$ from the first ensemble to a state $\sigma_j$ from the second is defined as $C_{i,j} = 1 - G(\rho_i, \sigma_j)$. The Wasserstein distance minimizes the total transport cost over all possible pairings (or transport plans). It is suggested that this distance might be more robust than MMD for distinguishing complex distributions.
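For equal-weight ensembles of the same size, the optimal transport plan reduces to an assignment problem, which SciPy solves exactly. A sketch under that assumption:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def superfidelity(rho, sigma):
    o = np.trace(rho @ sigma).real
    pr, ps = np.trace(rho @ rho).real, np.trace(sigma @ sigma).real
    return o + np.sqrt(max(0.0, (1 - pr) * (1 - ps)))

def wasserstein_loss(gen, target):
    """Optimal-transport cost with C_ij = 1 - G(rho_i, sigma_j).
    With uniform weights and equal ensemble sizes, the optimal plan is a
    permutation, found by the Hungarian algorithm."""
    C = np.array([[1 - superfidelity(r, s) for s in target] for r in gen])
    rows, cols = linear_sum_assignment(C)
    return float(C[rows, cols].mean())

ket0, ket1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
# Same states in shuffled order: the optimal matching pairs them perfectly.
w_perm = wasserstein_loss([ket0, ket1], [ket1, ket0])   # 0
w_orth = wasserstein_loss([ket0], [ket1])               # maximal cost
```

Unlike MMD, this loss finds the best pairing explicitly, so a permuted copy of the target ensemble incurs zero cost.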
Implementation Considerations and Numerical Demonstrations
A significant practical advantage of MSQuDDPM is the replacement of high-fidelity, potentially complex multi-qubit scrambling unitaries in the forward process with sequences of n-qubit depolarizing channels. While implementing perfect depolarizing channels also presents challenges, they are generally considered less demanding in terms of coherent control fidelity compared to arbitrary unitaries, making the model potentially more suitable for near-term, noisy intermediate-scale quantum (NISQ) devices.
Resource scaling is also considered. The paper argues that increasing the number of diffusion steps T is generally more beneficial for model performance and scalability than increasing the number of ancillary qubits na used in each backward PQC step (2411.17608). While larger T increases the sequential depth of the computation, larger na increases the simultaneous qubit requirement and PQC complexity at each step, potentially exacerbating noise and coherence time limitations.
The effectiveness of MSQuDDPM was numerically demonstrated on several quantum ensemble generation tasks (2411.17608):
- Generating states clustered around specific points on the Bloch sphere.
- Reproducing circular distributions of states on the Bloch sphere.
- Learning the distribution of ground states within the paramagnetic phase of the 1D Transverse-Field Ising Model (TFIM).
These simulations validated the model's ability to handle mixed states and highlighted the positive impact of the proposed cosine-exponent schedule and superfidelity-based cost functions, particularly for multi-qubit systems.
Conclusion
The Mixed-State Quantum Denoising Diffusion Probabilistic Model (MSQuDDPM) offers a pathway for generative quantum machine learning capable of handling mixed-state ensembles. By replacing unitary scrambling with depolarizing channels in the forward process and employing a VQA-based backward process trained sequentially with superfidelity-based cost functions, it enhances practical feasibility for near-term implementation. Techniques like the cosine-exponent noise schedule and the option of single-qubit random ancilla initialization further refine its performance, particularly for multi-qubit applications demonstrated on tasks including Bloch sphere distributions and TFIM ground state generation (2411.17608).
Related Papers
- Quantum Denoising Diffusion Models (2024)
- Quantum Generative Diffusion Model: A Fully Quantum-Mechanical Model for Generating Quantum State Ensemble (2024)
- Quantum-Noise-Driven Generative Diffusion Models (2023)
- Generative quantum machine learning via denoising diffusion probabilistic models (2023)
- Quantum circuit synthesis with diffusion models (2023)