Mixed-State Quantum Denoising Diffusion Probabilistic Model (2411.17608v2)
Abstract: Generative quantum machine learning has gained significant attention for its ability to produce quantum states with desired distributions. Among various quantum generative models, quantum denoising diffusion probabilistic models (QuDDPMs) [Phys. Rev. Lett. 132, 100602 (2024)] provide a promising approach with stepwise learning that resolves the training issues. However, the requirement of high-fidelity scrambling unitaries in QuDDPM poses a challenge in near-term implementation. We propose the *mixed-state quantum denoising diffusion probabilistic model* (MSQuDDPM) to eliminate the need for scrambling unitaries. Our approach focuses on adapting the quantum noise channels to the model architecture, which integrates depolarizing noise channels in the forward diffusion process and parameterized quantum circuits with projective measurements in the backward denoising steps. We also introduce several techniques to improve MSQuDDPM, including a cosine-exponent schedule of noise interpolation, the use of single-qubit random ancilla, and superfidelity-based cost functions to enhance the convergence. We evaluate MSQuDDPM on quantum ensemble generation tasks, demonstrating its successful performance.
Summary
- The paper introduces the Mixed-State Quantum Denoising Diffusion Probabilistic Model (MSQuDDPM) for generating ensembles of mixed quantum states, designed for practical implementation on near-term hardware.
- MSQuDDPM utilizes depolarizing channels in the forward diffusion process and a VQA-based backward process trained sequentially with superfidelity cost functions to denoise states.
- Key features include a cosine-exponent noise schedule and superfidelity-based cost functions, which improve convergence and enable the model to generate complex distributions such as TFIM ground states.
The Mixed-State Quantum Denoising Diffusion Probabilistic Model (MSQuDDPM) extends the prior Quantum Denoising Diffusion Probabilistic Model (QuDDPM) [Phys. Rev. Lett. 132, 100602 (2024)] to the generation of ensembles comprising both pure and mixed quantum states. It concurrently addresses practical implementation challenges on near-term quantum hardware by eliminating the high-fidelity scrambling unitaries required in the original QuDDPM framework (2411.17608).
Forward Diffusion Process: Noise Injection via Depolarizing Channels
The forward process in MSQuDDPM systematically transforms an initial ensemble of quantum states $\{\rho_0^{(i)}\}_{i=1}^{M}$, sampled from an unknown data distribution $p_{\mathrm{data}}(\rho)$, into a predefined noise distribution, typically the maximally mixed state $I/d$, over $T$ discrete steps. Unlike the original QuDDPM, which employed scrambling unitaries, MSQuDDPM iteratively applies $n$-qubit depolarizing quantum channels $\Phi_t$.
The state transformation at step t+1 is governed by:
$$\rho_{t+1}^{(i)} = \Phi_{t+1}\big(\rho_t^{(i)}\big) = \big(1 - q_{t+1}^{(i)}\big)\,\rho_t^{(i)} + q_{t+1}^{(i)}\,\frac{I}{d}$$
Here, $\rho_t^{(i)}$ is the $i$-th state in the ensemble at step $t$, $q_{t+1}^{(i)}$ is the depolarizing probability (noise parameter) for that step, $d = 2^n$ is the Hilbert space dimension for $n$ qubits, and $I/d$ represents the maximally mixed state. This mechanism progressively introduces isotropic noise, driving the state towards $I/d$ as $t \to T$.
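As a concrete sketch, the forward diffusion above can be simulated directly on density matrices. This is a minimal NumPy illustration of the channel and its iteration, not the paper's implementation:

```python
import numpy as np

def depolarize(rho: np.ndarray, q: float) -> np.ndarray:
    """One application of the n-qubit depolarizing channel Phi_t."""
    d = rho.shape[0]
    return (1 - q) * rho + q * np.eye(d) / d

def forward_diffusion(rho0: np.ndarray, qs: list) -> list:
    """Apply the channel for noise parameters q_1..q_T; return the whole trajectory."""
    traj = [rho0]
    for q in qs:
        traj.append(depolarize(traj[-1], q))
    return traj

# Example: a single-qubit pure state |0><0| diffused over 3 uniform steps.
rho0 = np.array([[1.0, 0.0], [0.0, 0.0]])
traj = forward_diffusion(rho0, [0.5, 0.5, 0.5])
# Purity Tr(rho^2) decreases monotonically toward 1/d = 0.5.
purities = [np.trace(r @ r).real for r in traj]
```

Because $I/d$ is a fixed point of the channel, after $T$ steps the state is $\prod_t (1-q_t)\,\rho_0$ plus the complementary weight on $I/d$, which is why the purity decays monotonically.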
A critical component is the noise schedule, which dictates the values of qt for t=1,…,T. Standard schedules like linear or cosine variations, common in classical diffusion models, were found to cause overly rapid convergence to the maximally mixed state in multi-qubit quantum systems. This premature loss of information hinders the learning capacity of the subsequent backward denoising process. To mitigate this, MSQuDDPM introduces the cosine-exponent schedule:
$$q_t = \left(1 - \frac{\alpha_t}{\alpha_{t-1}}\right)^{k}$$
where $\alpha_t = f(t)/f(0)$ with $f(t) = \cos^2\!\left(\frac{t/T + \epsilon}{1 + \epsilon} \cdot \frac{\pi}{2}\right)$. The exponent $k$ (e.g., $k=1$ for cosine, $k=2$ for cosine square) and a small offset $\epsilon$ control the noise injection rate. The cosine square schedule ($k=2$) was demonstrated to better preserve state information (e.g., as measured by purity) during the forward diffusion, especially in multi-qubit scenarios such as learning phases of the Transverse-Field Ising Model (TFIM), leading to improved performance in the backward generation phase (2411.17608).
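The schedule can be sketched in a few lines. The offset value $\epsilon = 0.008$ below is the common default from classical cosine schedules, assumed here purely for illustration:

```python
import numpy as np

def cosine_exponent_schedule(T: int, k: float = 2.0, eps: float = 0.008) -> np.ndarray:
    """q_t = (1 - alpha_t/alpha_{t-1})^k with alpha_t = f(t)/f(0),
    f(t) = cos^2( (t/T + eps)/(1 + eps) * pi/2 )."""
    t = np.arange(T + 1)
    f = np.cos((t / T + eps) / (1 + eps) * np.pi / 2) ** 2
    alpha = f / f[0]
    # Ratios use alpha_0 .. alpha_{T-1} in the denominator, all strictly positive.
    q = (1 - alpha[1:] / alpha[:-1]) ** k
    return q  # q_1 .. q_T

q_cos = cosine_exponent_schedule(10, k=1.0)     # plain cosine
q_cos2 = cosine_exponent_schedule(10, k=2.0)    # cosine square: gentler early noise
```

Since each base $1 - \alpha_t/\alpha_{t-1}$ lies in $[0,1]$, raising it to $k=2$ uniformly shrinks the per-step noise, which is exactly the slower information loss the paper exploits.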
Backward Denoising Process: VQA-Based Generation
The backward process reverses the diffusion, starting from an ensemble of maximally mixed states $\{\tilde\rho_T^{(i)}\}_{i=1}^{M}$ (often initialized as $M$ copies of $I/d$) and iteratively applying learned denoising operations to generate an ensemble $\{\tilde\rho_0^{(i)}\}_{i=1}^{M}$ intended to approximate the original data distribution $p_{\mathrm{data}}(\rho)$. This is implemented using a Variational Quantum Algorithm (VQA) structure.
At each backward step $t$ (transitioning from $\tilde\rho_t$ to $\tilde\rho_{t-1}$), a Parameterized Quantum Circuit (PQC), denoted $\tilde U_t(\theta_t)$, is applied. The PQC acts on the $n$-qubit state $\tilde\rho_t^{(i)}$ from the previous step, potentially augmented with $n_a$ ancillary qubits initialized in a state $|\tilde\varphi_t\rangle$. A typical PQC architecture follows a hardware-efficient ansatz, comprising $L$ layers of parameterized single-qubit rotations (e.g., RX, RY) interleaved with fixed entangling gates (e.g., CNOT or CZ, often applied between adjacent qubits).
Following the unitary evolution $\tilde U_t(\theta_t)$, projective measurements are performed on the $n_a$ ancillary qubits, typically in the computational (Z) basis. The post-measurement state of the primary $n$ qubits, averaged over measurement outcomes (the outcomes are often simply discarded, which is equivalent to tracing out the ancillae), constitutes the denoised state $\tilde\rho_{t-1}^{(i)}$.
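The sequence "attach ancillae, apply the circuit, measure the ancillae in the Z basis and discard the outcomes" can be emulated on density matrices as a partial trace. A NumPy sketch follows; the SWAP unitary in the example is illustrative only, not the paper's ansatz:

```python
import numpy as np

def backward_step(rho: np.ndarray, ancilla: np.ndarray, U: np.ndarray) -> np.ndarray:
    """One backward step: attach the ancilla density matrix, apply the circuit
    unitary U, then measure the ancillae in the Z basis and discard outcomes.
    Averaging post-measurement states over discarded outcomes equals tracing
    out the ancilla register."""
    d_s, d_a = rho.shape[0], ancilla.shape[0]   # system 2^n, ancilla 2^{n_a}
    joint = U @ np.kron(rho, ancilla) @ U.conj().T
    return np.einsum('iaja->ij', joint.reshape(d_s, d_a, d_s, d_a))

# Example: with U = SWAP, the fresh |0> ancilla is swapped into the system,
# so even a maximally mixed input comes out as the pure state |0><0|.
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)
ket0 = np.array([[1.0, 0.0], [0.0, 0.0]], dtype=complex)
out = backward_step(np.eye(2) / 2, ket0, SWAP)
```

The SWAP toy also shows why ancillae are essential here: a unitary alone cannot increase the purity of the system state, but discarding measured ancillae can.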
A key implementation detail is the sequential training strategy. Instead of optimizing all parameters {θt}t=1T simultaneously, the parameters for each step are trained sequentially in reverse order. Specifically, to learn the parameters θm+1 for the transition from t=m+1 to t=m, the cost function evaluates the discrepancy between the generated ensemble {ρ~m} and the corresponding target ensemble {ρm} obtained from the forward diffusion process. Once θm+1 is optimized, these parameters are fixed, and the training proceeds to optimize θm. This stepwise approach significantly reduces the number of active parameters at each training stage, mitigating issues like barren plateaus and facilitating convergence.
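The sequential training order can be illustrated with a deliberately simplified toy: a single qubit, a bare RY rotation standing in for the full ancilla-assisted PQC, and SciPy's scalar optimizer. All of these simplifications are assumptions for illustration of the training schedule only:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def superfidelity(rho, sigma):
    o = np.trace(rho @ sigma).real
    pr, ps = np.trace(rho @ rho).real, np.trace(sigma @ sigma).real
    return o + np.sqrt(max(0.0, (1 - pr) * (1 - ps)))

# Forward-process targets rho_0 .. rho_T for a toy single-qubit trajectory.
T = 3
targets = [np.array([[1.0, 0.0], [0.0, 0.0]])]
for q in [0.3] * T:
    targets.append((1 - q) * targets[-1] + q * np.eye(2) / 2)

# Backward: start from the most-diffused state and train one step at a time,
# in reverse order, freezing each step's parameter before moving on.
state = targets[T].copy()
thetas = []
for t in reversed(range(T)):   # learn the parameter for the step t+1 -> t
    def cost(theta, tgt=targets[t], rho=state):
        U = ry(theta)
        return 1 - superfidelity(U @ rho @ U.conj().T, tgt)
    res = minimize_scalar(cost, bounds=(-np.pi, np.pi), method='bounded')
    thetas.append(res.x)       # parameter frozen; advance the backward state
    U = ry(res.x)
    state = U @ state @ U.conj().T
```

Note that only one scalar parameter is active per optimization call; this is the mechanism that keeps the effective parameter count small at every training stage.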
To potentially enhance the diversity of generated states, especially when all backward trajectories start from identical maximally mixed states, the use of a single-qubit random ancilla is proposed. Instead of initializing all ancillae to $|0\rangle^{\otimes n_a}$, one ancilla is prepared in a Haar-random pure state $|\phi_{\mathrm{Haar}}\rangle$, while the rest remain in $|0\rangle$. This introduces stochasticity early in the backward process (near $t = T$), potentially improving the model's ability to capture complex distributions. Numerical results suggest this can yield comparable or superior performance, sometimes with reduced resource requirements compared to using only $|0\rangle$ ancillae (2411.17608).
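Sampling a Haar-random single-qubit state is simple: normalize a complex Gaussian vector. A sketch of the ancilla-register preparation just described (function names are illustrative, not from the paper):

```python
import numpy as np

def haar_random_qubit(rng=None) -> np.ndarray:
    """Sample a Haar-random single-qubit pure state |phi> by normalizing
    a vector of i.i.d. complex Gaussians (unitary invariance of the Gaussian
    measure makes the result Haar-distributed)."""
    rng = np.random.default_rng(rng)
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    return v / np.linalg.norm(v)

def ancilla_register(n_a: int, rng=None) -> np.ndarray:
    """|phi_Haar> on one ancilla, |0> on the remaining n_a - 1 ancillae."""
    state = haar_random_qubit(rng)
    for _ in range(n_a - 1):
        state = np.kron(state, np.array([1.0, 0.0]))
    return state

phi = ancilla_register(3, rng=0)   # 3-qubit register, one random qubit
```

Only a single qubit's worth of randomness is injected per step, which keeps the overhead minimal relative to the full $n_a$-qubit register.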
Cost Functions for Mixed-State Ensemble Comparison
Training the PQCs requires quantifying the similarity between the generated mixed-state ensemble {ρ~m} and the target forward-process ensemble {ρm}. Standard fidelity is often intractable for mixed states due to the requirement of full Quantum State Tomography (QST), which scales exponentially with the number of qubits.
MSQuDDPM circumvents this by employing cost functions based on superfidelity, defined as:
$$G(\rho,\sigma) = \mathrm{Tr}(\rho\sigma) + \sqrt{\big(1 - \mathrm{Tr}(\rho^2)\big)\big(1 - \mathrm{Tr}(\sigma^2)\big)}$$
Superfidelity serves as a computationally accessible upper bound on the standard (Uhlmann) fidelity $F(\rho,\sigma) = \big(\mathrm{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}\big)^2$ and relies on estimating only the purities $\mathrm{Tr}(\rho^2)$, $\mathrm{Tr}(\sigma^2)$ and the overlap $\mathrm{Tr}(\rho\sigma)$. These quantities can often be estimated more efficiently than performing full QST, for example using randomized measurement techniques or the SWAP test for overlap.
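Given full density matrices (e.g., in classical simulation), superfidelity reduces to three trace evaluations. A minimal sketch:

```python
import numpy as np

def superfidelity(rho: np.ndarray, sigma: np.ndarray) -> float:
    """G(rho, sigma) = Tr(rho sigma) + sqrt((1 - Tr rho^2)(1 - Tr sigma^2)).
    The max() guards against tiny negative values from floating-point error."""
    overlap = np.trace(rho @ sigma).real
    p_rho = np.trace(rho @ rho).real
    p_sigma = np.trace(sigma @ sigma).real
    return overlap + np.sqrt(max(0.0, (1 - p_rho) * (1 - p_sigma)))

# Sanity checks: G(rho, rho) = 1 for any state, since
# Tr(rho^2) + (1 - Tr(rho^2)) = 1.
rho = np.diag([0.75, 0.25])
g_self = superfidelity(rho, rho)                      # 1.0
g_pure_mixed = superfidelity(np.diag([1.0, 0.0]),     # pure |0><0| ...
                             np.eye(2) / 2)           # ... vs I/2: overlap term only
```

Note that for a pure state the square-root term vanishes, so $G$ collapses to the plain overlap $\mathrm{Tr}(\rho\sigma)$.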
For ensemble comparison, the concept is extended to mean superfidelity:
$$G(\{\rho_1\},\{\rho_2\}) = \mathbb{E}_{\rho\sim\{\rho_1\},\,\sigma\sim\{\rho_2\}}\big[G(\rho,\sigma)\big]$$
Two specific loss functions leveraging mean superfidelity are utilized for optimizing the PQC parameters θt:
- Maximum Mean Discrepancy (MMD):
$$D_{\mathrm{MMD}}(\{\tilde\rho_m\},\{\rho_m\}) = G(\{\tilde\rho_m\},\{\tilde\rho_m\}) + G(\{\rho_m\},\{\rho_m\}) - 2\,G(\{\tilde\rho_m\},\{\rho_m\})$$
Minimizing DMMD drives the distributions of the generated and target ensembles towards each other in the feature space implicitly defined by the superfidelity kernel.
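The MMD loss with the superfidelity kernel can be computed directly when the ensembles are available as density matrices in simulation. An illustrative sketch:

```python
import numpy as np

def superfidelity(rho, sigma):
    o = np.trace(rho @ sigma).real
    pr, ps = np.trace(rho @ rho).real, np.trace(sigma @ sigma).real
    return o + np.sqrt(max(0.0, (1 - pr) * (1 - ps)))

def mean_superfidelity(ens_a, ens_b):
    """Mean of the pairwise superfidelity kernel over two ensembles."""
    return float(np.mean([[superfidelity(r, s) for s in ens_b] for r in ens_a]))

def mmd_loss(gen, target):
    """D_MMD = G(gen, gen) + G(target, target) - 2 G(gen, target)."""
    return (mean_superfidelity(gen, gen)
            + mean_superfidelity(target, target)
            - 2 * mean_superfidelity(gen, target))

ket0, ket1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
mix = np.eye(2) / 2
d_same = mmd_loss([ket0, mix], [ket0, mix])   # identical ensembles -> 0
d_diff = mmd_loss([ket0], [ket1])             # orthogonal pure states -> large
```

Identical ensembles give zero loss by construction, since the cross term exactly cancels the two self terms.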
- Wasserstein Distance: This metric is based on optimal transport theory, where the cost of matching a state $\rho_i$ from the first ensemble to a state $\sigma_j$ from the second is defined as $C_{i,j} = 1 - G(\rho_i, \sigma_j)$. The Wasserstein distance minimizes the total transport cost over all possible pairings (or transport plans). It is suggested that this distance might be more robust than MMD for distinguishing complex distributions.
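For equal-weight ensembles of the same size, the optimal transport plan reduces to an assignment problem, which SciPy solves exactly. A sketch under that assumption:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def superfidelity(rho, sigma):
    o = np.trace(rho @ sigma).real
    pr, ps = np.trace(rho @ rho).real, np.trace(sigma @ sigma).real
    return o + np.sqrt(max(0.0, (1 - pr) * (1 - ps)))

def wasserstein_loss(gen, target):
    """Optimal-transport cost with C_ij = 1 - G(rho_i, sigma_j).
    With uniform weights and equal ensemble sizes, the optimal plan is a
    permutation, found by the Hungarian algorithm."""
    C = np.array([[1 - superfidelity(r, s) for s in target] for r in gen])
    rows, cols = linear_sum_assignment(C)
    return float(C[rows, cols].mean())

ket0, ket1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
# Same states in shuffled order: the optimal matching pairs them perfectly.
w_perm = wasserstein_loss([ket0, ket1], [ket1, ket0])   # 0
w_orth = wasserstein_loss([ket0], [ket1])               # maximal cost
```

Unlike MMD, this loss finds the best pairing explicitly, so a permuted copy of the target ensemble incurs zero cost.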
Implementation Considerations and Numerical Demonstrations
A significant practical advantage of MSQuDDPM is the replacement of high-fidelity, potentially complex multi-qubit scrambling unitaries in the forward process with sequences of n-qubit depolarizing channels. While implementing perfect depolarizing channels also presents challenges, they are generally considered less demanding in terms of coherent control fidelity compared to arbitrary unitaries, making the model potentially more suitable for near-term, noisy intermediate-scale quantum (NISQ) devices.
Resource scaling is also considered. The paper argues that increasing the number of diffusion steps T is generally more beneficial for model performance and scalability than increasing the number of ancillary qubits na used in each backward PQC step (2411.17608). While larger T increases the sequential depth of the computation, larger na increases the simultaneous qubit requirement and PQC complexity at each step, potentially exacerbating noise and coherence time limitations.
The effectiveness of MSQuDDPM was numerically demonstrated on several quantum ensemble generation tasks (2411.17608):
- Generating states clustered around specific points on the Bloch sphere.
- Reproducing circular distributions of states on the Bloch sphere.
- Learning the distribution of ground states within the paramagnetic phase of the 1D Transverse-Field Ising Model (TFIM).
These simulations validated the model's ability to handle mixed states and highlighted the positive impact of the proposed cosine-exponent schedule and superfidelity-based cost functions, particularly for multi-qubit systems.
Conclusion
The Mixed-State Quantum Denoising Diffusion Probabilistic Model (MSQuDDPM) offers a pathway for generative quantum machine learning capable of handling mixed-state ensembles. By replacing unitary scrambling with depolarizing channels in the forward process and employing a VQA-based backward process trained sequentially with superfidelity-based cost functions, it enhances practical feasibility for near-term implementation. Techniques like the cosine-exponent noise schedule and the option of single-qubit random ancilla initialization further refine its performance, particularly for multi-qubit applications demonstrated on tasks including Bloch sphere distributions and TFIM ground state generation (2411.17608).
Related Papers
- Quantum Denoising Diffusion Models (2024)
- Quantum Generative Diffusion Model: A Fully Quantum-Mechanical Model for Generating Quantum State Ensemble (2024)
- Quantum-Noise-Driven Generative Diffusion Models (2023)
- Generative quantum machine learning via denoising diffusion probabilistic models (2023)
- Quantum circuit synthesis with diffusion models (2023)