q-means Algorithm: Quantum & Hybrid Clustering
- The q-means algorithm is a family of quantum and quantum-inspired clustering methods that generalize classical k-means using principles from quantum computation and tensor networks.
- It leverages techniques such as Matrix Product States, quantum subroutines, and hybrid quantum-classical models to reshape the loss surface and help avoid poor local minima.
- Reported benchmarks show improved clustering accuracy and robustness, with potential quantum speedups (conditional on assumptions such as QRAM) and efficient handling of high-dimensional data.
The q-means algorithm refers collectively to a family of quantum and quantum-inspired algorithms that generalize and accelerate Lloyd's k-means clustering via principles from quantum computation, tensor network theory, and hybrid quantum-classical models. The common aim is to enable more efficient or more robust partitioning of data into clusters by leveraging high-dimensional embeddings, quantum metrics, or quantum parallelism. There exist fully quantum, quantum-inspired, and hybrid variants, each exploiting different quantum effects or classical analogs.
1. Matrix-Product-State Quantum-Inspired q-means
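A prominent quantum-inspired approach to k-means leverages Matrix Product States (MPS), a formalism from one-dimensional quantum many-body systems, to represent data and centroids in an exponentially large Hilbert space (Shi et al., 2020). Given a classical dataset $\{\mathbf{x}_i\}_{i=1}^{N} \subset \mathbb{R}^{n}$, each data point is embedded as an MPS-encoded quantum state $|\Phi(\mathbf{x}_i)\rangle$ in the $2^{n}$-dimensional space $\mathcal{H} = (\mathbb{C}^{2})^{\otimes n}$, for example as a product of local qubit states $|\phi(x_j)\rangle = \cos(\pi x_j/2)\,|0\rangle + \sin(\pi x_j/2)\,|1\rangle$ with features rescaled to $[0,1]$. Each centroid $|C_k\rangle$, $k = 1,\dots,K$, is parametrized as an MPS with a common bond dimension $D$. The clustering objective, generalizing within-cluster variance, becomes the sum of squared Hilbert-space distances
$$\mathcal{L} = \sum_{k=1}^{K} \sum_{i \in S_k} \big\| \, |\Phi(\mathbf{x}_i)\rangle - |C_k\rangle \big\|^{2},$$
where $S_k$ is the set of points assigned to cluster $k$.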
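The embedding and the sum-of-squared-distances objective can be illustrated with a short NumPy sketch. It assumes the local map $|\phi(x_j)\rangle = \cos(\pi x_j/2)|0\rangle + \sin(\pi x_j/2)|1\rangle$ with features already rescaled to $[0,1]$, and the helper names are hypothetical. Because an embedded point is a product state, the overlap of two points factorizes into $n$ local overlaps, so their squared Hilbert-space distance never requires the $2^{n}$-dimensional vector. The explicit vector is built only to check small cases.

```python
import numpy as np

def local_map(x):
    """Assumed encoding: feature x_j in [0, 1] -> (cos(pi x_j / 2), sin(pi x_j / 2))."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.cos(np.pi * x / 2), np.sin(np.pi * x / 2)], axis=-1)  # shape (n, 2)

def product_overlap(x, y):
    """<Phi(x)|Phi(y)> for two product states: a product of n local overlaps (O(n) work)."""
    return float(np.prod(np.sum(local_map(x) * local_map(y), axis=-1)))

def hilbert_sq_distance(x, y):
    """|| |Phi(x)> - |Phi(y)> ||^2 = 2 - 2 <Phi(x)|Phi(y)> for normalized real states."""
    return 2.0 - 2.0 * product_overlap(x, y)

def full_state(x):
    """Explicit 2^n-dimensional state vector; only feasible for small n (used for checking)."""
    state = np.array([1.0])
    for amp in local_map(x):
        state = np.kron(state, amp)
    return state

def clustering_loss(X, labels, centroids):
    """Sum over points of the squared Hilbert-space distance to the assigned centroid vector."""
    return float(sum(np.sum((full_state(x) - centroids[k]) ** 2)
                     for x, k in zip(X, labels)))

x, y = [0.1, 0.5, 0.9], [0.2, 0.4, 0.7]
print(np.isclose(hilbert_sq_distance(x, y),
                 np.sum((full_state(x) - full_state(y)) ** 2)))  # True
```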
The assignment step allocates each data point to the nearest centroid in Hilbert-space norm, and the update step optimizes the centroid MPSs via a DMRG-style local sweep that alternates over the MPS tensors. The optimization reshapes the loss surface, often avoiding poor classical local minima and increasing prediction accuracy.
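A minimal NumPy sketch of the assignment step follows. It rests on several assumptions: real MPS tensors of shape $(D_{\text{left}}, 2, D_{\text{right}})$, the local encoding $|\phi(x_j)\rangle = \cos(\pi x_j/2)|0\rangle + \sin(\pi x_j/2)|1\rangle$, and hypothetical helper names. An embedded point is a product state, so $\langle C_k|\Phi(\mathbf{x})\rangle$ reduces to a single left-to-right sweep costing $O(nD^{2})$. The squared distance then follows from $\big\||\Phi\rangle - |C_k\rangle\big\|^{2} = 1 + \langle C_k|C_k\rangle - 2\langle C_k|\Phi\rangle$. The DMRG-style centroid update is not shown.

```python
import numpy as np

def local_map(x):
    """Assumed encoding: feature x_j in [0, 1] -> (cos(pi x_j / 2), sin(pi x_j / 2))."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.cos(np.pi * x / 2), np.sin(np.pi * x / 2)], axis=-1)

def random_mps(n, D, rng):
    """Real MPS with tensors of shape (D_left, 2, D_right); boundary bonds have size 1."""
    dims = [1] + [D] * (n - 1) + [1]
    return [rng.normal(size=(dims[j], 2, dims[j + 1])) for j in range(n)]

def product_state_mps(x):
    """Bond-dimension-1 MPS equal to the embedded point (e.g. a centroid seeded from data)."""
    return [amp.reshape(1, 2, 1) for amp in local_map(x)]

def overlap_with_point(mps, amps):
    """<C|Phi(x)>: one left-to-right sweep, O(n D^2) because Phi(x) is a product state."""
    env = np.ones(1)
    for A, a in zip(mps, amps):
        env = env @ np.einsum("lpr,p->lr", A, a)  # absorb one site into the environment
    return float(env[0])

def norm_sq(mps):
    """<C|C> via transfer matrices, O(n D^3) with an optimized contraction order."""
    E = np.ones((1, 1))
    for A in mps:
        E = np.einsum("ab,apr,bps->rs", E, A, A, optimize=True)
    return float(E[0, 0])

def assign(X, centroids):
    """Nearest centroid in Hilbert-space norm: ||Phi - C||^2 = 1 + <C|C> - 2 <C|Phi>."""
    norms = [norm_sq(c) for c in centroids]
    labels = []
    for x in X:
        amps = local_map(x)
        d2 = [1.0 + nc - 2.0 * overlap_with_point(c, amps)
              for c, nc in zip(centroids, norms)]
        labels.append(int(np.argmin(d2)))
    return labels
```

A centroid seeded with `product_state_mps(x)` is identical to the embedded point, so its distance to that point is zero and `assign([x], centroids)` should select it.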
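Empirical benchmarks show that MPS-based q-means achieves higher or equal test accuracy compared with classical k-means on several datasets (e.g., Wine, Yeast, E. coli) and is more robust to initialization (Shi et al., 2020). Because each embedded point is a product state, its overlap with a bond-dimension-$D$ centroid costs $O(nD^{2})$, so an assignment step costs roughly $O(NKnD^{2})$. A DMRG-style update sweep scales polynomially in $D$; the canonicalizing SVDs alone cost $O(nD^{3})$ per centroid. The method therefore remains competitive for moderate $D$ and small $n$ and $K$, or when fewer iterations are needed because poor minima are avoided.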
2. Quantum q-means Algorithms and Quantum Acceleration
The canonical quantum q-means algorithm generalizes Lloyd's method by implementing both the assignment and the centroid-update steps as quantum subroutines operating on QRAM-resident data (Kerenidis et al., 2018, Doriguello et al., 2023). For a data matrix $V \in \mathbb{R}^{N \times d}$ whose rows $v_i$ are the data points:
- Distance estimation: Prepares superpositions over data points and centroids, estimating squared distances via quantum amplitude estimation.
- Assignment: Uses quantum minimum-finding to assign each data point to its nearest centroid in superposition.
- Update: Forms quantum states encoding cluster memberships, so that each centroid can be computed with quantum linear-algebra primitives (block-encoded matrix-vector multiplication), followed by quantum state tomography to recover classical centroid updates.
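The distance-estimation step can be made concrete with a classical simulation. It assumes a Hadamard-test-style circuit on $(|0\rangle|\hat v\rangle + |1\rangle|\hat c\rangle)/\sqrt{2}$. For real vectors, this circuit returns ancilla outcome 0 with probability $p_0 = (1 + \langle \hat v, \hat c\rangle)/2$. The sketch also assumes that the norms $\|v\|$ and $\|c\|$ are available classically. Amplitude estimation is modeled only as a bounded additive error on $p_0$. The circuit, the error model, and the function names are illustrative assumptions, not the exact construction of Kerenidis et al.

```python
import numpy as np

def hadamard_test_p0(v, c):
    """Outcome-0 probability for (|0>|v_hat> + |1>|c_hat>)/sqrt(2) after a Hadamard on the ancilla."""
    v_hat = v / np.linalg.norm(v)
    c_hat = c / np.linalg.norm(c)
    # The ancilla-0 branch has amplitude vector (v_hat + c_hat) / 2.
    return float(np.sum((v_hat + c_hat) ** 2) / 4)

def distance_from_p0(p0, v_norm, c_norm):
    """Invert p0 = (1 + <v_hat, c_hat>) / 2 and expand ||v - c||^2."""
    cos_angle = 2.0 * p0 - 1.0
    return v_norm**2 + c_norm**2 - 2.0 * v_norm * c_norm * cos_angle

def estimated_sq_distance(v, c, eps, rng):
    """Squared distance from a p0 estimate with additive error at most eps (toy error model)."""
    p0 = hadamard_test_p0(v, c) + rng.uniform(-eps, eps)
    p0 = min(max(p0, 0.0), 1.0)
    return distance_from_p0(p0, np.linalg.norm(v), np.linalg.norm(c))

def assign_points(V, C, eps, rng):
    """Assign each row of V to the centroid with the smallest estimated distance
    (a classical stand-in for quantum minimum finding)."""
    return np.array([np.argmin([estimated_sq_distance(v, c, eps, rng) for c in C])
                     for v in V])
```

With `eps=0` the estimates are exact, and the assignment coincides with Lloyd's nearest-centroid rule.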
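The per-iteration complexity is only polylogarithmic in the number of data points $N$; specifically (Kerenidis et al., 2018),
$$\widetilde{O}\!\left(kd\,\frac{\eta}{\delta^{2}}\,\kappa(V)\left(\mu(V) + k\,\frac{\eta}{\delta}\right) + k^{2}\,\frac{\eta^{1.5}}{\delta^{2}}\,\kappa(V)\,\mu(V)\right),$$
where $\eta = \max_i \|v_i\|^{2}$, $\kappa(V)$ is the condition number, $\mu(V)$ is a data-dependent parameter arising in quantum linear-algebra procedures, $k$ is the number of clusters, and $\delta$ is the precision parameter. For well-clusterable data, the per-iteration cost becomes $\widetilde{O}\big(k^{2}d\,\eta^{2.5}/\delta^{3} + k^{2.5}\,\eta^{2}/\delta^{3}\big)$, which is linear in $d$ and polynomial in $k$, $\eta$, and $1/\delta$.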
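Convergence guarantees mirror those of δ-k-means, a noise-tolerant variant of Lloyd's algorithm. In δ-k-means, a point may be assigned to any centroid whose squared distance is within δ of the minimum, and centroid updates carry an error of norm at most δ/2. Because assignment and centroid errors are bounded, q-means converges in a manner analogous to classical k-means (Kerenidis et al., 2018). Quantum acceleration is most pronounced when $N$ is much larger than $k$ and $d$. Practical speedup depends on the feasibility of QRAM and on the overheads of quantum state preparation, amplitude estimation, and tomography.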
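The δ-k-means iteration that q-means is analysed against can be simulated classically. This sketch follows the definition summarized here: assignments may go to any centroid within δ of the minimum squared distance, and centroid errors have norm at most δ/2. The uniformly random choice among candidate centroids and the random noise direction and magnitude are illustrative assumptions. Setting `delta=0` recovers an ordinary Lloyd step.

```python
import numpy as np

def delta_kmeans_step(V, C, delta, rng):
    """One iteration of delta-k-means, the noisy Lloyd step that q-means emulates."""
    d2 = ((V[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)  # (N, k) squared distances
    labels = np.empty(len(V), dtype=int)
    for i, row in enumerate(d2):
        # any centroid whose squared distance is within delta of the minimum is acceptable
        candidates = np.flatnonzero(row <= row.min() + delta)
        labels[i] = rng.choice(candidates)
    new_C = C.astype(float)  # astype returns a copy by default
    for j in range(len(C)):
        members = V[labels == j]
        if len(members) > 0:
            new_C[j] = members.mean(axis=0)
        # centroid error of norm at most delta / 2 (random direction and magnitude)
        noise = rng.normal(size=V.shape[1])
        new_C[j] += noise / np.linalg.norm(noise) * (delta / 2) * rng.uniform()
    return labels, new_C
```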
Recent improvements (Doriguello et al., 2023) remove the reliance on quantum linear-algebra primitives and instead use cluster-sample-based amplitude estimation. This allows the quantum algorithm, and an analogous "dequantized" classical algorithm, to attain only polylogarithmic dependence on $N$, with improved scaling in the remaining problem parameters.
3. Hybrid Quantum-Classical and Quantum-Inspired q-means Variants
a. Hybrid Algorithms with Variational Embedding
Hybrid schemes, such as the variational quantum feature-embedding q-means (Menon et al., 2021), combine trainable quantum feature maps (variational circuits) with a quantum k-means loop; the feature maps are optimized to maximize cluster separation in the quantum feature Hilbert space. The algorithm alternates among four steps: quantum kernel estimation via the swap test and amplitude estimation, cluster assignment, centroid updates (as quantum characteristic states), and classical optimization of the feature-map parameters by minimizing the inter-cluster Hilbert-Schmidt overlap. This design targets nonlinearly separable datasets, and the authors claim a theoretical exponential speedup with QRAM for sufficiently large $N$.
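The swap-test kernel and the Hilbert-Schmidt overlap objective can be sketched with a small state-vector simulation. The angle feature map `feature_state`, with trainable scales `theta`, is a hypothetical stand-in for the variational circuit of Menon et al. The classical optimizer over `theta` is omitted. The quantity it would minimize is `hs_overlap` between clusters, which relies on the identity $\mathrm{Tr}(\rho_A\rho_B) = \frac{1}{|A||B|}\sum_{a \in A,\, b \in B} |\langle a|b\rangle|^{2}$ for uniform mixtures of pure states.

```python
import numpy as np

def feature_state(x, theta):
    """Hypothetical variational angle map: qubit j is prepared as RY(theta_j * x_j)|0>."""
    state = np.array([1.0])
    for xj, tj in zip(x, theta):
        angle = tj * xj
        state = np.kron(state, np.array([np.cos(angle / 2), np.sin(angle / 2)]))
    return state

def swap_test_p0(a, b):
    """Exact ancilla-0 probability of the swap test (H, controlled-SWAP, H on the ancilla)."""
    psi = np.outer(a, b)                 # |a>|b> as a matrix over the two registers
    branch0 = (psi + psi.T) / 2          # ancilla-0 branch: (|a>|b> + |b>|a>) / 2
    return float(np.sum(np.abs(branch0) ** 2))

def swap_test_kernel(a, b, shots, rng):
    """Estimate |<a|b>|^2 = 2 * p0 - 1 from a finite number of measurement shots."""
    p0 = float(np.clip(swap_test_p0(a, b), 0.0, 1.0))
    p0_hat = rng.binomial(shots, p0) / shots
    return 2.0 * p0_hat - 1.0

def hs_overlap(states_a, states_b):
    """Tr(rho_A rho_B) for uniform mixtures of pure states = mean pairwise |<a|b>|^2."""
    return float(np.mean([abs(np.vdot(a, b)) ** 2 for a in states_a for b in states_b]))
```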
b. Quantum-Inspired Distance and Initialization
Recent developments (Oswal et al., 23 Sep 2025) reduce quantum overhead by replacing expensive swap tests with kernel-overlap circuits that use a single register. The register is prepared as $U(\mathbf{x})|0\rangle^{\otimes n}$, acted on by $U^{\dagger}(\mathbf{c})$, and measured. The probability of returning to $|0\rangle^{\otimes n}$ equals the squared overlap $|\langle \phi(\mathbf{c}) | \phi(\mathbf{x}) \rangle|^{2}$ between the encoded point $\mathbf{x}$ and the encoded centroid $\mathbf{c}$. Angle, amplitude, or hybrid encoding can be chosen to adjust expressivity, and empirical benchmarks indicate consistent improvements in clustering metrics such as the Adjusted Rand Index (ARI). Centroid initialization uses quantum-inspired, superposition-based sampling intended to select better starting basins.
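What follows is a state-vector sketch of the kernel-overlap circuit. It assumes plain angle encoding $U(\mathbf{x}) = R_Y(x_1) \otimes \cdots \otimes R_Y(x_n)$ and treats centroids as vectors in input space. Both are assumptions; amplitude or hybrid encodings would change `encoding_unitary`. For this encoding the return-to-zero probability has the closed form $\prod_j \cos^{2}\big((x_j - c_j)/2\big)$. The squared Hilbert-Schmidt distance $2 - 2K$ between the encoded density matrices is used here as an assumed kernel-induced distance for assignment.

```python
from functools import reduce
import numpy as np

def ry(angle):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]])

def encoding_unitary(x):
    """Assumed angle encoding U(x) = kron of RY(x_j); an explicit matrix, so small n only."""
    return reduce(np.kron, [ry(xj) for xj in x])

def return_to_zero_probability(x, c):
    """Prepare U(x)|0...0>, apply U(c)^dagger, and return the probability of |0...0>."""
    zero = np.zeros(2 ** len(x))
    zero[0] = 1.0
    state = encoding_unitary(c).conj().T @ (encoding_unitary(x) @ zero)
    return float(abs(state[0]) ** 2)

def kernel_sq_distance(x, c):
    """Distance induced by K = |<phi(c)|phi(x)>|^2: ||rho_x - rho_c||_HS^2 = 2 - 2K."""
    return 2.0 - 2.0 * return_to_zero_probability(x, c)

def assign(X, centroids):
    """Index of the nearest centroid under the kernel-induced distance, for each point."""
    return [int(np.argmin([kernel_sq_distance(x, c) for c in centroids])) for x in X]
```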
c. Fast Assignment via Sparse Operator Factorization
An orthogonal quantum-inspired idea accelerates k-means by factorizing the centroid matrix $U \in \mathbb{R}^{K \times d}$ into a product of sparse matrices ("QuicK-means"). This reduces the cost of applying the centroid matrix to a data point in the assignment step from $O(Kd)$ to $O(A \log A)$, where $A = \max(K, d)$. Convergence and accuracy remain comparable to classical k-means even at relatively high sparsity (Giffon et al., 2019).
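The factorized assignment can be sketched as follows. The sparse factors are random placeholders, because learning them is omitted. The coordinate-triplet representation and the function names are hypothetical. Applying the factors right to left never forms the dense $K \times d$ centroid matrix. The squared centroid norms are assumed to be precomputed once per iteration.

```python
import numpy as np

def random_sparse_factor(shape, nnz, rng):
    """Hypothetical sparse factor in coordinate form: (rows, cols, values, shape)."""
    rows = rng.integers(0, shape[0], size=nnz)
    cols = rng.integers(0, shape[1], size=nnz)
    return rows, cols, rng.normal(size=nnz), shape

def sparse_matmul(factor, M):
    """Multiply a coordinate-form sparse factor by a dense matrix in O(nnz * M.shape[1])."""
    rows, cols, vals, shape = factor
    out = np.zeros((shape[0], M.shape[1]))
    np.add.at(out, rows, vals[:, None] * M[cols])  # duplicate coordinates are summed
    return out

def to_dense(factor):
    """Dense copy of a factor (for checking only)."""
    rows, cols, vals, shape = factor
    dense = np.zeros(shape)
    np.add.at(dense, (rows, cols), vals)
    return dense

def assign_factorized(X, factors, centroid_sq_norms):
    """argmin_k ||x - u_k||^2 = argmin_k (||u_k||^2 - 2 u_k . x), with U = S_1 S_2 ... S_Q."""
    P = X.T                            # (d, N)
    for factor in reversed(factors):   # apply S_Q first and S_1 last
        P = sparse_matmul(factor, P)   # ends as U @ X.T with shape (K, N)
    return np.argmin(centroid_sq_norms[:, None] - 2.0 * P, axis=0)
```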
4. Implementation and Practical Aspects
q-means and its variants admit a range of circuit and data-embedding architectures:
- MPS-based q-means: Requires efficient tensor contraction, DMRG-style updates, and normalization sweeps (Shi et al., 2020).
- Quantum subroutines: Rely on QRAM for data retrieval, amplitude estimation for extracting probabilities (and hence inner products and distances), and block-encodings for quantum arithmetic. Tomography is required to output the centroids classically.
- Hybrid and kernel-overlap circuits: Use angle, amplitude, or hybrid encoding; minimal ancilla or CSWAP usage is achieved in kernel-based schemes (Oswal et al., 23 Sep 2025).
- NISQ/compressive schemes: Quantum compressive k-means (qc-kmeans) combines classical Fourier-feature sketching with per-group QUBO centroid selection via shallow QAOA. In the reported experiments, peak qubit usage stays constant across the dataset sizes tested (Chumpitaz-Flores et al., 26 Oct 2025).
- Quantum security: Some variants employ quantum homomorphic encryption with Grover-style minimum finding. They split computation between a semi-trusted and a trusted server, which protects client data privacy while offloading computation (Gong et al., 2020).
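The two classical ingredients named in the qc-kmeans bullet, a Fourier-feature sketch and a QUBO over candidate centroids, can be illustrated with a small example. The QUBO matches the data sketch with the average sketch of $K$ selected candidates and adds a penalty $\lambda$ that enforces exactly $K$ selections. This is a hypothetical formulation, not necessarily the one used by Chumpitaz-Flores et al. Exhaustive search stands in for shallow QAOA. The frequencies `Omega`, the candidates, and `lam` are illustrative.

```python
import itertools
import numpy as np

def fourier_sketch(X, Omega):
    """Empirical characteristic function of the data at frequencies Omega (d, m): shape (m,)."""
    return np.exp(1j * X @ Omega).mean(axis=0)

def build_qubo(z, candidates, Omega, K, lam):
    """QUBO for E(s) = ||z - A s / K||^2 + lam * (sum(s) - K)^2 over binary s.

    Returns (Q, q, const) such that E(s) = s^T Q s + q^T s + const.
    """
    A = np.exp(1j * candidates @ Omega).T  # (m, M): sketch of a point mass at each candidate
    M = A.shape[1]
    Q = (A.conj().T @ A).real / K**2 + lam * np.ones((M, M))
    q = -2.0 / K * (A.conj().T @ z).real - 2.0 * lam * K * np.ones(M)
    const = float(np.vdot(z, z).real + lam * K**2)
    return Q, q, const

def qubo_energy(s, Q, q, const):
    """Evaluate the quadratic form for one bitstring."""
    s = np.asarray(s, dtype=float)
    return float(s @ Q @ s + q @ s + const)

def brute_force_qubo(Q, q, const):
    """Exhaustive search over all 2^M bitstrings; a classical stand-in for shallow QAOA."""
    best = min(itertools.product([0, 1], repeat=len(q)),
               key=lambda s: qubo_energy(s, Q, q, const))
    return np.array(best)
```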
5. Performance Benchmarks, Empirical Results, and Limitations
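Experimental and simulation studies across q-means variants have reported the following. For MPS-based q-means, test accuracies at bond dimensions D = 8 and D = 15 compare with classical k-means as follows (Shi et al., 2020):
| Dataset | Classical k-means accuracy (%) | MPS q-means, D = 8 (%) | MPS q-means, D = 15 (%) |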
|---|---|---|---|
| Breast | 99.27 | 100 | 100 |
| Ionosphere | 60.56 | 60.56 | 63.38 |
| Wine | 100 | 100 | 100 |
| Yeast | 45.45 | 47.47 | 48.48 |
| E. coli | 85.29 | 89.71 | — |
Matrix-product-state q-means achieves robust avoidance of poor minima and higher centroid orthogonality (Shi et al., 2020). Kernel-overlap q-means yields increased ARI and silhouette scores across diverse UCI datasets, frequently outperforming classical k-means (Oswal et al., 23 Sep 2025).
Limitations include increased computational overhead at large bond dimension $D$ in MPS approaches, costly QRAM or state preparation in fully quantum implementations, and potential loss of speedup due to state encoding and tomography. In addition, kernel-induced non-Euclidean distances can degrade clustering accuracy when they are mismatched to the data distribution (Modi et al., 2023, Oswal et al., 23 Sep 2025). Regarding noise, shallow-circuit variants such as qc-kmeans are reported to remain effective under IBM noise models (Chumpitaz-Flores et al., 26 Oct 2025).
6. Theoretical and Practical Outlook
The q-means paradigm demonstrates that quantum formalism (either true quantum or quantum-inspired classical algorithms) generically allows for the transformation or acceleration of k-means clustering in multiple directions:
- Loss surface reshaping (via MPS or feature Hilbert space lifting) mitigates local minima.
- Quantum parallelism (in both assignment and update) theoretically enables exponential or polynomial reductions in per-iteration cost, conditional on QRAM and circuit depth.
- Kernel-induced distances enhance assignment robustness and can be tailored through encoding or feature maps.
- Hybrid and compressive architectures adapt the quantum pipeline to bounded-width NISQ hardware, employing sketching and QAOA.
- Security in remote computation can be realized via quantum homomorphic encryption with low client overhead.
Open research avenues include extending initialization strategies (e.g., k-means++), scalable DMRG updates, improved data encodings and feature maps, analytical bounds for hybrid algorithms' speed and accuracy, and deployment of true quantum-circuit implementations as scalable quantum hardware becomes available (Shi et al., 2020, Chumpitaz-Flores et al., 26 Oct 2025).