
Quantum (Inspired) $D^2$-sampling with Applications (2405.13351v1)

Published 22 May 2024 in quant-ph and cs.DS

Abstract: $D^2$-sampling is a fundamental component of sampling-based clustering algorithms such as $k$-means++. Given a dataset $V \subset \mathbb{R}^d$ with $N$ points and a center set $C \subset \mathbb{R}^d$, $D^2$-sampling refers to picking a point from $V$ where the sampling probability of a point is proportional to its squared distance from the nearest center in $C$. Starting with an empty $C$ and iteratively $D^2$-sampling and updating $C$ over $k$ rounds is precisely $k$-means++ seeding, which runs in $O(Nkd)$ time and gives an $O(\log{k})$-approximation in expectation for the $k$-means problem. We give a quantum algorithm for (approximate) $D^2$-sampling in the QRAM model that results in a quantum implementation of $k$-means++ that runs in time $\tilde{O}(\zeta^2 k^2)$. Here $\zeta$ is the aspect ratio (i.e., the ratio of the largest to the smallest interpoint distance), and $\tilde{O}$ hides polylogarithmic factors in $N, d, k$. It can be shown through a robust approximation analysis of $k$-means++ that the quantum version preserves its $O(\log{k})$ approximation guarantee. Further, we show that our quantum algorithm for $D^2$-sampling can be 'dequantized' using the sample-query access model of Tang (PhD thesis, University of Washington, 2023). This results in a fast quantum-inspired classical implementation of $k$-means++, which we call QI-$k$-means++, with a running time $O(Nd) + \tilde{O}(\zeta^2 k^2 d)$, where the $O(Nd)$ term is for setting up the sample-query access data structure. Experimental investigations show promising results for QI-$k$-means++ on large datasets with bounded aspect ratio. Finally, we use our quantum $D^2$-sampling with the known $D^2$-sampling-based classical approximation scheme (i.e., a $(1+\varepsilon)$-approximation for any given $\varepsilon>0$) to obtain the first quantum approximation scheme for the $k$-means problem with polylogarithmic running-time dependence on $N$.
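For readers less familiar with the classical baseline, the following is a minimal NumPy sketch of the $D^2$-sampling-based $k$-means++ seeding loop described above; the function name and interface are illustrative, not taken from the paper.

```python
import numpy as np

def kmeans_pp_seeding(V, k, rng=None):
    """Classical k-means++ seeding via D^2-sampling.

    V is an (N, d) array of points; k is the number of centers.
    Each of the k rounds refreshes all N nearest-center squared
    distances, giving the O(Nkd) running time quoted above.
    """
    rng = np.random.default_rng(rng)
    N = V.shape[0]
    # With C empty, the first center is drawn uniformly at random.
    centers = [V[rng.integers(N)]]
    # d2[i] = squared distance from V[i] to its nearest chosen center.
    d2 = np.sum((V - centers[0]) ** 2, axis=1)
    for _ in range(k - 1):
        total = d2.sum()
        if total == 0.0:
            # Fewer than k distinct points: fall back to uniform.
            idx = rng.integers(N)
        else:
            # D^2-sampling: probability proportional to the squared
            # distance from the nearest center.
            idx = rng.choice(N, p=d2 / total)
        centers.append(V[idx])
        # A new center can only shrink nearest-center distances.
        d2 = np.minimum(d2, np.sum((V - V[idx]) ** 2, axis=1))
    return np.array(centers)
```

The dequantized QI-$k$-means++ rests on Tang's sample-query access model. A common realization of that model (an assumption of this sketch, not code from the paper) is a binary tree over squared magnitudes: after linear-time setup it returns an index with probability proportional to its squared value in $O(\log N)$ time per sample, which is consistent with the $O(Nd)$ setup term quoted above.

```python
import numpy as np

class SampleQueryTree:
    """Sketch of a sample-query access structure: samples index i
    with probability w[i]**2 / sum(w**2) in O(log n) time after
    O(n) setup. (Standard construction; details may differ from
    the paper's implementation.)"""

    def __init__(self, w):
        n = len(w)
        self.n = n
        # Leaves tree[n..2n-1] hold squared entries; each internal
        # node j < n stores tree[2j] + tree[2j+1].
        self.tree = np.zeros(2 * n)
        self.tree[n:] = np.asarray(w, dtype=float) ** 2
        for j in range(n - 1, 0, -1):
            self.tree[j] = self.tree[2 * j] + self.tree[2 * j + 1]

    def sample(self, rng):
        # Walk root-to-leaf, branching with probability proportional
        # to the left/right subtree sums.
        j = 1
        while j < self.n:
            left = self.tree[2 * j]
            j = 2 * j if rng.random() * self.tree[j] < left else 2 * j + 1
        return j - self.n
```

Because changing one entry only touches the root-to-leaf path, the tree also supports $O(\log n)$ updates, one reason such structures are popular in quantum-inspired algorithms.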

References (31)
  1. Adaptive sampling for k-means clustering. In Proceedings of the 12th International Workshop and 13th International Workshop on Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX ’09 / RANDOM ’09, pages 15–28, Berlin, Heidelberg, 2009. Springer-Verlag. URL: https://doi.org/10.1007/978-3-642-03685-9_2, doi:10.1007/978-3-642-03685-9_2.
  2. Quantum speed-up for unsupervised learning. Machine Learning, 90:261–287, 2013.
  3. K-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’07, page 1027–1035, USA, 2007. Society for Industrial and Applied Mathematics.
  4. The Hardness of Approximation of Euclidean k-Means. In Lars Arge and János Pach, editors, 31st International Symposium on Computational Geometry (SoCG 2015), volume 34 of Leibniz International Proceedings in Informatics (LIPIcs), pages 754–767, Dagstuhl, Germany, 2015. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik. URL: http://drops.dagstuhl.de/opus/volltexte/2015/5117, doi:10.4230/LIPIcs.SOCG.2015.754.
  5. Fast and provably good seedings for k-means. In D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016. URL: https://proceedings.neurips.cc/paper_files/paper/2016/file/d67d8ab4f4c10bf22aa353e27879133c-Paper.pdf.
  6. Approximate k-means++ in sublinear time. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1), Feb. 2016. URL: https://ojs.aaai.org/index.php/AAAI/article/view/10259, doi:10.1609/aaai.v30i1.10259.
  7. Noisy, Greedy and Not so Greedy k-Means++. In Fabrizio Grandoni, Grzegorz Herman, and Peter Sanders, editors, 28th Annual European Symposium on Algorithms (ESA 2020), volume 173 of Leibniz International Proceedings in Informatics (LIPIcs), pages 18:1–18:21, Dagstuhl, Germany, 2020. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. URL: https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2020.18, doi:10.4230/LIPIcs.ESA.2020.18.
  8. On Sampling Based Algorithms for k-Means. In Nitin Saxena and Sunil Simon, editors, 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020), volume 182 of Leibniz International Proceedings in Informatics (LIPIcs), pages 13:1–13:17, Dagstuhl, Germany, 2020. Schloss Dagstuhl–Leibniz-Zentrum für Informatik. URL: https://drops.dagstuhl.de/opus/volltexte/2020/13254, doi:10.4230/LIPIcs.FSTTCS.2020.13.
  9. Inapproximability of clustering in lp metrics. In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS), pages 519–539, 2019. doi:10.1109/FOCS.2019.00040.
  10. Fast and accurate k-means++ via rejection sampling. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 16235–16245. Curran Associates, Inc., 2020. URL: https://proceedings.neurips.cc/paper_files/paper/2020/file/babcff88f8be8c4795bd6f0f8cccca61-Paper.pdf.
  11. Li Deng. The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142, 2012.
  12. Do you know what q-means?, 2023. arXiv:2308.09701.
  13. A quantum algorithm for finding the minimum, 1999. arXiv:quant-ph/9607014.
  14. A quantum approximate optimization algorithm, 2014. arXiv:1411.4028.
  15. R. A. Fisher. Iris. UCI Machine Learning Repository, 1988. DOI: https://doi.org/10.24432/C56C76.
  16. FPT Approximation for Constrained Metric k-Median/Means. In Yixin Cao and Marcin Pilipczuk, editors, 15th International Symposium on Parameterized and Exact Computation (IPEC 2020), volume 180 of Leibniz International Proceedings in Informatics (LIPIcs), pages 14:1–14:19, Dagstuhl, Germany, 2020. Schloss Dagstuhl–Leibniz-Zentrum für Informatik. URL: https://drops.dagstuhl.de/opus/volltexte/2020/13317, doi:10.4230/LIPIcs.IPEC.2020.14.
  17. Noisy k-Means++ Revisited. In Inge Li Gørtz, Martin Farach-Colton, Simon J. Puglisi, and Grzegorz Herman, editors, 31st Annual European Symposium on Algorithms (ESA 2023), volume 274 of Leibniz International Proceedings in Informatics (LIPIcs), pages 55:1–55:7, Dagstuhl, Germany, 2023. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. URL: https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.55, doi:10.4230/LIPIcs.ESA.2023.55.
  18. Quantum algorithm for linear systems of equations. Phys. Rev. Lett., 103:150502, Oct 2009. URL: https://link.aps.org/doi/10.1103/PhysRevLett.103.150502, doi:10.1103/PhysRevLett.103.150502.
  19. Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering: (extended abstract). In Proceedings of the tenth annual symposium on Computational geometry, SCG ’94, pages 332–339, New York, NY, USA, 1994. ACM. URL: http://doi.acm.org/10.1145/177424.178042, doi:10.1145/177424.178042.
  20. q-means: A quantum algorithm for unsupervised machine learning. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL: https://proceedings.neurips.cc/paper_files/paper/2019/file/16026d60ff9b54410b3435b403afd226-Paper.pdf.
  21. Quantum Recommendation Systems. In Christos H. Papadimitriou, editor, 8th Innovations in Theoretical Computer Science Conference (ITCS 2017), volume 67 of Leibniz International Proceedings in Informatics (LIPIcs), pages 49:1–49:21, Dagstuhl, Germany, 2017. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik. URL: http://drops.dagstuhl.de/opus/volltexte/2017/8154, doi:10.4230/LIPIcs.ITCS.2017.49.
  22. Des-q: a quantum algorithm to construct and efficiently retrain decision trees for regression and binary classification, 2023. arXiv:2309.09976.
  23. S. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, 1982.
  24. Quantum algorithms for supervised and unsupervised machine learning, 2013. arXiv:1307.0411.
  25. Unsupervised machine learning on a hybrid quantum computer, 2017. arXiv:1712.05771.
  26. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
  27. On the quantitative analysis of deep belief networks. In Proceedings of the 25th international conference on Machine learning, pages 872–879. ACM, 2008.
  28. E. Tang. Dequantizing algorithms to understand quantum advantage in machine learning. Nat Rev Phys, 4:692–693, 2022. URL: https://doi.org/10.1038/s42254-022-00511-w.
  29. Ewin Tang. A quantum-inspired classical algorithm for recommendation systems. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, pages 217–228, New York, NY, USA, 2019. Association for Computing Machinery. URL: https://doi.org/10.1145/3313276.3316310, doi:10.1145/3313276.3316310.
  30. Ewin Tang. Quantum machine learning without any quantum. PhD thesis, University of Washington, 2023.
  31. Quantum algorithms for nearest-neighbor methods for supervised and unsupervised learning. Quantum Info. Comput., 15(3–4):316–356, March 2015.
