Sum-of-norms clustering does not separate nearby balls (2104.13753v3)

Published 28 Apr 2021 in cs.LG, math.ST, and stat.TH

Abstract: Sum-of-norms clustering is a popular convexification of $K$-means clustering. We show that, if the dataset is made of a large number of independent random variables distributed according to the uniform measure on the union of two disjoint balls of unit radius, and if the balls are sufficiently close to one another, then sum-of-norms clustering will typically fail to recover the decomposition of the dataset into two clusters. As the dimension tends to infinity, this happens even when the distance between the centers of the two balls is taken to be as large as $2\sqrt{2}$. In order to show this, we introduce and analyze a continuous version of sum-of-norms clustering, where the dataset is replaced by a general measure. In particular, we state and prove a local-global characterization of the clustering that seems to be new even in the case of discrete datapoints.
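
The abstract does not spell out the optimization problem itself. For reference, sum-of-norms (convex) clustering of points $x_1,\dots,x_n$ is usually posed as minimizing $\frac{1}{2}\sum_i \|x_i-u_i\|^2 + \lambda\sum_{i<j}\|u_i-u_j\|$ over centroids $u_1,\dots,u_n$, with $x_i$ and $x_j$ assigned to the same cluster exactly when $u_i=u_j$ at the optimum. The sketch below is not from the paper: it samples points uniformly from two disjoint unit balls, as in the abstract's setup, and solves this standard objective with CVXPY. The center distance `delta`, the penalty `lam`, and the helper `sample_unit_ball` are illustrative choices, not quantities taken from the paper.

```python
# Minimal sketch (not the paper's code): sample points uniformly from two unit
# balls whose centers are `delta` apart, solve the standard sum-of-norms
# clustering problem with CVXPY, and count the recovered clusters.
import numpy as np
import cvxpy as cp

def sample_unit_ball(n, d, rng):
    """Uniform samples from the d-dimensional unit ball."""
    g = rng.standard_normal((n, d))
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    r = rng.random(n) ** (1.0 / d)          # radial CDF inversion
    return g * r[:, None]

rng = np.random.default_rng(0)
n_per_ball, d, delta = 20, 2, 2.5           # delta > 2 keeps the balls disjoint
centers = np.array([[0.0] * d, [delta] + [0.0] * (d - 1)])
X = np.vstack([sample_unit_ball(n_per_ball, d, rng) + c for c in centers])
n = X.shape[0]

lam = 0.05                                  # fusion penalty (assumed value)
U = cp.Variable((n, d))
fit = 0.5 * cp.sum_squares(U - X)
fuse = lam * sum(cp.norm(U[i] - U[j])
                 for i in range(n) for j in range(i + 1, n))
cp.Problem(cp.Minimize(fit + fuse)).solve()

# Points whose optimal centroids u_i coincide (up to tolerance) share a cluster.
Uv = U.value
labels = -np.ones(n, dtype=int)
num_clusters = 0
for i in range(n):
    if labels[i] == -1:
        labels[np.linalg.norm(Uv - Uv[i], axis=1) < 1e-4] = num_clusters
        num_clusters += 1
print("number of recovered clusters:", num_clusters)
```

Sweeping `delta`, the dimension `d`, or the penalty `lam` in this sketch gives an empirical feel for the regime the paper analyzes, i.e. when the two balls are, or are not, recovered as separate clusters.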
