A Faster $k$-means++ Algorithm (2211.15118v2)
Published 28 Nov 2022 in cs.DS and cs.LG
Abstract: $k$-means++ is an important algorithm for choosing initial cluster centers for the $k$-means clustering algorithm. In this work, we present a new algorithm that solves the $k$-means++ problem in nearly optimal running time. Given $n$ data points in $\mathbb{R}^d$, the current state-of-the-art algorithm runs in $\widetilde{O}(k)$ iterations, and each iteration takes $\widetilde{O}(ndk)$ time. The overall running time is thus $\widetilde{O}(ndk^2)$. We propose a new algorithm, FastKmeans++, that takes only $\widetilde{O}(nd + nk^2)$ time in total.
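For orientation, the classic $k$-means++ seeding step (choose each new center with probability proportional to its squared distance from the nearest center already chosen) can be sketched as below. This is a minimal illustration of the standard seeding only. It is not FastKmeans++, and it is not the $\widetilde{O}(ndk^2)$ baseline mentioned in the abstract. The names `kmeanspp_seed` and `sq_dist` are hypothetical and do not come from the paper.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seed(points: List[Point], k: int, rng: random.Random) -> List[Point]:
    """Classic k-means++ seeding via D^2 sampling (illustrative sketch).

    Each of the k rounds makes one pass over the n points in d dimensions,
    so this sketch costs O(n d k) time overall.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # First center: chosen uniformly at random.
    centers = [points[rng.randrange(len(points))]]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [sq_dist(p, centers[0]) for p in points]

    while len(centers) < k:
        total = sum(d2)
        if total == 0.0:
            # Every point coincides with a chosen center; fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            # Sample index i with probability d2[i] / total.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        # Refresh nearest-center distances against the new center only.
        d2 = [min(old, sq_dist(p, points[idx])) for old, p in zip(d2, points)]

    return centers
```

A call such as `kmeanspp_seed(data, 3, random.Random(0))` returns three of the input points to use as initial centers. Passing the random generator explicitly keeps runs reproducible.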