Accelerated sparse Kernel Spectral Clustering for large scale data clustering problems (2310.13381v1)

Published 20 Oct 2023 in cs.LG

Abstract: An improved version of the sparse multiway kernel spectral clustering (KSC) algorithm is presented in this brief. The original algorithm is derived from weighted kernel principal component analysis (KPCA) formulated within the primal-dual least-squares support vector machine (LS-SVM) framework. Sparsity is then achieved by combining an incomplete Cholesky decomposition (ICD) based low-rank approximation of the kernel matrix with the so-called reduced set method. The original ICD-based sparse KSC algorithm was reported to be computationally far too demanding, especially on the large-scale data clustering problems it was actually designed for, which has so far prevented it from gaining more than theoretical relevance. The modifications reported in this brief change this by drastically improving the computational characteristics. Solving the alternative, symmetrized version of the computationally most demanding core eigenvalue problem eliminates the need to form, and compute the SVD of, large matrices during model construction. As a result, clustering problems previously reported to require hours are now solved within seconds, without altering the results. Furthermore, sparsity is also improved significantly, leading to a more compact model representation that further increases not only the computational efficiency but also the descriptive power. These improvements turn the original, only theoretically relevant ICD-based sparse KSC algorithm into one applicable to large-scale practical clustering problems. The theoretical results and improvements are demonstrated by computational experiments on carefully selected synthetic data as well as on real-life problems such as image segmentation.
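
The symmetrization idea described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation; it only shows, under standard KSC-style assumptions, how a pivoted incomplete Cholesky factor G (with the kernel matrix Omega approximately equal to G Gᵀ, G of size N x R with R much smaller than N) lets the expensive spectral step be replaced by a small symmetric R x R eigenvalue problem, so no SVD or eigendecomposition of an N-sized matrix is ever formed. The RBF kernel choice, the degree approximation, and all function names are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (assumptions noted above), not the paper's algorithm.
import numpy as np
from scipy.linalg import eigh

def rbf_kernel_column(X, j, sigma):
    """One column K[:, j] of the RBF kernel matrix (the full matrix is never formed)."""
    d2 = np.sum((X - X[j]) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def incomplete_cholesky(X, sigma, R, tol=1e-8):
    """Pivoted incomplete Cholesky factor G with Omega ~= G @ G.T, G of shape (N, R)."""
    N = X.shape[0]
    G = np.zeros((N, R))
    diag = np.ones(N)                       # RBF kernel has a unit diagonal
    for r in range(R):
        j = int(np.argmax(diag))            # pivot on the largest residual diagonal
        if diag[j] < tol:
            return G[:, :r]                 # rank reached before R columns
        col = rbf_kernel_column(X, j, sigma)
        G[:, r] = (col - G[:, :r] @ G[j, :r]) / np.sqrt(diag[j])
        diag -= G[:, r] ** 2
    return G

def ksc_embedding_sketch(X, sigma, R, k):
    """Spectral embedding for k clusters via a small symmetric eigenproblem."""
    G = incomplete_cholesky(X, sigma, R)
    d = G @ (G.T @ np.ones(G.shape[0]))     # approximate degrees, Omega @ 1 (assumed positive)
    S = (G.T * (1.0 / d)) @ G               # R x R stand-in for G^T D^{-1} G
    S = 0.5 * (S + S.T)                     # enforce symmetry of the small problem
    w, V = eigh(S)                          # dense *symmetric* solver, R x R only
    lead = V[:, np.argsort(w)[::-1][:k - 1]]  # k-1 leading eigendirections
    return G @ lead                         # N x (k-1) embedding
```

The rows of the returned embedding can then be grouped with any standard decoding scheme (k-means on the rows being the simplest); the point of the sketch is only that the cost of the spectral step scales with R rather than with N.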
