
Block-Diagonal Guided DBSCAN Clustering (2404.01341v2)

Published 31 Mar 2024 in cs.LG, cs.AI, and cs.DS

Abstract: Cluster analysis plays a crucial role in database mining, and one of the most widely used algorithms in this field is DBSCAN. However, DBSCAN has several limitations, such as difficulty in handling high-dimensional large-scale data, sensitivity to input parameters, and lack of robustness in producing clustering results. This paper introduces an improved version of DBSCAN that leverages the block-diagonal property of the similarity graph to guide the clustering procedure of DBSCAN. The key idea is to construct a graph that measures the similarity between high-dimensional large-scale data points and has the potential to be transformed into a block-diagonal form through an unknown permutation, followed by a cluster-ordering procedure to generate the desired permutation. The clustering structure can be easily determined by identifying the diagonal blocks in the permuted graph. We propose a gradient descent-based method to solve the proposed problem. Additionally, we develop a DBSCAN-based points traversal algorithm that identifies clusters with high densities in the graph and generates an augmented ordering of clusters. The block-diagonal structure of the graph is then achieved through permutation based on the traversal order, providing a flexible foundation for both automatic and interactive cluster analysis. We introduce a split-and-refine algorithm to automatically search for all diagonal blocks in the permuted graph with theoretically optimal guarantees under specific cases. We extensively evaluate our proposed approach on twelve challenging real-world benchmark clustering datasets and demonstrate its superior performance compared to the state-of-the-art clustering method on every dataset.
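The block-diagonal idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a simple thresholded-distance similarity graph instead of the learned graph, and a plain breadth-first traversal instead of the DBSCAN-based traversal and split-and-refine steps. All function names and the `eps` value are hypothetical, chosen only for illustration.

```python
import numpy as np
from collections import deque


def similarity_graph(X, eps):
    """Binary similarity: 1.0 where two points lie within eps of each other (assumed stand-in)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return (d <= eps).astype(float)


def traversal_order(W):
    """Visit points by breadth-first search over the graph.

    Points reached from the same seed are listed consecutively, so the
    returned order groups each connected component into one contiguous run.
    """
    n = W.shape[0]
    visited = np.zeros(n, dtype=bool)
    order, block_sizes = [], []
    for start in range(n):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        size = 0
        while queue:
            i = queue.popleft()
            order.append(i)
            size += 1
            for j in np.flatnonzero(W[i] > 0):
                if not visited[j]:
                    visited[j] = True
                    queue.append(j)
        block_sizes.append(size)
    return np.array(order), block_sizes


def demo(seed=0):
    """Two well-separated groups, shuffled, then reordered into block-diagonal form."""
    rng = np.random.default_rng(seed)
    a = rng.normal(loc=0.0, scale=0.1, size=(5, 2))
    b = rng.normal(loc=10.0, scale=0.1, size=(4, 2))
    X = np.vstack([a, b])
    X = X[rng.permutation(len(X))]      # hide the cluster structure
    W = similarity_graph(X, eps=2.0)
    order, sizes = traversal_order(W)
    P = W[np.ix_(order, order)]         # apply the same permutation to rows and columns
    return P, sizes


if __name__ == "__main__":
    P, sizes = demo()
    print(sizes)
    print(P.astype(int))
```

In this toy setting the permuted matrix `P` has one dense diagonal block per group and zeros elsewhere, so the clusters can be read off from the block sizes. Real data would not separate this cleanly, which is the case the paper's graph learning and split-and-refine procedure are meant to address.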
