Block-Diagonal Guided DBSCAN Clustering (2404.01341v2)
Abstract: Cluster analysis plays a crucial role in database mining, and DBSCAN is one of the most widely used algorithms in this field. However, DBSCAN has several limitations: it struggles with high-dimensional, large-scale data, it is sensitive to its input parameters, and its clustering results lack robustness. This paper introduces an improved version of DBSCAN that leverages the block-diagonal property of the similarity graph to guide the clustering procedure. The key idea is to construct a graph that measures the similarity between high-dimensional, large-scale data points and that can potentially be transformed into block-diagonal form through an unknown permutation, and then to apply a cluster-ordering procedure that generates the desired permutation. The clustering structure can then be determined by identifying the diagonal blocks of the permuted graph. We propose a gradient descent-based method to solve the resulting optimization problem. Additionally, we develop a DBSCAN-based point traversal algorithm that identifies high-density clusters in the graph and generates an augmented ordering of the clusters. Permuting the graph according to this traversal order yields its block-diagonal structure, which provides a flexible foundation for both automatic and interactive cluster analysis. We introduce a split-and-refine algorithm that automatically searches for all diagonal blocks in the permuted graph, with theoretical optimality guarantees in specific cases. We extensively evaluate the proposed approach on twelve challenging real-world benchmark clustering datasets and demonstrate that it outperforms state-of-the-art clustering methods on every dataset.
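The pipeline in the abstract (similarity graph, DBSCAN-style traversal ordering, permutation, block detection) can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that are not the authors' method: the graph is a plain Gaussian kernel rather than the graph learned by gradient descent; the traversal is an ordinary breadth-first DBSCAN expansion over the graph with hypothetical parameters `eps_sim` and `min_pts`, not the augmented cluster ordering; and `split_blocks` is a hypothetical greedy threshold rule (parameter `tau`) standing in for split-and-refine. All function names are illustrative.

```python
import math
from collections import deque


def gaussian_similarity(points, sigma=1.0):
    """S[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    Assumption: a fixed kernel stands in for the paper's learned graph.
    """
    n = len(points)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            sim[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return sim


def dbscan_order(sim, eps_sim=0.5, min_pts=3):
    """Hypothetical DBSCAN-style traversal over a similarity graph.

    Expands breadth-first from core points and records the visit order,
    so every density-connected cluster forms a contiguous run.
    """
    n = len(sim)
    # Neighbours: points with similarity >= eps_sim (self included, as in DBSCAN).
    nbrs = [[j for j in range(n) if sim[i][j] >= eps_sim] for i in range(n)]
    is_core = [len(nbrs[i]) >= min_pts for i in range(n)]
    visited = [False] * n
    order = []
    for seed in range(n):
        if visited[seed] or not is_core[seed]:
            continue
        queue = deque([seed])
        visited[seed] = True
        while queue:
            p = queue.popleft()
            order.append(p)
            if not is_core[p]:
                continue  # border points are reached but not expanded
            for q in nbrs[p]:
                if not visited[q]:
                    visited[q] = True
                    queue.append(q)
    # Points never reached from a core point (noise) are placed last.
    order.extend(i for i in range(n) if not visited[i])
    return order


def permute(sim, order):
    """Return P S P^T: rows and columns of S reordered by `order`."""
    return [[sim[i][j] for j in order] for i in order]


def split_blocks(psim, tau=0.5):
    """Hypothetical greedy split of a permuted matrix into diagonal blocks.

    A new block starts when the next point's mean similarity to the
    current block drops below tau (not the paper's split-and-refine).
    """
    blocks, current = [], [0]
    for k in range(1, len(psim)):
        mean_sim = sum(psim[k][m] for m in current) / len(current)
        if mean_sim >= tau:
            current.append(k)
        else:
            blocks.append(current)
            current = [k]
    blocks.append(current)
    return blocks


# Two well-separated groups, deliberately interleaved in the input order.
pts = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.1, 5.0), (0.0, 0.1), (5.0, 5.1)]
S = gaussian_similarity(pts, sigma=1.0)
order = dbscan_order(S, eps_sim=0.5, min_pts=3)
blocks = split_blocks(permute(S, order), tau=0.5)
clusters = [[order[k] for k in b] for b in blocks]  # map back to input indices
print(order)     # [0, 2, 4, 1, 3, 5]
print(clusters)  # [[0, 2, 4], [1, 3, 5]]
```

Because breadth-first expansion visits every point of a density-connected component before moving on to the next seed, each cluster occupies a contiguous run of the ordering. That is why the permuted matrix is block-diagonal here and a single left-to-right scan can recover the blocks.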