CDC: A Simple Framework for Complex Data Clustering
Abstract: In today's data-driven digital era, both the volume and the complexity of collected data (multi-view, non-Euclidean, and multi-relational, for example) are growing exponentially or even faster. Clustering, which extracts valid knowledge from data without supervision, is extremely useful in practice. However, existing methods are developed independently, each handling one particular challenge at the expense of the others. In this work, we propose a simple but effective framework for complex data clustering (CDC) that can efficiently process different types of data with linear complexity. We first utilize graph filtering to fuse geometric structure and attribute information. We then reduce the complexity with high-quality anchors that are adaptively learned via a novel similarity-preserving regularizer. We illustrate the cluster-ability of our proposed method theoretically and experimentally. In particular, we deploy CDC to graph data of size 111M.
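As a rough illustration of the graph filtering step mentioned in the abstract, the sketch below smooths node attributes over a graph so that connected nodes end up with similar features. The abstract does not specify the filter, so the low-pass form (I - L/2)^k with a symmetric normalized Laplacian L, the self-loops, the function name `low_pass_filter`, and the toy graph are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def low_pass_filter(adj, features, order=2):
    """Smooth node features with the filter (I - L/2)^order.

    Assumption: L is the symmetric normalized Laplacian of the graph
    after adding self-loops; this choice is illustrative only.
    """
    n = adj.shape[0]
    a = adj + np.eye(n)                          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))    # D^{-1/2} as a vector
    a_norm = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # With L = I - a_norm, the filter I - L/2 equals (I + a_norm) / 2.
    filt = 0.5 * (np.eye(n) + a_norm)
    h = features.astype(float)
    for _ in range(order):
        h = filt @ h
    return h

# Toy graph (hypothetical): two disjoint triangles with noisy 1-D attributes.
triangle = np.ones((3, 3)) - np.eye(3)
adj = np.block([[triangle, np.zeros((3, 3))],
                [np.zeros((3, 3)), triangle]])
x = np.array([[1.0], [0.8], [1.2], [-1.0], [-0.8], [-1.2]])

h = low_pass_filter(adj, x, order=2)
print(h.ravel())  # first triangle becomes [1.0, 0.95, 1.05]
```

Each pass pulls a node's attribute halfway toward the mean of its neighborhood, so the spread inside each triangle shrinks while the two groups stay apart. A dense matrix is used here only for readability; a graph of the size reported in the abstract would need a sparse representation.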