
Deep Clustering of Tabular Data by Weighted Gaussian Distribution Learning (2301.00802v3)

Published 2 Jan 2023 in cs.LG and cs.AI

Abstract: Deep learning methods have primarily been proposed for supervised learning on images or text, with limited application to clustering problems. In contrast, tabular data with heterogeneous features pose unique challenges in representation learning, where deep learning has yet to replace traditional machine learning. This paper addresses these challenges by developing one of the first deep clustering methods for tabular data: Gaussian Cluster Embedding in Autoencoder Latent Space (G-CEALS). G-CEALS is an unsupervised deep clustering framework that learns the parameters of multivariate Gaussian cluster distributions by iteratively updating individual cluster weights. On sixteen tabular data sets, G-CEALS achieves average rank orderings of 2.9 (1.7) and 2.8 (1.7) based on clustering accuracy and adjusted Rand index (ARI) scores, respectively, outperforming nine state-of-the-art clustering methods. G-CEALS substantially improves clustering performance over traditional K-means and GMM, which remain the de facto methods for clustering tabular data. Similarly computationally efficient and high-performing deep clustering frameworks are imperative to reap the myriad benefits of deep learning on tabular data over traditional machine learning.
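The core building block the abstract refers to, fitting weighted multivariate Gaussian cluster distributions by iteratively updating cluster weights, means, and covariances, is standard expectation-maximization for a Gaussian mixture. The sketch below is illustrative only: it implements plain EM on fixed features and is not the G-CEALS objective, which learns these parameters jointly with an autoencoder's latent embedding. All function and variable names here are our own, and the simple deterministic initialization is an assumption for reproducibility.

```python
import numpy as np

def gmm_em(X, k, n_iter=50):
    """Minimal EM for a weighted multivariate Gaussian mixture.

    Illustrative baseline only -- not the G-CEALS method, which optimizes
    cluster distributions jointly with an autoencoder's latent space.
    """
    n, d = X.shape
    # Simple deterministic init: means spread across the data order
    # (real implementations would use k-means or random restarts).
    mu = X[np.linspace(0, n - 1, k).astype(int)].copy()
    cov = np.stack([np.eye(d)] * k)          # per-cluster covariances
    w = np.full(k, 1.0 / k)                  # per-cluster mixture weights
    resp = np.empty((n, k))
    for _ in range(n_iter):
        # E-step: responsibility of each weighted Gaussian for each point
        for j in range(k):
            diff = X - mu[j]
            inv = np.linalg.inv(cov[j])
            quad = np.einsum("ni,ij,nj->n", diff, inv, diff)
            norm = np.sqrt(((2 * np.pi) ** d) * np.linalg.det(cov[j]))
            resp[:, j] = w[j] * np.exp(-0.5 * quad) / norm
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: iteratively update cluster weights, means, covariances
        nk = resp.sum(axis=0)
        w = nk / n
        mu = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - mu[j]
            cov[j] = ((resp[:, j, None] * diff).T @ diff) / nk[j] \
                     + 1e-6 * np.eye(d)      # ridge for numerical stability
    return w, mu, cov, resp.argmax(axis=1)
```

In a deep clustering framework like the one described, `X` would be the autoencoder's latent embedding rather than the raw tabular features, and the mixture parameters would feed back into the network's training loss instead of being fit post hoc.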


