
Learning-Augmented K-Means Clustering Using Dimensional Reduction (2401.03198v1)

Published 6 Jan 2024 in cs.LG

Abstract: Learning augmentation is a machine learning concept designed to improve the performance of a method or model, for example by enhancing its ability to predict and generalize data or features, or by testing the reliability of the method through the introduction of noise and other factors. Clustering, on the other hand, is a fundamental aspect of data analysis and has long been used to understand the structure of large datasets. Despite its long history, the k-means algorithm still faces challenges. One approach, as suggested by Ergun et al., is to use a predictor to minimize the sum of squared distances between each data point and a specified centroid. However, the computational cost of this algorithm is known to increase with the value of k, and it often gets stuck in local minima. In response to these challenges, we propose reducing the dimensionality of the dataset using Principal Component Analysis (PCA). Notably, for k values of 10 and 25, the proposed algorithm yields lower-cost results than running it without PCA. "Principal component analysis (PCA) is the problem of fitting a low-dimensional affine subspace to a set of data points in a high-dimensional space. PCA is well-established in the literature and has become one of the most useful tools for data modeling, compression, and visualization."
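
The pipeline described in the abstract (reduce dimensionality with PCA, then derive centroids from predicted labels and measure the sum of squared distances) can be illustrated with a minimal sketch. This is not the authors' implementation or the algorithm of Ergun et al.: the synthetic data, the 10% label-corruption "predictor", the choice of 5 components, and the helper names `pca_project`, `centroids_from_labels`, and `kmeans_cost` are all assumptions made for illustration.

```python
import numpy as np

def pca_project(X, n_components):
    """Center X and project its rows onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def centroids_from_labels(X, labels, k):
    """Centroid of each cluster = mean of the points the predictor assigned to it."""
    return np.stack([X[labels == j].mean(axis=0) for j in range(k)])

def kmeans_cost(X, centroids):
    """k-means cost: sum of squared distances from each point to its nearest centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

# Synthetic data (assumption): k well-separated Gaussian blobs in 20 dimensions.
rng = np.random.default_rng(0)
k, n_per, dim = 3, 50, 20
true_centers = rng.normal(scale=10.0, size=(k, dim))
X = np.concatenate([c + rng.normal(size=(n_per, dim)) for c in true_centers])
labels = np.repeat(np.arange(k), n_per)

# Hypothetical noisy predictor (assumption): 10% of labels are replaced at random.
noisy = labels.copy()
flip = rng.choice(len(noisy), size=len(noisy) // 10, replace=False)
noisy[flip] = rng.integers(0, k, size=flip.size)

# Reduce to 5 dimensions, then build centroids from the predicted labels.
Z = pca_project(X, n_components=5)
centroids = centroids_from_labels(Z, noisy, k)
cost = kmeans_cost(Z, centroids)
print(f"cost in PCA space: {cost:.2f}")
```

Note that a cost computed in the projected space is measured in fewer dimensions than a cost computed on the original data, so the two values are not directly comparable without mapping the centroids back to the original space.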

