Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
169 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Online Clustering by Penalized Weighted GMM (1902.02544v1)

Published 7 Feb 2019 in cs.LG, cs.CV, and stat.ML

Abstract: With the dawn of the Big Data era, data sets are growing rapidly. Data is streaming from everywhere - from cameras, mobile phones, cars, and other electronic devices. Clustering streaming data is a very challenging problem. Unlike the traditional clustering algorithms where the dataset can be stored and scanned multiple times, clustering streaming data has to satisfy constraints such as limit memory size, real-time response, unknown data statistics and an unknown number of clusters. In this paper, we present a novel online clustering algorithm which can be used to cluster streaming data without knowing the number of clusters a priori. Results on both synthetic and real datasets show that the proposed algorithm produces partitions which are close to what you could get if you clustered the whole data at one time.

Citations (2)

Summary

We haven't generated a summary for this paper yet.