
GriT-DBSCAN: A Spatial Clustering Algorithm for Very Large Databases (2210.07580v2)

Published 14 Oct 2022 in cs.DB and cs.DS

Abstract: DBSCAN is a fundamental spatial clustering algorithm with numerous practical applications. However, a bottleneck of the algorithm is that its worst-case run time complexity is $O(n^2)$. To address this limitation, we propose a new grid-based algorithm for exact DBSCAN in Euclidean space called GriT-DBSCAN, which is based on the following two techniques. First, we introduce a grid tree to organize the non-empty grids so that non-empty neighboring grids can be queried efficiently. Second, by utilising the spatial relationships among points, we propose a technique that iteratively prunes unnecessary distance calculations when determining whether the minimum distance between two sets is less than or equal to a certain threshold. We theoretically prove that the complexity of GriT-DBSCAN is linear in the data set size. In addition, we obtain two variants of GriT-DBSCAN by incorporating heuristics, or by combining the second technique with an existing algorithm. Experiments are conducted on both synthetic and real-world data sets to evaluate the efficiency of GriT-DBSCAN and its variants. The results of our analyses show that our algorithms outperform existing algorithms.
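
To make the grid-based setting concrete, the following is a minimal sketch of the general idea behind grid-based DBSCAN, not the paper's grid tree. It assumes a cell side of eps / sqrt(d), a common choice in grid-based exact DBSCAN because any two points in the same cell are then within eps of each other, and it finds non-empty neighboring cells by brute-force enumeration of offsets, which is exactly the step a grid tree is meant to speed up. The names `build_grid` and `neighboring_cells` are hypothetical and chosen for illustration only.

```python
import math
from collections import defaultdict
from itertools import product


def build_grid(points, eps):
    """Assign each point index to a cell of side eps / sqrt(d) (assumed cell size)."""
    d = len(points[0])
    side = eps / math.sqrt(d)
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        cell = tuple(math.floor(x / side) for x in p)
        grid[cell].append(idx)
    return grid, side


def neighboring_cells(grid, cell):
    """Return non-empty cells that may contain a point within eps of `cell`.

    Brute-force enumeration of offsets; this is a stand-in for the
    grid-tree query described in the abstract, not a reproduction of it.
    """
    d = len(cell)
    reach = math.ceil(math.sqrt(d))
    result = []
    for offset in product(range(-reach, reach + 1), repeat=d):
        if all(o == 0 for o in offset):
            continue  # skip the cell itself
        # Squared minimum gap between the two cells, in units of side^2.
        gap_sq = sum(max(abs(o) - 1, 0) ** 2 for o in offset)
        # gap * side <= eps is equivalent to gap_sq <= d when side = eps / sqrt(d).
        if gap_sq > d:
            continue
        other = tuple(c + o for c, o in zip(cell, offset))
        if other in grid:
            result.append(other)
    return result


points = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (0.9, 0.0)]
grid, side = build_grid(points, eps=1.0)
neighbors = neighboring_cells(grid, (0, 0))
```

In this toy input, only points that land in nearby non-empty cells become candidates for distance checks; distant points such as (5.0, 5.0) are never compared. The enumeration cost grows quickly with the dimension d, which is why an index over the non-empty cells matters for large data sets.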

Citations (16)
