PANDORA: A Parallel Dendrogram Construction Algorithm for Single Linkage Clustering on GPU (2401.06089v1)
Abstract: This paper presents PANDORA, a novel parallel algorithm for efficiently constructing dendrograms for single-linkage hierarchical clustering, including HDBSCAN. Traditional methods for constructing a dendrogram from a minimum spanning tree (MST), such as agglomerative or divisive techniques, often fail to parallelize efficiently, especially for the skewed dendrograms common in real-world data. PANDORA addresses these challenges through a unique recursive tree contraction method, which simplifies the tree for initial dendrogram construction and then progressively reconstructs the complete dendrogram. This process makes PANDORA asymptotically work-optimal, independent of dendrogram skewness. All steps in PANDORA are fully parallel and suitable for massively threaded accelerators such as GPUs. Our implementation is written in Kokkos, providing support for both CPUs and multi-vendor GPUs (e.g., Nvidia, AMD). The multithreaded version of PANDORA is 2.2$\times$ faster than the current best multithreaded implementation, while the GPU implementation of PANDORA achieves speedups of 6-20$\times$ on an AMD GPU and 10-37$\times$ on an Nvidia GPU over multithreaded PANDORA. These advancements lead to up to a 6-fold speedup for HDBSCAN on GPUs over the current best implementation, which offloads only MST construction to GPUs and performs multithreaded dendrogram construction.
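To make the baseline concrete, the following is a minimal sketch of the traditional sequential, agglomerative (bottom-up) way to turn an MST into a single-linkage dendrogram, using a union-find structure. It is not PANDORA and not the authors' code; the function name `mst_to_dendrogram`, the edge format `(u, v, weight)`, and the node numbering are assumptions made for illustration only.

```python
def mst_to_dendrogram(n_points, mst_edges):
    """Sequential bottom-up dendrogram construction from an MST.

    Illustrative sketch only (hypothetical names, not the paper's method).
    Leaves are nodes 0..n_points-1; the k-th lightest MST edge becomes
    internal node n_points + k. Returns a parent array; the root has -1.
    """
    # Process edges from lightest to heaviest, as in agglomerative clustering.
    order = sorted(mst_edges, key=lambda e: e[2])

    uf = list(range(n_points))    # union-find parent pointers over points
    top = list(range(n_points))   # dendrogram node representing each component
    parent = [-1] * (n_points + len(order))

    def find(x):
        # Find the component representative, with path halving.
        while uf[x] != x:
            uf[x] = uf[uf[x]]
            x = uf[x]
        return x

    for k, (u, v, _w) in enumerate(order):
        ru, rv = find(u), find(v)
        node = n_points + k
        # The new internal node becomes the parent of both merged clusters.
        parent[top[ru]] = node
        parent[top[rv]] = node
        uf[ru] = rv
        top[rv] = node
    return parent


# A chain-shaped MST yields a fully skewed dendrogram: every merge adds a
# single point, so the dendrogram height grows linearly with the input size.
chain = [(2, 3, 3.0), (0, 1, 1.0), (1, 2, 2.0)]
print(mst_to_dendrogram(4, chain))
```

Each merge in this loop depends on the component state left by the previous one, which is one way to see why the bottom-up approach is hard to parallelize when the dendrogram is skewed.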