
PANDORA: A Parallel Dendrogram Construction Algorithm for Single Linkage Clustering on GPU (2401.06089v1)

Published 11 Jan 2024 in cs.LG and cs.DC

Abstract: This paper presents PANDORA, a novel parallel algorithm for efficiently constructing dendrograms for single-linkage hierarchical clustering, including HDBSCAN. Traditional methods for constructing a dendrogram from a minimum spanning tree (MST), such as agglomerative or divisive techniques, often fail to parallelize efficiently, especially on the skewed dendrograms common in real-world data. PANDORA addresses these challenges through a recursive tree contraction method, which simplifies the tree for initial dendrogram construction and then progressively reconstructs the complete dendrogram. This process makes PANDORA asymptotically work-optimal, independent of dendrogram skewness. All steps in PANDORA are fully parallel and suitable for massively threaded accelerators such as GPUs. Our implementation is written in Kokkos, providing support for both CPUs and multi-vendor GPUs (e.g., Nvidia, AMD). The multithreaded version of PANDORA is 2.2× faster than the current best multithreaded implementation, while the GPU implementation of PANDORA achieved speedups of 6-20× on an AMD GPU and 10-37× on an Nvidia GPU over multithreaded PANDORA. These advancements lead to up to a 6-fold speedup for HDBSCAN on GPUs over the current best implementation, which offloads only MST construction to GPUs and performs dendrogram construction on multithreaded CPUs.
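
For context, the traditional agglomerative (bottom-up) construction that the abstract contrasts against can be sketched as follows. This is a minimal sequential baseline, not PANDORA itself: it sorts the MST edges by weight and merges clusters with a union-find structure. The function name `dendrogram_from_mst`, the edge tuple layout, and the output format (one merge record per edge, with new cluster ids numbered from `n_points` upward) are assumptions made for illustration and do not come from the paper.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]            # (u, v, weight); layout is an assumption
Merge = Tuple[int, int, float, int]      # (cluster_a, cluster_b, height, size)


def dendrogram_from_mst(n_points: int, mst_edges: List[Edge]) -> List[Merge]:
    """Sequential bottom-up dendrogram construction from an MST.

    Illustrative baseline only (not the PANDORA algorithm). The i-th merge
    creates a new cluster with id n_points + i.
    """
    parent = list(range(n_points))   # union-find parent pointers
    size = [1] * n_points            # number of points under each root
    label = list(range(n_points))    # dendrogram node id held by each root

    def find(x: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    merges: List[Merge] = []
    # Processing edges in increasing weight order is inherently sequential:
    # each merge depends on the clusters formed by all lighter edges.
    for i, (u, v, w) in enumerate(sorted(mst_edges, key=lambda e: e[2])):
        ru, rv = find(u), find(v)
        if ru == rv:
            raise ValueError("input edges contain a cycle, so they are not a tree")
        merges.append((label[ru], label[rv], w, size[ru] + size[rv]))
        if size[ru] < size[rv]:      # union by size
            ru, rv = rv, ru
        parent[rv] = ru
        size[ru] += size[rv]
        label[ru] = n_points + i     # the surviving root now represents the new cluster
    return merges


if __name__ == "__main__":
    # A path-shaped MST with increasing weights yields a fully skewed dendrogram:
    # every merge attaches a single point to the one growing cluster.
    chain = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 3.0)]
    for merge in dendrogram_from_mst(4, chain):
        print(merge)
```

In the chain example each merge depends on the result of the previous one, which illustrates why a skewed dendrogram leaves a bottom-up method with little independent work to distribute across threads.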
