Accelerating Biclique Counting on GPU (2403.07858v2)

Published 12 Mar 2024 in cs.DC

Abstract: Counting (p,q)-bicliques in bipartite graphs poses a foundational challenge with broad applications, from densest subgraph discovery in algorithmic research to personalized content recommendation in practical scenarios. Despite its significance, current leading (p,q)-biclique counting algorithms fall short, particularly when faced with larger graph sizes and clique scales. Fortunately, the problem's inherent structure, which allows the bicliques starting from each vertex to be counted independently, combined with a substantial number of set intersections, makes it highly amenable to parallelization. Recent successes in GPU-accelerated algorithms across various domains motivate our exploration into harnessing the parallel processing power of GPUs to efficiently address the (p,q)-biclique counting challenge. We introduce GBC (GPU-based Biclique Counting), a novel approach designed to enable efficient and scalable (p,q)-biclique counting on GPUs. To address the major bottleneck arising from redundant comparisons in set intersections (which account for an average of 90% of the runtime), we introduce a novel data structure that hashes adjacency lists into truncated bitmaps to enable efficient set intersection on GPUs via bit-wise AND operations. Our innovative hybrid DFS-BFS exploration strategy further enhances thread utilization and effectively manages memory constraints. A composite load balancing strategy, integrating pre-runtime and runtime workload allocation, ensures equitable distribution among threads. Additionally, we employ vertex reordering and graph partitioning strategies for improved compactness and scalability. Experimental evaluations on eight real-life and two synthetic datasets demonstrate that GBC outperforms state-of-the-art algorithms by a substantial margin. In particular, GBC achieves an average speedup of 497.8x, with the largest instance achieving a remarkable 1217.7x speedup when p = q = 8.
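
To make the bitmap idea in the abstract concrete, the following is a minimal CPU-side sketch in Python, not the GBC implementation. It assumes plain, untruncated bitmaps (one bit per right-side vertex) instead of the hashed truncated bitmaps the paper describes, and it enumerates left-side vertex groups by brute force rather than with the hybrid DFS-BFS strategy. The names `to_bitmap` and `count_bicliques` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import comb


def to_bitmap(neighbors):
    """Encode a set of right-side vertex ids as an integer bitmap."""
    bits = 0
    for v in neighbors:
        bits |= 1 << v
    return bits


def count_bicliques(adj_left, p, q):
    """Count (p,q)-bicliques by brute force.

    adj_left maps each left-side vertex to the right-side vertices it
    is adjacent to. For every group of p left vertices, the common
    neighborhood is obtained with bit-wise AND, and every q-subset of
    that neighborhood completes one biclique.
    """
    bitmaps = [to_bitmap(nbrs) for nbrs in adj_left.values()]
    total = 0
    for group in combinations(bitmaps, p):
        common = group[0]
        for bitmap in group[1:]:
            common &= bitmap  # set intersection as a bit-wise AND
        total += comb(bin(common).count("1"), q)
    return total


# Hypothetical toy graph: three left vertices, three right vertices.
toy_graph = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
print(count_bicliques(toy_graph, 2, 2))
```

The sketch only shows why an intersection reduces to a single AND per machine word once adjacency lists are stored as bitmaps; the exponential enumeration over left-vertex groups is exactly the cost that a parallel algorithm would need to distribute across threads.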

