
Play like a Vertex: A Stackelberg Game Approach for Streaming Graph Partitioning (2402.18304v1)

Published 28 Feb 2024 in cs.DC, cs.DB, and cs.GT

Abstract: In the realm of distributed systems tasked with managing and processing large-scale graph-structured data, optimizing graph partitioning stands as a pivotal challenge. The primary goal is to minimize communication overhead and runtime cost. However, alongside the computational complexity associated with optimal graph partitioning, a critical factor to consider is memory overhead. Real-world graphs often reach colossal sizes, making it impractical and economically unviable to load the entire graph into memory for partitioning. This is also a fundamental premise in distributed graph processing, where accommodating a graph with non-distributed systems is unattainable. Currently, existing streaming partitioning algorithms exhibit a skew-oblivious nature, yielding satisfactory partitioning results exclusively for specific graph types. In this paper, we propose a novel streaming partitioning algorithm, the Skewness-aware Vertex-cut Partitioner S5P, designed to leverage the skewness characteristics of real graphs for achieving high-quality partitioning. S5P offers high partitioning quality by segregating the graph's edge set into two subsets, head and tail sets. Following processing by a skewness-aware clustering algorithm, these two subsets subsequently undergo a Stackelberg graph game. Our extensive evaluations conducted on substantial real-world and synthetic graphs demonstrate that, in all instances, the partitioning quality of S5P surpasses that of existing streaming partitioning algorithms, operating within the same load balance constraints. For example, S5P can bring up to a 51% improvement in partitioning quality compared to the top partitioner among the baselines. Lastly, we showcase that the implementation of S5P results in up to an 81% reduction in communication cost and a 130% increase in runtime efficiency for distributed graph processing tasks on PowerGraph.
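As a rough illustration of the two ideas the abstract names (splitting the edge set into head and tail subsets, and judging a vertex-cut partitioning by its quality), the following minimal sketch may help. It is not the S5P algorithm. The function names, the `degree_threshold` parameter, and the rule that an edge counts as "head" when either endpoint exceeds the threshold are all assumptions made for illustration. The abstract does not say which quality metric it uses; the replication factor shown here is a common choice for vertex-cut partitioners. For simplicity the sketch counts degrees exactly in memory, which a real streaming partitioner would avoid.

```python
from collections import Counter, defaultdict


def split_head_tail(edges, degree_threshold):
    """Split edges into (head, tail) lists using a hypothetical degree rule.

    Assumption: an edge is "head" if at least one endpoint has degree
    above degree_threshold; otherwise it is "tail".
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    head, tail = [], []
    for u, v in edges:
        if max(degree[u], degree[v]) > degree_threshold:
            head.append((u, v))
        else:
            tail.append((u, v))
    return head, tail


def replication_factor(assignment):
    """Average number of partitions in which each vertex appears.

    assignment maps each edge (u, v) to a partition id. A value of 1.0
    means no vertex is replicated; larger values mean more replicas and,
    typically, more communication.
    """
    partitions_of = defaultdict(set)
    for (u, v), part in assignment.items():
        partitions_of[u].add(part)
        partitions_of[v].add(part)
    return sum(len(p) for p in partitions_of.values()) / len(partitions_of)


# Toy graph: vertex 0 is a hub, and (5, 6) is an isolated low-degree edge.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (5, 6)]
head, tail = split_head_tail(edges, degree_threshold=2)

# A hand-written assignment of edges to two partitions (illustrative only).
assignment = {(0, 1): 0, (0, 2): 0, (0, 3): 1, (0, 4): 1, (5, 6): 0}
rf = replication_factor(assignment)
```

In this toy graph the four edges touching the hub vertex 0 land in the head list and the remaining edge lands in the tail list. Only the hub is replicated across both partitions, so the replication factor is 8/7.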
