CuckooGraph: A Scalable and Space-Time Efficient Data Structure for Large-Scale Dynamic Graphs
Abstract: Graphs play an increasingly important role in various big data applications. However, existing graph data structures cannot simultaneously address the performance bottlenecks caused by the dynamic updates, large scale, and high query complexity of current graphs. This paper proposes a novel data structure for large-scale dynamic graphs called CuckooGraph. It does not require any prior knowledge of the upcoming graphs, and can adaptively resize to the most memory-efficient form while requiring few memory accesses for very fast graph data processing. The key techniques of CuckooGraph include TRANSFORMATION and DENYLIST. TRANSFORMATION fully utilizes the limited memory by designing related data structures that allow flexible space transformations to smoothly expand/tighten the required space depending on the number of incoming items. DENYLIST efficiently handles item insertion failures and further improves processing speed. Our experimental results show that compared with the most competitive solution Spruce, CuckooGraph achieves about $33\times$ higher insertion throughput while requiring only about $68\%$ of the memory space.
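The abstract names the two techniques but gives no implementation detail. The sketch below is only a rough illustration of the general idea, assuming a plain two-table cuckoo hash set that stores edges as `(u, v)` tuples. It is not the paper's actual layout. The class name `CuckooSetSketch` and the constants `MAX_KICKS`, `DENYLIST_LIMIT`, `SHRINK_LOAD`, and `MIN_CAPACITY` are all hypothetical. Items that cannot be placed after a bounded number of evictions are parked in a small denylist, and the tables double or halve as the item count changes.

```python
class CuckooSetSketch:
    """Toy two-table cuckoo hash set; items must be hashable and not None."""

    MAX_KICKS = 32       # assumed bound on evictions per insertion
    DENYLIST_LIMIT = 4   # assumed denylist size that triggers expansion
    SHRINK_LOAD = 0.2    # assumed load factor that triggers contraction
    MIN_CAPACITY = 4     # assumed smallest table size

    def __init__(self):
        self.capacity = self.MIN_CAPACITY          # slots per table
        self.tables = [[None] * self.capacity for _ in range(2)]
        self.denylist = []                         # items that found no slot
        self.size = 0

    def _slot(self, which, item):
        # Two hash functions obtained by salting Python's built-in hash.
        return hash((which, item)) % self.capacity

    def __contains__(self, item):
        return (self.tables[0][self._slot(0, item)] == item
                or self.tables[1][self._slot(1, item)] == item
                or item in self.denylist)

    def _place(self, item):
        """Cuckoo placement; returns the item left without a slot, or None."""
        which = 0
        for _ in range(self.MAX_KICKS):
            i = self._slot(which, item)
            # Put the item in its slot and pick up whatever was there.
            item, self.tables[which][i] = self.tables[which][i], item
            if item is None:
                return None
            which = 1 - which   # the evicted item tries the other table
        return item

    def _resize(self, capacity):
        # Rebuild both tables at the new size and re-place every item.
        items = [x for t in self.tables for x in t if x is not None]
        items += self.denylist
        self.capacity = capacity
        self.tables = [[None] * capacity for _ in range(2)]
        self.denylist = []
        for x in items:
            leftover = self._place(x)
            if leftover is not None:
                self.denylist.append(leftover)

    def add(self, item):
        if item in self:
            return
        self.size += 1
        leftover = self._place(item)
        if leftover is not None:
            # Insertion failure: park the homeless item in the denylist.
            self.denylist.append(leftover)
            if len(self.denylist) > self.DENYLIST_LIMIT:
                self._resize(self.capacity * 2)

    def discard(self, item):
        for which in (0, 1):
            i = self._slot(which, item)
            if self.tables[which][i] == item:
                self.tables[which][i] = None
                break
        else:
            if item not in self.denylist:
                return
            self.denylist.remove(item)
        self.size -= 1
        slots = 2 * self.capacity
        if self.capacity > self.MIN_CAPACITY and self.size < self.SHRINK_LOAD * slots:
            self._resize(self.capacity // 2)


# Usage: directed edges stored as (source, destination) pairs.
graph = CuckooSetSketch()
for u in range(100):
    graph.add((u, u + 1))
assert (5, 6) in graph and (6, 5) not in graph
```

A lookup in this sketch touches at most two table slots plus the short denylist, which is the kind of bounded memory access pattern the abstract alludes to. How CuckooGraph itself organizes nodes, edges, and resizing is described in the paper body, not here.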