GraphMini: Accelerating Graph Pattern Matching Using Auxiliary Graphs (2403.01050v1)
Abstract: Graph pattern matching is a fundamental problem encountered by many common graph mining tasks and the basic building block of several graph mining systems. This paper explores for the first time how to proactively prune graphs to speed up graph pattern matching by leveraging the structure of the query pattern and the input graph. We propose building auxiliary graphs, which are different pruned versions of the graph, during query execution. This requires careful balancing between the upfront cost of building and managing auxiliary graphs and the gains of faster set operations. To this end, we propose GraphMini, a new system that uses query compilation and a new cost model to minimize the cost of building and maintaining auxiliary graphs and maximize gains. Our evaluation shows that using GraphMini can achieve one order of magnitude speedup compared to state-of-the-art subgraph enumeration systems on commonly used benchmarks.
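To make the idea of an auxiliary graph concrete, the following is a minimal sketch and not GraphMini's actual implementation. It assumes an undirected graph stored as a dictionary of neighbor sets, uses 4-clique counting as an illustrative query, and prunes adjacency lists to the neighborhood of the first matched vertex. The names `from_edges`, `count_4cliques_baseline`, and `count_4cliques_pruned` are hypothetical; the abstract does not specify how GraphMini builds its auxiliary graphs or how its cost model decides when to build them.

```python
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[int, Set[int]]


def from_edges(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build an undirected adjacency-set graph from an edge list."""
    g: Graph = {}
    for a, b in edges:
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)
    return g


def count_4cliques_baseline(g: Graph) -> int:
    """Count 4-cliques using set intersections on the full adjacency lists."""
    total = 0
    for v0 in g:
        for v1 in g[v0]:
            if v1 <= v0:
                continue  # enforce v0 < v1 so each clique is counted once
            c2 = g[v0] & g[v1]
            for v2 in c2:
                if v2 <= v1:
                    continue
                c3 = c2 & g[v2]  # intersects with the full list of v2
                total += sum(1 for v3 in c3 if v3 > v2)
    return total


def count_4cliques_pruned(g: Graph) -> int:
    """Count 4-cliques using a per-v0 auxiliary graph (illustrative only)."""
    total = 0
    for v0 in g:
        # Every remaining clique vertex must be a larger neighbor of v0.
        scope = {u for u in g[v0] if u > v0}
        # Auxiliary graph: adjacency lists pruned to that scope.
        # Building it costs time up front; the payoff is that all
        # intersections below operate on the smaller pruned sets.
        aux = {u: g[u] & scope for u in scope}
        for v1 in scope:
            for v2 in aux[v1]:
                if v2 <= v1:
                    continue
                c3 = aux[v1] & aux[v2]
                total += sum(1 for v3 in c3 if v3 > v2)
    return total


if __name__ == "__main__":
    # Complete graph on 5 vertices: it contains C(5, 4) = 5 four-cliques.
    k5 = from_edges((a, b) for a in range(5) for b in range(a + 1, 5))
    print(count_4cliques_baseline(k5), count_4cliques_pruned(k5))
```

Both functions return the same count. The pruned variant pays for constructing `aux` once per `v0` and then reuses it across the inner loops, which is the trade-off between upfront building cost and faster set operations that the abstract describes.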
Authors: Juelin Liu, Sandeep Polisetty, Hui Guan, Marco Serafini