XMiner: Efficient Directed Subgraph Matching with Pattern Reduction (2404.11105v1)
Abstract: Graph pattern matching, one of the fundamental graph mining problems, aims to extract structural patterns of interest from an input graph. State-of-the-art graph matching algorithms and systems are mainly designed for undirected graphs. Directed graph matching is more complex than undirected graph matching because the edge direction must be taken into account before each directed edge is explored. Thus, the techniques developed for undirected graph matching (e.g., storage and exploiting symmetry) may not be fully applicable to directed graphs. For example, the redundancy implied in a directed graph pattern cannot be detected using symmetry breaking designed for undirected pattern graphs. Here, we present XMiner for efficient directed graph pattern matching, whose core idea is 'pattern reduction'. It first analyzes the relationships between the constraints implied in a pattern digraph. Then it reduces the pattern graph to a simplified form by finding a minimum constraint cover. Finally, XMiner generates an execution plan and follows it to extract matchings of the pattern graph. XMiner thus works on a simplified pattern graph and avoids much data access and redundant computation throughout the matching process. Our experimental results show that XMiner outperforms state-of-the-art stand-alone graph matching systems and scales to complex graph pattern matching tasks on larger graphs.
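To make the problem statement concrete, the following is a minimal sketch of naive directed subgraph matching by backtracking. It is not XMiner's algorithm and performs no pattern reduction; the function name `match_directed`, the edge-list input format, and the toy graph are assumptions chosen for illustration. It only shows that each pattern edge must be checked in its stated direction, and that a naive search reports the same subgraph several times.

```python
from typing import Dict, Iterator, List, Set, Tuple

Edge = Tuple[int, int]  # (source, target); hypothetical input format


def match_directed(pattern_edges: List[Edge],
                   data_edges: List[Edge]) -> Iterator[Dict[int, int]]:
    """Yield every injective mapping pattern vertex -> data vertex
    that preserves each directed pattern edge (naive baseline)."""
    out_nbrs: Dict[int, Set[int]] = {}
    data_vertices: Set[int] = set()
    for u, v in data_edges:
        out_nbrs.setdefault(u, set()).add(v)
        data_vertices.update((u, v))
    pattern_vertices = sorted({x for e in pattern_edges for x in e})

    def consistent(mapping: Dict[int, int], p: int, d: int) -> bool:
        # Check only pattern edges that touch p and whose endpoints
        # are both mapped once p -> d is added.
        trial = {**mapping, p: d}
        for a, b in pattern_edges:
            if (a == p or b == p) and a in trial and b in trial:
                # Direction matters: a -> b must map to an edge
                # trial[a] -> trial[b], not the reverse.
                if trial[b] not in out_nbrs.get(trial[a], set()):
                    return False
        return True

    def extend(mapping: Dict[int, int], i: int) -> Iterator[Dict[int, int]]:
        if i == len(pattern_vertices):
            yield dict(mapping)
            return
        p = pattern_vertices[i]
        used = set(mapping.values())
        for d in sorted(data_vertices):
            if d not in used and consistent(mapping, p, d):
                mapping[p] = d
                yield from extend(mapping, i + 1)
                del mapping[p]

    yield from extend({}, 0)


# Pattern: directed 3-cycle 0 -> 1 -> 2 -> 0.
cycle = [(0, 1), (1, 2), (2, 0)]
# Toy data graph: one directed cycle 1 -> 2 -> 3 -> 1 plus an extra edge 1 -> 3.
data = [(1, 2), (2, 3), (3, 1), (1, 3)]
matches = list(match_directed(cycle, data))
print(len(matches))  # 3
```

The three matches are rotations of the same directed cycle, so the naive search finds one subgraph three times. An undirected triangle would admit six such mappings, which illustrates why redundancy handling tuned for undirected patterns does not transfer directly to directed ones.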
Authors: Pingpeng Yuan, Yujiang Wang, Tianyu Ma, Siyuan He, Ling Liu