FLEXIS: FLEXible Frequent Subgraph Mining using Maximal Independent Sets (2404.01585v1)

Published 2 Apr 2024 in cs.DB and cs.PF

Abstract: Frequent Subgraph Mining (FSM) is the process of identifying common subgraph patterns that surpass a predefined frequency threshold. While FSM is widely applicable in fields like bioinformatics, chemical analysis, and social network anomaly detection, its execution remains time-consuming and complex. This complexity stems from the need to recognize high-frequency subgraphs and ascertain if they exceed the set threshold. Current approaches to identifying these patterns often rely on edge or vertex extension methods. However, these strategies can introduce redundancies and cause increased latency. To address these challenges, this paper introduces a novel approach for identifying potential k-vertex patterns by combining two frequently observed (k - 1)-vertex patterns. This method optimizes the breadth-first search, which allows for quicker search termination based on vertex count and support value. Another challenge in FSM is the validation of the presumed pattern against a specific threshold. Existing metrics, such as Maximum Independent Set (MIS) and Minimum Node Image (MNI), either demand significant computational time or risk overestimating pattern counts. Our approach aligns with the MIS and identifies independent subgraphs. Through the "Maximal Independent Set" metric, this paper offers an efficient solution that minimizes latency and provides users with control over pattern overlap. In extensive experiments, the proposed method achieves an average 10.58x speedup over GraMi and an average 3x speedup over T-FSM.
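To make the contrast between the support metrics named in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the FLEXIS implementation: embeddings are assumed to be already enumerated as tuples of data-graph vertex ids, the function names are hypothetical, and `max_shared` is a hypothetical stand-in for the user-controlled overlap the abstract mentions. The greedy pass yields a maximal (not maximum) independent set of embeddings, so it is cheap to compute and never exceeds the exact MIS count.

```python
from typing import List, Sequence, Set, Tuple

# One embedding = the data-graph vertex ids matched to the pattern's
# vertices, in a fixed pattern-vertex order (assumed representation).
Embedding = Tuple[int, ...]


def mni_support(embeddings: Sequence[Embedding]) -> int:
    """Minimum Node Image: for each pattern vertex, count the distinct
    data vertices it maps to, then take the minimum over pattern vertices."""
    if not embeddings:
        return 0
    k = len(embeddings[0])
    return min(len({emb[i] for emb in embeddings}) for i in range(k))


def greedy_maximal_independent_support(
    embeddings: Sequence[Embedding], max_shared: int = 0
) -> int:
    """Greedy maximal independent set over embeddings.

    An embedding is kept only if it shares at most `max_shared` data
    vertices with every embedding kept so far. `max_shared` is a
    hypothetical knob for this sketch; 0 means fully vertex-disjoint.
    """
    kept: List[Set[int]] = []
    for emb in embeddings:
        vertices = set(emb)
        if all(len(vertices & other) <= max_shared for other in kept):
            kept.append(vertices)
    return len(kept)


# Three embeddings of a 2-vertex pattern along the path 1-2-3-4.
demo_embeddings: List[Embedding] = [(1, 2), (2, 3), (3, 4)]

if __name__ == "__main__":
    print(mni_support(demo_embeddings))
    print(greedy_maximal_independent_support(demo_embeddings))
    print(greedy_maximal_independent_support(demo_embeddings, max_shared=1))
```

On the demo input, MNI counts all three overlapping embeddings, while the vertex-disjoint greedy count keeps only two of them; relaxing `max_shared` to 1 admits all three again. The greedy result depends on the order in which embeddings are visited, which is the price paid for avoiding an exact MIS computation.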

