A Comprehensive Scalable Framework for Cloud-Native Pattern Detection with Enhanced Expressiveness
Abstract: Detecting complex patterns in large volumes of event logs has applications in many domains, such as business processes and fraud detection. Existing systems such as ELK are commonly used for this task, but their performance deteriorates for large patterns, and they are limited in expressiveness and in their ability to explain their responses. In this work, we propose a solution that integrates a Complex Event Processing (CEP) engine into a broader query processor on top of a decoupled storage infrastructure containing inverted indices of log events. The results demonstrate that our system excels in scalability and robustness, particularly when handling complex queries. Notably, it delivers responses for large complex patterns within seconds, whereas ELK times out after 10 minutes. It also significantly outperforms solutions that rely on FlinkCEP and on executing MATCH_RECOGNIZE SQL queries.
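As a rough illustration of the general idea of querying over inverted indices of log events, the following minimal sketch first uses an index to prune traces that cannot contain the pattern and then verifies event order on the surviving candidates. It is not the system described in the abstract: the in-memory index layout, the function names (`build_index`, `candidate_traces`, `matches_in_order`, `detect`), and the sample traces are all hypothetical, and the simple ordered scan only stands in for what a CEP engine would do.

```python
from collections import defaultdict

def build_index(traces):
    """Toy inverted index (assumed layout): event type -> {trace_id: [positions]}."""
    index = defaultdict(dict)
    for trace_id, events in traces.items():
        for pos, event in enumerate(events):
            index[event].setdefault(trace_id, []).append(pos)
    return index

def candidate_traces(index, pattern):
    """Pruning step: keep only traces that contain every event type in the pattern."""
    postings = [set(index.get(event, {})) for event in pattern]
    return set.intersection(*postings) if postings else set()

def matches_in_order(index, trace_id, pattern):
    """Verification step: pattern events must occur in order; gaps are allowed."""
    last = -1
    for event in pattern:
        # Earliest occurrence of this event strictly after the previous match.
        nxt = next((p for p in index[event][trace_id] if p > last), None)
        if nxt is None:
            return False
        last = nxt
    return True

def detect(traces, pattern):
    """Return the ids of traces that contain the pattern as an ordered subsequence."""
    index = build_index(traces)
    return sorted(t for t in candidate_traces(index, pattern)
                  if matches_in_order(index, t, pattern))

# Hypothetical sample log: three traces of event types.
traces = {
    "t1": ["login", "browse", "pay", "logout"],
    "t2": ["login", "pay", "browse"],
    "t3": ["browse", "logout"],
}
print(detect(traces, ["login", "browse", "pay"]))  # ['t1']
```

In this sketch, the index lookup discards `t3` without scanning it, and only `t1` and `t2` reach the more expensive order check, which is the kind of division of labour that an index-backed query processor can exploit.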