Automated Discovery of Process Models from Event Logs: Review and Benchmark (1705.02288v3)

Published 5 May 2017 in cs.SE

Abstract: Process mining allows analysts to exploit logs of historical executions of business processes to extract insights regarding the actual performance of these processes. One of the most widely studied process mining operations is automated process discovery. An automated process discovery method takes as input an event log, and produces as output a business process model that captures the control-flow relations between tasks that are observed in or implied by the event log. Various automated process discovery methods have been proposed in the past two decades, striking different tradeoffs between scalability, accuracy and complexity of the resulting models. However, these methods have been evaluated in an ad-hoc manner, employing different datasets, experimental setups, evaluation measures and baselines, often leading to incomparable conclusions and sometimes unreproducible results due to the use of closed datasets. This article provides a systematic review and comparative evaluation of automated process discovery methods, using an open-source benchmark and covering twelve publicly-available real-life event logs, twelve proprietary real-life event logs, and nine quality metrics. The results highlight gaps and unexplored tradeoffs in the field, including the lack of scalability of some methods and a strong divergence in their performance with respect to the different quality metrics used.

Authors (8)

Adriano Augusto
Raffaele Conforti
Marlon Dumas
Marcello La Rosa
Fabrizio Maria Maggi
Andrea Marrella
Massimo Mecella
Allar Soo

Summary

Overview of Automated Discovery of Process Models from Event Logs

The paper, "Automated Discovery of Process Models from Event Logs: Review and Benchmark," surveys and empirically compares methods for automated process discovery, one of the most widely studied operations in process mining. An automated process discovery method takes an event log of historical process executions as input and produces a business process model that captures the control-flow relations between tasks observed in, or implied by, the log. Such models help analysts understand how a process actually performs and support its management.
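To make the input and output of discovery concrete, the following minimal sketch counts directly-follows relations in a toy event log. This is only an illustration of the kind of control-flow relation a discovery method works from, not any specific method evaluated in the paper; the function name `directly_follows`, the activity names, and the representation of a log as a list of activity sequences are all assumptions made for this example.

```python
from collections import Counter


def directly_follows(log):
    """Count how often activity a is immediately followed by activity b.

    `log` is assumed to be a list of traces, each trace a list of
    activity names in execution order (a hypothetical simplification
    of a real event log).
    """
    counts = Counter()
    for trace in log:
        # Pair every event with its immediate successor in the same trace.
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


# Toy log with three cases of a made-up approval process.
log = [
    ["register", "check", "approve", "notify"],
    ["register", "check", "reject", "notify"],
    ["register", "check", "approve", "notify"],
]

dfg = directly_follows(log)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

A discovery method would go further and turn relations like these into a process model with explicit choices and parallelism, for example a Petri net; the counting step above only exposes the raw ordering evidence in the log.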

Key Contributions

Systematic Literature Review (SLR): The paper conducts an extensive SLR of automated process discovery methods, categorizing the reviewed studies along several dimensions, including the type of model produced and the data used for evaluation. The review documents how inconsistently these methods have been evaluated, with different datasets, experimental setups, measures, and baselines, which motivates the unified benchmark.
Benchmarking Process Discovery Methods: The authors present a systematic benchmark of a selection of process discovery techniques, focusing on those that produce procedural models, primarily Petri nets. The benchmark is open-source and covers twelve publicly available and twelve proprietary real-life event logs, so that results on the public logs can be reproduced and compared across studies.
Evaluation Metrics: The benchmark compares the discovered models along several quality dimensions, including fitness, precision, generalization, complexity, and soundness. Fitness measures how well a model can reproduce the behavior recorded in the log, while precision measures how little behavior the model allows beyond what the log contains. Generalization captures the model's ability to allow unseen but valid behavior, complexity serves as a proxy for how understandable the model is, and soundness assesses its behavioral correctness.
Findings and Observations: The empirical evaluation shows that the methods vary significantly in scalability and in the quality of the models they produce. Techniques such as the Inductive Miner and the Evolutionary Tree Miner perform well on fitness and precision. However, no method performs best across all metrics, given the inherent trade-offs between accuracy and complexity.
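The tension between fitness and precision can be illustrated with a deliberately simplified sketch. It assumes the model is given as a finite set of allowed traces and scores whole traces only; this is not how the benchmark measures these metrics, and the function names `trace_fitness`, `trace_precision`, and `f_score` are hypothetical. Combining the two scores with a harmonic mean is one common choice and is likewise an assumption here, not something stated in this summary.

```python
def trace_fitness(log, model_traces):
    """Fraction of log traces that the model can replay exactly."""
    replayable = sum(1 for trace in log if tuple(trace) in model_traces)
    return replayable / len(log)


def trace_precision(log, model_traces):
    """Fraction of model-allowed traces that actually occur in the log."""
    observed = {tuple(trace) for trace in log}
    return len(model_traces & observed) / len(model_traces)


def f_score(fitness, precision):
    """Harmonic mean of fitness and precision (0.0 if both are zero)."""
    if fitness + precision == 0:
        return 0.0
    return 2 * fitness * precision / (fitness + precision)


# Toy log: four cases, one of which ("a", "d") the model cannot replay.
log = [("a", "b", "c"), ("a", "c", "b"), ("a", "b", "c"), ("a", "d")]

# Toy model: allows three traces, one of which never occurs in the log.
model_traces = {("a", "b", "c"), ("a", "c", "b"), ("a", "b", "b", "c")}

fitness = trace_fitness(log, model_traces)
precision = trace_precision(log, model_traces)
print(fitness, precision, f_score(fitness, precision))
```

In this toy setting, adding `("a", "d")` to the model would raise fitness to 1.0, while adding traces that never occur in the log would lower precision. A model that allows every possible trace fits any log perfectly but tells the analyst nothing, which is why the two scores have to be read together.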

Practical and Theoretical Implications

The work underscores the need for discovery methods that scale to the large event logs found in real-world applications. The findings suggest that although current methods can produce accurate models, complex, large-scale logs remain a challenge. Future work could improve the scalability of existing algorithms or develop new approaches that balance accuracy, complexity, and scalability more effectively.

Furthermore, the benchmarking methodology and the published open-source framework serve as a valuable resource for future research, offering a standardized approach to assessing new process discovery methods.

Speculation on Future Directions

Advancements in AI and data-processing technologies can influence future research directions in process mining. Incorporation of machine learning techniques may enhance the adaptability and efficiency of discovery algorithms. Moreover, developing universally applicable evaluation metrics beyond the current focus on procedural models might foster growth in declarative and hybrid modeling approaches, enriching the landscape of process discovery methods.

In conclusion, this paper provides a critical baseline for researchers in the field of process mining, encouraging further innovation while addressing existing methodological gaps. The detailed benchmarking and open-source contribution are positioned to significantly aid the comparability and the empirical robustness of future research in automated process discovery.
