- The paper introduces a taxonomy for outcome-oriented predictive process monitoring methods based on trace bucketing and sequence encoding.
- Benchmark results indicate that XGBoost and Random Forest achieve the highest accuracy, and that aggregation-based encoding offers a good balance between information retention and simplicity.
- The study offers a valuable resource and framework for understanding and applying predictive process monitoring techniques in practice.
A Review and Benchmark of Outcome-Oriented Predictive Process Monitoring
This paper by Teinemaa et al. explores outcome-oriented predictive process monitoring. It presents a systematic review and a benchmark of eleven methods on real-life event logs from business processes. Predictive process monitoring has attracted considerable attention because it can anticipate the future state of ongoing process executions from historical execution data. The authors focus on outcome prediction and address a limitation of previous studies: those studies used disparate datasets, experimental settings, and evaluation metrics, which made it difficult to compare methods.
Methods and Taxonomy
The authors lay out a comprehensive taxonomy that classifies existing methods along two facets: trace bucketing and sequence encoding. Business process execution data (traces) are diverse and vary in length. Effective prediction therefore needs two mechanisms: one to organize traces into groups (buckets) and another to transform (encode) sequence data into a format a classifier can use. Each method in the taxonomy combines one choice from each facet. Some methods, such as those using a single bucket or KNN bucketing, rely on aggregation to encode the control flow, while others use more sophisticated bucketing based on clustering or process states.
- Trace Bucketing: Methods range from single bucket approaches, which keep things simple by training one classifier on all prefixes, to clustering-based bucketing, which groups trace prefixes by execution similarity and trains one classifier per cluster. State-based bucketing derives states (decision points) from a process model and trains a classifier for each state.
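- Sequence Encoding: Encodings turn a trace prefix and its event attributes into a feature vector. Last-state encoding keeps only the attributes of the most recent event. Aggregation encoding condenses the whole prefix, for example into activity frequencies or averages of numeric attributes, at the cost of losing event order. Index-based encoding is lossless because it keeps the attributes of every event at every position. However, it is less flexible across traces of varying length, since the feature vector grows with the prefix length.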
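The following minimal Python sketch makes the two facets concrete. It assumes a trace is a list of events, each a dict with a hypothetical `activity` key and a hypothetical numeric `amount` attribute. The sketch extracts prefixes, assigns each prefix to a bucket, and encodes it with last-state, aggregation, or index-based encoding. All function names are illustrative and are not the API of the paper's framework. Bucketing by the last activity is only a simplified stand-in for state-based bucketing, which derives states from a process model.

```python
from collections import Counter
from typing import Any, Dict, List

# Hypothetical event format: {"activity": str, "amount": float}; "amount" is an
# illustrative numeric attribute, not one taken from the paper's event logs.
Event = Dict[str, Any]
Trace = List[Event]


def prefixes(trace: Trace) -> List[Trace]:
    """All prefixes of a trace, from length 1 up to the full trace."""
    return [trace[:k] for k in range(1, len(trace) + 1)]


def bucket_key(prefix: Trace, method: str) -> Any:
    """Assign a prefix to a bucket; one classifier would be trained per key."""
    if method == "single":
        return 0  # one bucket, hence a single classifier for all prefixes
    if method == "prefix_length":
        return len(prefix)  # one classifier per prefix length
    if method == "last_activity":
        # Simplified stand-in for state-based bucketing: the "state" is the
        # last executed activity, not a state of a mined process model.
        return prefix[-1]["activity"]
    raise ValueError(f"unknown bucketing method: {method}")


def one_hot(activity: str, activities: List[str]) -> List[float]:
    return [1.0 if activity == a else 0.0 for a in activities]


def encode_last_state(prefix: Trace, activities: List[str]) -> List[float]:
    """Only the most recent event: one-hot activity plus its amount."""
    last = prefix[-1]
    return one_hot(last["activity"], activities) + [float(last["amount"])]


def encode_agg(prefix: Trace, activities: List[str]) -> List[float]:
    """Activity frequencies plus mean amount; fixed length, event order is lost."""
    counts = Counter(e["activity"] for e in prefix)
    mean_amount = sum(float(e["amount"]) for e in prefix) / len(prefix)
    return [float(counts[a]) for a in activities] + [mean_amount]


def encode_index(prefix: Trace, activities: List[str]) -> List[float]:
    """Lossless: one block of features per position, so length grows with the prefix."""
    vec: List[float] = []
    for e in prefix:
        vec += one_hot(e["activity"], activities) + [float(e["amount"])]
    return vec


trace = [
    {"activity": "A", "amount": 10.0},
    {"activity": "B", "amount": 20.0},
    {"activity": "A", "amount": 30.0},
]
for p in prefixes(trace):
    print(bucket_key(p, "prefix_length"), encode_agg(p, ["A", "B"]),
          len(encode_index(p, ["A", "B"])))
```

For this three-event trace, the aggregated vector has three features for every prefix, while the index-based vector grows from 3 to 6 to 9 features. This growth is why index-based encoding fits naturally with one classifier per prefix length. The fixed-length aggregated vector, by contrast, can feed a single classifier.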
Benchmarking and Results
Benchmarking was conducted on nine event logs covering 24 predictive tasks. Methods were evaluated on prediction accuracy (mainly AUC), earliness of prediction, and computational efficiency (offline and online processing time). XGBoost and Random Forest emerged as the top-performing classifiers in terms of accuracy across datasets, which suggests they capture the complex patterns in business process data well. Among sequence encodings, aggregation-based methods generally strike a good balance between information retention and model simplicity. The paper also points out that multi-classifier approaches (cluster- or state-based bucketing) require care in how buckets are composed. A bucket with too few training samples leaves its classifier with too little data to learn from.
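AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counting one half. The minimal sketch below computes AUC directly from that definition. It then evaluates AUC separately per prefix length, which is one way to see how accuracy evolves as cases progress (earliness). The sketch also illustrates a practical side of bucket composition. A group that contains only one outcome class has no defined AUC, and such a group likewise cannot train a binary classifier. The function names and the `(prefix_length, label, score)` input format are hypothetical. The quadratic pairwise loop is for clarity only; in practice a library routine such as scikit-learn's `roc_auc_score` would be used.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def auc(labels: List[int], scores: List[float]) -> Optional[float]:
    """Pairwise AUC: P(score of a positive > score of a negative), ties = 0.5.
    Returns None when only one class is present (AUC is undefined)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_prefix_length(
    preds: List[Tuple[int, int, float]]
) -> Dict[int, Optional[float]]:
    """Group hypothetical (prefix_length, true_label, predicted_score) triples
    by prefix length and compute the AUC of each group."""
    groups: Dict[int, Tuple[List[int], List[float]]] = defaultdict(lambda: ([], []))
    for length, y, s in preds:
        groups[length][0].append(y)
        groups[length][1].append(s)
    return {k: auc(ys, ss) for k, (ys, ss) in sorted(groups.items())}


preds = [
    (1, 1, 0.6), (1, 0, 0.4), (1, 1, 0.3), (1, 0, 0.5),
    (2, 1, 0.9), (2, 0, 0.2),
    (3, 1, 0.7), (3, 1, 0.8),  # only positives: AUC undefined for this group
]
print(auc_by_prefix_length(preds))
```

In this toy data, the group for prefix length 3 contains only positive cases, so its AUC is reported as `None` rather than a misleading number. A bucket in a multi-classifier approach with the same composition would raise the same problem at training time.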
Computational Considerations
The paper also examines computational efficiency. It shows that gap-based filtering and categorical domain filtering can reduce execution times without substantially reducing prediction accuracy. Such techniques matter in operational settings where predictions for running cases must be delivered quickly.
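The following is a minimal sketch of the two filtering ideas. It assumes that gap-based filtering trains only on every g-th prefix of a trace (lengths 1, 1+g, 1+2g, ...). It also assumes that categorical domain filtering keeps only the most frequent values of a categorical attribute and maps the rest to a single placeholder level. These readings are our assumptions, not a restatement of the paper's exact procedure. The parameter names `gap` and `top_fraction` and the `res` attribute used in the example are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List, Set

Event = Dict[str, Any]
Trace = List[Event]


def gap_filtered_prefix_lengths(trace_len: int, gap: int) -> List[int]:
    """Assumed gap-based filtering: prefix lengths 1, 1+gap, 1+2*gap, ...
    gap=1 keeps every prefix; larger gaps shrink the training set."""
    if gap < 1:
        raise ValueError("gap must be >= 1")
    return list(range(1, trace_len + 1, gap))


def frequent_levels(traces: List[Trace], attr: str, top_fraction: float) -> Set[Any]:
    """Assumed categorical domain filtering: keep the most frequent
    top_fraction of the distinct values of `attr` (at least one)."""
    counts = Counter(e[attr] for t in traces for e in t if attr in e)
    n_keep = max(1, int(len(counts) * top_fraction))
    return {value for value, _ in counts.most_common(n_keep)}


def filter_domain(trace: Trace, attr: str, keep: Set[Any], other: str = "other") -> Trace:
    """Replace rare values with one placeholder level, so a one-hot encoding
    of `attr` needs at most len(keep) + 1 columns."""
    filtered: Trace = []
    for e in trace:
        if attr in e and e[attr] not in keep:
            e = {**e, attr: other}  # copy the event; the input is not mutated
        filtered.append(e)
    return filtered


traces = [
    [{"activity": "A", "res": "r1"}, {"activity": "B", "res": "r1"}],
    [{"activity": "A", "res": "r2"}, {"activity": "C", "res": "r1"}],
    [{"activity": "A", "res": "r3"}],
]
keep = frequent_levels(traces, "res", top_fraction=0.34)
print(gap_filtered_prefix_lengths(7, gap=3), keep, filter_domain(traces[1], "res", keep))
```

Both filters trade a little information for speed. The gap filter reduces the number of training prefixes, while domain filtering reduces the number of one-hot features. Either way, training and prediction become cheaper.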
Implications and Directions
This paper has key implications for predictive process monitoring and serves as a resource for understanding and comparing methods for practical use in business process management. The findings favor aggregation encoding over lossless methods for more reliable predictions across heterogeneous datasets. Moreover, the taxonomy lays the groundwork for future work applying deep learning (e.g., LSTM networks) to sequence data.
The open-source framework developed as part of this research lets researchers explore, extend, and adapt the benchmarked methods, making predictive process monitoring research more accessible. Overall, this paper is a significant step toward a structured understanding and application of outcome-oriented predictive process monitoring techniques.