Outcome-Oriented Predictive Process Monitoring: Review and Benchmark (1707.06766v4)

Published 21 Jul 2017 in cs.AI

Abstract: Predictive business process monitoring refers to the act of making predictions about the future state of ongoing cases of a business process, based on their incomplete execution traces and logs of historical (completed) traces. Motivated by the increasingly pervasive availability of fine-grained event data about business process executions, the problem of predictive process monitoring has received substantial attention in the past years. In particular, a considerable number of methods have been put forward to address the problem of outcome-oriented predictive process monitoring, which refers to classifying each ongoing case of a process according to a given set of possible categorical outcomes - e.g., Will the customer complain or not? Will an order be delivered, canceled or withdrawn? Unfortunately, different authors have used different datasets, experimental settings, evaluation measures and baselines to assess their proposals, resulting in poor comparability and an unclear picture of the relative merits and applicability of different methods. To address this gap, this article presents a systematic review and taxonomy of outcome-oriented predictive process monitoring methods, and a comparative experimental evaluation of eleven representative methods using a benchmark covering 24 predictive process monitoring tasks based on nine real-life event logs.

Authors (4)

Irene Teinemaa
Marlon Dumas
Marcello La Rosa
Fabrizio Maria Maggi

Citations (171)

Summary

The paper introduces a taxonomy for outcome-oriented predictive process monitoring methods based on trace bucketing and sequence encoding.
Benchmark results indicate XGBoost and Random Forest achieve high accuracy, and aggregation-based encoding balances information and simplicity.
The study offers a valuable resource and framework for understanding and applying predictive process monitoring techniques in practice.

A Review and Benchmark of Outcome-Oriented Predictive Process Monitoring

This paper by Teinemaa et al. reviews outcome-oriented predictive process monitoring and benchmarks eleven methods on real-life event logs from business processes. Predictive process monitoring has received significant attention because it can anticipate the future state of ongoing cases using historical execution data. The authors focus on outcome prediction. Earlier studies used different datasets, experimental settings, and evaluation measures, which made their methods hard to compare. The authors set out to address this limitation.

Methods and Taxonomy

The authors propose a taxonomy that classifies existing methods along two dimensions: trace bucketing and sequence encoding. Business process execution data (traces) are heterogeneous. Prediction therefore requires two mechanisms. The first organizes (buckets) trace prefixes. The second transforms (encodes) each prefix into a feature vector that a classifier can use. Each method is characterized by its combination of a bucketing strategy and an encoding strategy. For example, some methods pair single-bucket or KNN bucketing with aggregation encoding of the control flow. Others rely on more elaborate clustering or state-based bucketing.
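The bucketing step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It makes three assumptions:

- Each event is reduced to its activity label.
- Training uses every prefix up to a hypothetical `max_len`.
- State-based bucketing is approximated by the last activity, rather than by a state in a discovered process model.

Prefix-length bucketing, another strategy in the paper's taxonomy, is included for comparison. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Trace = List[str]  # assumption: an event is represented only by its activity label


def extract_prefixes(trace: Trace, max_len: int) -> List[Trace]:
    """All prefixes of length 1..min(len(trace), max_len)."""
    return [trace[:k] for k in range(1, min(len(trace), max_len) + 1)]


def single_bucket(prefix: Trace) -> Hashable:
    # Single bucket: every prefix is routed to the same (only) classifier.
    return 0


def prefix_length_bucket(prefix: Trace) -> Hashable:
    # Prefix-length bucketing: one classifier per prefix length.
    return len(prefix)


def last_activity_bucket(prefix: Trace) -> Hashable:
    # Simplified stand-in for state-based bucketing: the "state" is the last
    # activity, not a state of a discovered process model.
    return prefix[-1]


def bucketize(
    traces: Sequence[Trace],
    labels: Sequence[int],
    bucketer: Callable[[Trace], Hashable],
    max_len: int,
) -> Dict[Hashable, List[Tuple[Trace, int]]]:
    """Group (prefix, outcome label) pairs by bucket; one classifier per bucket."""
    buckets: Dict[Hashable, List[Tuple[Trace, int]]] = defaultdict(list)
    for trace, label in zip(traces, labels):
        for prefix in extract_prefixes(trace, max_len):
            # Each prefix inherits the outcome label of its completed case.
            buckets[bucketer(prefix)].append((prefix, label))
    return dict(buckets)


if __name__ == "__main__":
    traces = [["a", "b", "c"], ["a", "c"]]
    labels = [1, 0]
    print(sorted(bucketize(traces, labels, prefix_length_bucket, max_len=3)))  # [1, 2, 3]
```

The choice of bucketer trades off model count against training data per model. A single bucket yields one classifier trained on all five prefixes here. The other bucketers split the same prefixes across several smaller training sets.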

Trace Bucketing: Methods range from single-bucket approaches, which keep things simple by training one classifier on all prefixes, to clustering-based bucketing, which groups trace prefixes by execution similarity and trains one classifier per cluster. State-based bucketing uses a process model to map each prefix to a state (decision point) and trains one classifier per state.
Sequence Encoding: Encodings turn a trace prefix into a feature vector. Last-state encoding keeps only the most recent event. Aggregation encoding condenses all events seen so far (e.g., into activity frequencies) and discards their order. Index-based encoding is lossless because it keeps each event's attributes at its position. Its vector length grows with the prefix length, however, so it is less flexible across traces of varying length and is typically combined with prefix-length bucketing.
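The three encodings can be contrasted on the control-flow (activity) attribute alone. This sketch makes two simplifying assumptions:

- Aggregation uses only activity frequencies. The paper's aggregation encoding also summarizes other event attributes.
- The `vocabulary` of activity labels is known in advance.

The function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def last_state_encoding(prefix: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    # One-hot vector of the most recent activity; earlier events are ignored.
    last = prefix[-1]
    return [1 if act == last else 0 for act in vocabulary]


def aggregation_encoding(prefix: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    # Frequency of each activity so far: fixed length, but event order is lost.
    counts = Counter(prefix)
    return [counts[act] for act in vocabulary]


def index_encoding(prefix: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    # One one-hot block per position, concatenated: lossless, but the length is
    # len(prefix) * len(vocabulary), so prefixes of different lengths produce
    # vectors of different sizes.
    vector: List[int] = []
    for act in prefix:
        vector.extend(1 if v == act else 0 for v in vocabulary)
    return vector


if __name__ == "__main__":
    vocab = ["a", "b", "c"]
    for p in (["a", "b", "a"], ["b", "a", "a"]):
        print(last_state_encoding(p, vocab), aggregation_encoding(p, vocab), index_encoding(p, vocab))
```

Consider the prefixes `a, b, a` and `b, a, a`. They receive identical last-state and aggregation vectors but different index-based vectors. This is the trade-off between information retention and a fixed, compact feature space.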

Benchmarking and Results

Benchmarking was conducted on nine real-life event logs, covering 24 prediction tasks. Methods were evaluated on three criteria:

- prediction accuracy, measured mainly by AUC
- earliness of prediction
- computational efficiency, in terms of offline (training) and online (prediction) processing time

XGBoost and Random Forest were the top-performing classifiers in accuracy across datasets. Among sequence encodings, aggregation-based methods generally offered a good balance between information retention and model simplicity. The paper also points out that multi-classifier approaches (cluster-based or state-based bucketing) need careful attention to how traces are distributed across buckets, so that each bucket retains enough training data.
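AUC, the main accuracy measure in the benchmark, can be computed without any library. It is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counting half. Grouping predictions by prefix length shows how accuracy develops as cases progress, which relates to earliness.

This is a sketch under stated assumptions. The O(n*m) pairwise loop and the `(prefix_length, label, score)` record format are illustrative choices, and the function names are hypothetical. In practice a library routine such as scikit-learn's `roc_auc_score` would typically be used.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both outcome classes are present")
    wins = 0.0
    # Pairwise comparison of every positive with every negative (O(n*m)).
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_prefix_length(records: Iterable[Tuple[int, int, float]]) -> Dict[int, float]:
    """records: (prefix_length, true_label, predicted_score) -> AUC per prefix length."""
    labels: Dict[int, List[int]] = defaultdict(list)
    scores: Dict[int, List[float]] = defaultdict(list)
    for length, label, score in records:
        labels[length].append(label)
        scores[length].append(score)
    return {k: roc_auc(labels[k], scores[k]) for k in sorted(labels)}
```

A curve of AUC against prefix length shows how early in a case the predictions become reliable. `roc_auc` raises an error for any prefix length at which only one outcome class occurs.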

Computational Considerations

The paper also examines computational efficiency and presents two ways to reduce execution times without substantially affecting prediction accuracy: gap-based filtering and categorical domain filtering. These techniques matter in operational settings where predictions must be produced promptly.
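Both filters can be sketched as preprocessing steps. The exact schemes here are assumptions for illustration, not the paper's definitions:

- Gap-based filtering is modeled as keeping only every `gap`-th prefix length (1, 1+gap, 1+2*gap, ...).
- Categorical domain filtering is modeled as keeping the `keep` most frequent levels of a categorical attribute and mapping all other levels to a shared placeholder.

The function names and parameters are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def gap_filtered_lengths(trace_len: int, gap: int) -> List[int]:
    """Prefix lengths kept for training: 1, 1+gap, 1+2*gap, ... up to trace_len (assumed scheme)."""
    if gap < 1:
        raise ValueError("gap must be at least 1")
    return list(range(1, trace_len + 1, gap))


def filter_categorical_domain(values: Sequence[str], keep: int, other: str = "other") -> List[str]:
    """Keep the `keep` most frequent levels; replace every other level with `other`."""
    frequent = {level for level, _ in Counter(values).most_common(keep)}
    return [v if v in frequent else other for v in values]
```

With `gap=3`, a 7-event trace contributes prefixes of length 1, 4, and 7 instead of all seven, which shrinks the training set. Domain filtering caps the number of one-hot columns a categorical attribute can generate.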

Implications and Directions

This paper is a resource for understanding and comparing predictive process monitoring methods for practical use in business process management. The findings favor aggregation encoding over lossless (index-based) encoding when reliable predictions are needed across heterogeneous datasets. The taxonomy also gives a reference point for assessing newer approaches to sequence data, such as deep learning with LSTM networks.

The open-source framework developed as part of this research lets other researchers reproduce, extend, and adapt the benchmarked methods, making real-time process insights easier to pursue. Overall, the paper is a significant step toward a structured understanding and application of outcome-oriented predictive process monitoring techniques.