The Role of Explanation Styles and Perceived Accuracy on Decision Making in Predictive Process Monitoring (2506.16617v1)

Published 19 Jun 2025 in cs.AI and cs.HC

Abstract: Predictive Process Monitoring (PPM) often uses deep learning models to predict the future behavior of ongoing processes, such as predicting process outcomes. While these models achieve high accuracy, their lack of interpretability undermines user trust and adoption. Explainable AI (XAI) aims to address this challenge by providing the reasoning behind the predictions. However, current evaluations of XAI in PPM focus primarily on functional metrics (such as fidelity), overlooking user-centered aspects such as their effect on task performance and decision-making. This study investigates the effects of explanation styles (feature importance, rule-based, and counterfactual) and perceived AI accuracy (low or high) on decision-making in PPM. We conducted a decision-making experiment, where users were presented with the AI predictions, perceived accuracy levels, and explanations of different styles. Users' decisions were measured both before and after receiving explanations, allowing the assessment of objective metrics (Task Performance and Agreement) and subjective metrics (Decision Confidence). Our findings show that perceived accuracy and explanation style have a significant effect.

Authors (5)
  1. Soobin Chae
  2. Suhwan Lee
  3. Hanna Hauptmann
  4. Hajo A. Reijers
  5. Xixi Lu

Summary

Empirical Analysis of Explanation Styles and Perceived Accuracy in Predictive Process Monitoring

This paper systematically investigates how explanation styles and perceived AI accuracy affect user decision-making in Predictive Process Monitoring (PPM). It addresses critical user-centered evaluation gaps in the PPM-XAI literature, shifting the emphasis from traditional fidelity and stability metrics to the effect explanations actually have on user task performance, agreement with AI predictions, and confidence.

Experimental Framework and Methodology

A two-factor between-subjects design was employed:

  • Perceived AI accuracy was manipulated by presenting participants with either high (96%) or low (63%) accuracy claims about the AI model, while the underlying model remained constant.
  • Explanation style was varied across three canonical XAI approaches (see the sketch after this list):
    • Feature Importance (FI) via LIME,
    • Rule-based via Anchor,
    • Counterfactual via DiCE.
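
The following is a minimal Python sketch of how the three explanation styles can be generated with their commonly used open-source implementations (LIME for feature importance, alibi's AnchorTabular for rules, and DiCE for counterfactuals). The data, feature names, and classifier below are placeholder assumptions for illustration, not the paper's actual BPIC 2017 pipeline.

# Sketch: generating the three explanation styles for a single prediction.
# The training data and model are synthetic placeholders.
import numpy as np
import pandas as pd
import dice_ml                                       # counterfactuals (DiCE)
from alibi.explainers import AnchorTabular           # rule-based (Anchor)
from lime.lime_tabular import LimeTabularExplainer   # feature importance (LIME)
from sklearn.ensemble import RandomForestClassifier

# Placeholder data: two numeric process features and a binary process outcome.
rng = np.random.default_rng(0)
train_df = pd.DataFrame({
    "requested_amount": rng.uniform(1_000, 50_000, 500),
    "n_offers": rng.integers(1, 6, 500).astype(float),
})
train_df["outcome"] = (train_df["requested_amount"] < 25_000).astype(int)

feature_names = ["requested_amount", "n_offers"]
X_train = train_df[feature_names].to_numpy()
clf = RandomForestClassifier(random_state=0).fit(X_train, train_df["outcome"])

x = X_train[0]                              # instance to explain (1-D array)
x_df = train_df[feature_names].iloc[[0]]    # same instance as a one-row DataFrame

# 1. Feature importance via LIME
lime_explainer = LimeTabularExplainer(
    X_train, mode="classification", feature_names=feature_names)
fi_exp = lime_explainer.explain_instance(x, clf.predict_proba, num_features=2)
print(fi_exp.as_list())                     # [(feature condition, weight), ...]

# 2. Rule-based explanation via Anchor
anchor_explainer = AnchorTabular(clf.predict, feature_names)
anchor_explainer.fit(X_train)
print(anchor_explainer.explain(x, threshold=0.95).anchor)   # list of rule predicates

# 3. Counterfactual explanation via DiCE
d = dice_ml.Data(dataframe=train_df, continuous_features=feature_names,
                 outcome_name="outcome")
m = dice_ml.Model(model=clf, backend="sklearn")
cf = dice_ml.Dice(d, m, method="random").generate_counterfactuals(
    x_df, total_CFs=3, desired_class="opposite")
cf.visualize_as_dataframe(show_only_changes=True)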

Participants (N=179) performed loan-approval decision tasks on real-world event log data (BPIC 2017), with decisions collected both before and after explanations were shown, enabling within- and between-group comparisons. Task performance (correctness), agreement (concurrence with the AI prediction), and self-rated decision confidence together provided the dataset for quantitative analysis.
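
As a rough illustration of these measures, the snippet below computes per-participant task performance, agreement, and confidence before and after explanations. The column names and toy records are illustrative assumptions, not the study's released materials.

# Sketch: computing the three measures from per-task decision records.
import pandas as pd

records = pd.DataFrame({
    "participant":     [1, 1, 2, 2],
    "ground_truth":    ["accept", "reject", "accept", "reject"],
    "ai_prediction":   ["accept", "accept", "accept", "reject"],
    "decision_pre":    ["accept", "accept", "reject", "reject"],
    "decision_post":   ["accept", "reject", "accept", "reject"],
    "confidence_pre":  [4, 3, 2, 5],   # self-rated, e.g. on a 1-5 scale
    "confidence_post": [4, 4, 3, 5],
})

def measures(g: pd.DataFrame) -> pd.Series:
    return pd.Series({
        # Task performance: share of decisions matching the ground truth
        "performance_pre":  (g["decision_pre"]  == g["ground_truth"]).mean(),
        "performance_post": (g["decision_post"] == g["ground_truth"]).mean(),
        # Agreement: share of decisions matching the AI prediction
        "agreement_pre":    (g["decision_pre"]  == g["ai_prediction"]).mean(),
        "agreement_post":   (g["decision_post"] == g["ai_prediction"]).mean(),
        # Decision confidence: mean self-rating before and after explanations
        "confidence_pre":   g["confidence_pre"].mean(),
        "confidence_post":  g["confidence_post"].mean(),
    })

print(records.groupby("participant").apply(measures))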

Principal Findings

The results reveal several notable, and at times counter-intuitive, empirical insights:

  • Perceived Accuracy Effects:

Lower perceived model accuracy led to higher initial task performance (mean diff. = -0.2558, p=0.026, Cohen's d=0.34), contradicting common assumptions that high confidence in AI generally improves user outcomes. No significant effect was found on agreement or confidence pre-explanation.
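
The paper's exact statistical procedure is not restated here; a plausible reading of the reported values (mean difference, p, Cohen's d) is a two-sample comparison of per-participant performance between the two accuracy conditions. The sketch below uses random placeholder scores on a 0 to 4 scale.

# Sketch: between-group comparison of initial task performance
# (high vs. low perceived accuracy) with a t-test and Cohen's d.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
high_acc = rng.binomial(4, 0.70, size=90).astype(float)   # placeholder scores out of 4
low_acc  = rng.binomial(4, 0.77, size=89).astype(float)   # placeholder scores out of 4

t_stat, p_value = stats.ttest_ind(high_acc, low_acc, equal_var=False)  # Welch's t-test

# Cohen's d from the pooled standard deviation
n1, n2 = len(high_acc), len(low_acc)
pooled_sd = np.sqrt(((n1 - 1) * high_acc.var(ddof=1) +
                     (n2 - 1) * low_acc.var(ddof=1)) / (n1 + n2 - 2))
cohens_d = (high_acc.mean() - low_acc.mean()) / pooled_sd
print(f"mean diff = {high_acc.mean() - low_acc.mean():.4f}, "
      f"p = {p_value:.3f}, |d| = {abs(cohens_d):.2f}")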

  • Explanation Styles and Task Performance:

Counterfactual explanations significantly improved task performance post-explanation, particularly in the low perceived accuracy group (mean improvement = 0.45 out of 4, p=0.002, Cohen's d=0.63). Rule-based and FI explanations yielded only modest or statistically insignificant gains.
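
The post-explanation improvement is a within-group, pre/post comparison. Assuming a paired test on the same 0 to 4 scale, a sketch with placeholder scores looks as follows.

# Sketch: paired pre/post comparison of task performance for one condition
# (e.g. counterfactual explanations in the low perceived-accuracy group).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
pre  = rng.binomial(4, 0.65, size=30).astype(float)         # placeholder scores out of 4
post = np.clip(pre + rng.binomial(1, 0.45, size=30), 0, 4)  # placeholder improvement

t_stat, p_value = stats.ttest_rel(post, pre)       # paired t-test
diff = post - pre
cohens_dz = diff.mean() / diff.std(ddof=1)         # Cohen's d for paired samples
print(f"mean improvement = {diff.mean():.2f} out of 4, "
      f"p = {p_value:.3f}, d = {cohens_dz:.2f}")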

  • Agreement with AI:

Feature Importance explanations reduced agreement with AI in the low-accuracy group (mean diff. = -0.26, p=0.043), suggesting FI explanations may encourage critical engagement; Counterfactual and Rule-based explanations did not significantly impact agreement.

  • Decision Confidence:

No explanation style led to a statistically significant change in self-reported decision confidence either globally or split by perceived accuracy group.

  • User Satisfaction and Simplicity:

Despite their objective effectiveness, counterfactual explanations scored lowest for satisfaction and perceived simplicity, whereas rule-based explanations were rated highest for both.

Interpretation and Theoretical Implications

The observed negative relationship between perceived AI accuracy and task performance highlights a potential risk of overreliance: users assigned to the high-accuracy condition frequently agreed with AI even in error, reducing correct decisions. This phenomenon underlines a known hazard in XAI-supported decision-making—automation bias—and demonstrates the necessity of fostering calibrated skepticism among end-users.

The superiority of counterfactual explanations in enhancing performance, especially when users perceived the AI as less accurate, corroborates psychological and HCI findings on the power of contrastive reasoning in complex decision contexts. However, their lower user satisfaction underscores the importance of explanation comprehensibility for effective adoption.

Rule-based explanations, while preferred for subjective metrics, did not produce commensurate objective improvements, aligning with prior suggestions that perceived usability of an explanation does not necessarily guarantee improved decision-making.

Practical and Future Directions

The paper's results prompt several recommendations for both the deployment and ongoing research of XAI in PPM:

  • Contextual Calibration of AI Trust: Exposing users to honest reporting of model confidence, or carefully manipulating perceived accuracy, may encourage more vigilant and accurate user engagement.
  • Differentiated Explanation Offerings: Providing multiple explanation modalities, possibly with user-adaptive explanation selection, could balance objective task improvement and user satisfaction.
  • Counterfactual Explanations: These should be systematically integrated into PPM interfaces, particularly in high-uncertainty, high-stakes environments, though effort should be invested in improving their accessibility.

Research directions include:

  • Application-grounded evaluation in professional or expert user cohorts to further validate findings from crowd-sourced or student populations.
  • Development and assessment of new explanation paradigms, such as natural language explanations generated by LLMs, aiming to merge high objective effectiveness (cf. counterfactuals) with high subjective preference (cf. rule-based).
  • Deeper investigation into the interplay between real and perceived accuracy, explanation uptake, and long-term learning effects in decision support systems.

Conclusion

This work advances the empirical study of user-centered XAI in PPM by rigorously disentangling explanation effectiveness from mere understandability and by empirically demonstrating that explanation style and perceived model accuracy can have significant, sometimes surprising, effects on real-world decision-making outcomes. The findings advocate for a nuanced, context-aware design of XAI interfaces, with careful consideration of explanatory modality and model transparency. The public release of materials and code further supports the reproducibility and extensibility of this line of research.
