Towards Optimizing Human-Centric Objectives in AI-Assisted Decision-Making With Offline Reinforcement Learning
The paper "Towards Optimizing Human-Centric Objectives in AI-Assisted Decision-Making With Offline Reinforcement Learning" addresses a significant gap in the current design paradigm of AI decision support systems by emphasizing the need to optimize human-centric objectives beyond mere decision accuracy. This work introduces offline reinforcement learning (RL) as a viable, customizable approach to model human-AI decision-making processes in a way that can adapt to various human-centric objectives and contextual factors.
Methodological Approach
The authors instantiate their approach by targeting two objectives, immediate decision accuracy and human learning, and modeling the interaction as a Markov Decision Process (MDP). The state space was defined to encompass individual differences in need for cognition (NFC), along with relevant contextual factors such as the AI's uncertainty and the decision-maker's task knowledge. The action space included various forms of AI assistance: no assistance, explanation only, recommendation and explanation (SXAI), and on-demand assistance. The reward structure differentiated between immediate accuracy (a dense reward) and learning (a sparse reward). A minimal sketch of how such an MDP might be encoded follows this paragraph.
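To make the formulation concrete, here is a minimal sketch of how the state, action, and reward components could be encoded. The discretization of state features, the action labels, and the reward weights are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass
from enum import Enum

class Assistance(Enum):
    """Action space: the form of AI support shown for a given decision."""
    NO_ASSISTANCE = 0
    EXPLANATION_ONLY = 1
    RECOMMENDATION_AND_EXPLANATION = 2  # the fixed "SXAI"-style support
    ON_DEMAND = 3

@dataclass(frozen=True)
class State:
    """State features: individual differences plus decision context (discretized)."""
    nfc: int             # need for cognition, e.g. 0 = low, 1 = high
    ai_uncertain: bool   # whether the AI is uncertain on this instance
    knowledge: int       # estimated task knowledge of the decision-maker

def reward(correct: bool, learned: bool, terminal: bool,
           w_accuracy: float = 1.0, w_learning: float = 1.0) -> float:
    """Dense accuracy reward on every decision; sparse learning reward
    granted only at the end of an episode (hypothetical weighting)."""
    r = w_accuracy * (1.0 if correct else 0.0)
    if terminal and learned:
        r += w_learning
    return r
```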
The experimental methodology included a data collection study leveraging an exploratory policy, followed by offline learning of optimal policies through Q-learning. This choice enables deriving decision-support policies without the risks of real-time exploration with users, a property that is particularly valuable in sensitive applications such as clinical settings.
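The following is a minimal sketch of offline (batch) tabular Q-learning over logged interaction tuples of the form (state, action, reward, next_state, done), assuming the discretized states above. The hyperparameters and function names are illustrative, not those reported in the paper.

```python
from collections import defaultdict
import random

def offline_q_learning(dataset, n_actions, gamma=0.95, alpha=0.1, epochs=50):
    """Fit tabular Q-values from logged transitions collected under the
    exploratory policy; no online interaction with users is required."""
    Q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(epochs):
        random.shuffle(dataset)
        for s, a, r, s_next, done in dataset:
            target = r if done else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
    return Q

def greedy_policy(Q):
    """Derive the decision-support policy: in each observed state, choose the
    assistance type with the highest estimated value."""
    return {s: max(range(len(qs)), key=lambda a: qs[a]) for s, qs in Q.items()}
```

Separate Q-functions (or reward weightings) can be fit for the accuracy and learning objectives, yielding the distinct policies the paper compares against the fixed SXAI baseline.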
Key Findings
Computational Insights
The computational evaluation demonstrated that the RL-based policies for optimizing accuracy and learning differed substantially from the fixed SXAI policy. The policy optimized for learning, in particular, favored forms of assistance that induce cognitive engagement, especially for individuals low in NFC. This insight aligns with the hypothesis that people less inclined toward cognitive effort can benefit from interventions specifically designed to promote engagement.
The RL policies were evaluated in two user studies, which support the notion that adaptive, context-aware AI support can lead to better outcomes than static assistance models. Notably, individuals interacting with the accuracy-optimized policy achieved significantly better decision accuracy than those using baseline policies, confirming the strength of the RL approach for this objective. In contrast, the policy optimized for learning showed mixed results, indicating that while the approach holds promise, designing interactions that reliably foster learning requires further investigation.
Objective vs. Subjective Experience
Interestingly, the paper found no inherent trade-off between learning and subjective task enjoyment, particularly for individuals low in NFC, where cognitive engagement positively correlated with task enjoyment and perceived learning. This challenges previous assumptions that enhanced cognitive engagement might reduce subjective satisfaction, highlighting that well-designed interaction models can simultaneously enhance user experience and achieve pedagogical objectives.
Implications and Future Directions
The paper's contributions provide a robust foundation for future research aimed at refining AI decision support systems to better serve human-centric goals. The use of offline RL to model human-AI decision dynamics introduces a powerful toolkit for developing interaction policies that could adaptively enhance both operational performance and user satisfaction.
The findings underscore the necessity of extending the research to explore other human-centric objectives beyond accuracy and learning, such as promoting long-term user engagement or improving collaborative efficiency in team settings. Furthermore, there is a clear need for developing and empirically validating new forms of AI explanations and interactions that can reliably enhance learning across diverse user populations.
Finally, while this research focused on a non-critical domain (exercise prescription for laypeople), the methodology and results have broader applicability. Extending this work to high-stakes environments, such as healthcare, can offer substantial benefits. Embedding RL-based adaptive support into clinical decision support systems holds the potential to significantly improve patient outcomes by optimizing both decision accuracy and clinicians' learning, ultimately fostering a more effective and expert workforce.
Conclusion
This paper makes a substantial contribution to the field of AI-assisted decision-making by showcasing the potential of offline RL to achieve human-centric objectives. The findings advocate for a nuanced, dynamic approach to AI support that considers individual differences and specific context factors, paving the way for more effective and engaging human-AI collaborations. As AI continues to integrate into various aspects of decision-making, this research provides valuable insights and a practical framework for optimizing human-centric outcomes.