Improving Human Sequential Decision-Making with Reinforcement Learning (2108.08454v5)
Abstract: Workers spend a significant amount of time learning how to make good decisions. Evaluating the efficacy of a given decision, however, can be complicated; for example, decision outcomes are often long-term and relate to the original decision in complex ways. Surprisingly, even though learning good decision-making strategies is difficult, they can often be expressed in simple and concise forms. Focusing on sequential decision-making, we design a novel machine learning algorithm that is capable of extracting "best practices" from trace data and conveying its insights to humans in the form of interpretable "tips". Our algorithm selects the tip that best bridges the gap between the actions taken by human workers and those taken by the optimal policy in a way that accounts for which actions are consequential for achieving higher performance. We evaluate our approach through a series of randomized controlled experiments where participants manage a virtual kitchen. Our experiments show that the tips generated by our algorithm can significantly improve human performance relative to intuitive baselines. In addition, we discuss a number of empirical insights that can help inform the design of algorithms intended for human-AI interfaces. For instance, we find evidence that participants do not simply blindly follow our tips; instead, they combine them with their own experience to discover additional strategies for improving performance.
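The abstract states the tip-selection criterion only at a high level. To make the idea of "bridging the gap" while accounting for "which actions are consequential" concrete, here is a minimal sketch under stated assumptions: a tabular setting in which optimal Q-values Q* have already been computed (for example, by Q-learning), a tip of the hypothetical form "in state s, take action a", and a scoring rule that averages, over the observed human trace, the Q-value gain the tip would yield wherever humans visited s. This scoring rule and the names `score_tips`, `best_tip`, `Q_STAR`, `TRACE`, and `CANDIDATES` are illustrative assumptions, not the paper's exact algorithm.

```python
from typing import Dict, Hashable, List, Tuple

# Hypothetical type aliases: states and actions can be any hashable labels.
State = Hashable
Action = Hashable
Tip = Tuple[State, Action]  # assumed tip form: "in state s, take action a"


def score_tips(
    q_star: Dict[Tuple[State, Action], float],
    human_trace: List[Tuple[State, Action]],
    candidate_tips: List[Tip],
) -> Dict[Tip, float]:
    """Assumed scoring rule: average Q* gain over the human trace if humans
    had followed the tip every time they were in the tip's state."""
    n = len(human_trace)
    scores: Dict[Tip, float] = {}
    for s_tip, a_tip in candidate_tips:
        gain = 0.0
        for s, a_human in human_trace:
            if s == s_tip:
                # A small gap means the human's choice was nearly optimal,
                # so the tip matters little there; a large gap is consequential.
                gain += q_star[(s, a_tip)] - q_star[(s, a_human)]
        scores[(s_tip, a_tip)] = gain / n if n else 0.0
    return scores


def best_tip(
    q_star: Dict[Tuple[State, Action], float],
    human_trace: List[Tuple[State, Action]],
    candidate_tips: List[Tip],
) -> Tip:
    """Return the highest-scoring candidate (candidate_tips must be non-empty)."""
    scores = score_tips(q_star, human_trace, candidate_tips)
    return max(scores, key=scores.get)


# Toy example with hypothetical numbers: humans are "idle" more often than
# "busy", but their mistake in "busy" is far more costly.
Q_STAR = {
    ("busy", "prep"): 5.0,
    ("busy", "wait"): 1.0,
    ("idle", "prep"): 2.0,
    ("idle", "wait"): 1.8,
}
TRACE = [("busy", "wait"), ("idle", "wait"), ("idle", "wait"), ("idle", "wait")]
CANDIDATES = [("busy", "prep"), ("idle", "prep")]

print(best_tip(Q_STAR, TRACE, CANDIDATES))  # prints ('busy', 'prep')
```

In this toy case the "idle" tip applies to three of the four visits but each deviation costs only 0.2 in Q*, whereas the single "busy" deviation costs 4.0, so weighting by the Q-value gap rather than by raw disagreement frequency selects the "busy" tip.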