Reward Design for Justifiable Sequential Decision-Making (2402.15826v1)

Published 24 Feb 2024 in cs.LG and cs.AI

Abstract: Equipping agents with the capacity to justify their decisions using supporting evidence represents a cornerstone of accountable decision-making. Furthermore, ensuring that justifications are in line with human expectations and societal norms is vital, especially in high-stakes situations such as healthcare. In this work, we propose the use of a debate-based reward model for reinforcement learning agents, where the outcome of a zero-sum debate game quantifies the justifiability of a decision in a particular state. This reward model is then used to train a justifiable policy, whose decisions can be more easily corroborated with supporting evidence. In the debate game, two argumentative agents take turns providing supporting evidence for two competing decisions. Given the proposed evidence, a proxy of a human judge evaluates which decision is better justified. We demonstrate the potential of our approach in learning policies for prescribing and justifying treatment decisions for septic patients. We show that augmenting the reward with the feedback signal generated by the debate-based reward model yields policies highly favored by the judge when compared to the policy obtained solely from the environment rewards, while sacrificing hardly any performance. Moreover, in terms of the overall performance and justifiability of trained policies, the debate-based feedback is comparable to the feedback obtained from an ideal judge proxy that evaluates decisions using the full information encoded in the state. This suggests that the debate game extracts the key information contained in states that is most relevant for evaluating decisions, which in turn substantiates the practicality of combining our approach with human-in-the-loop evaluations. Lastly, we showcase that agents trained via multi-agent debate learn to propose evidence that is resilient to refutations and closely aligns with human preferences.
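
To make the mechanics concrete, below is a minimal Python sketch of the setup the abstract describes: two debaters alternately reveal features of the state as evidence for two competing decisions, a judge proxy scores the decisions on the revealed evidence alone, and the sign of the resulting zero-sum outcome is added to the environment reward. Everything here is an illustrative assumption, not the authors' implementation: the linear judge, the greedy debaters, and the names `judge_margin`, `debate`, `shaped_reward`, and `lam` are all hypothetical stand-ins.

```python
"""Minimal sketch of a debate-based reward signal (illustrative only)."""
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES, N_ACTIONS = 8, 4
# Toy judge weights; the paper's judge is a learned preference model.
W = rng.normal(size=(N_ACTIONS, N_FEATURES))

def judge_margin(state, mask, a, b):
    """Judge proxy: > 0 means action `a` looks better justified than `b`
    given only the revealed features (unrevealed entries are masked out)."""
    revealed = state * mask
    return float(revealed @ W[a] - revealed @ W[b])

def debate(state, a, b, n_rounds=3):
    """Zero-sum debate: the debaters take turns, each greedily revealing
    the unrevealed feature that most improves the judge margin for its
    own action. Returns the final margin as the game outcome."""
    mask = np.zeros_like(state)
    for turn in range(2 * n_rounds):
        arguing_for_a = (turn % 2 == 0)
        best_idx, best_val = None, -np.inf
        for i in np.flatnonzero(mask == 0):
            trial = mask.copy()
            trial[i] = 1.0
            m = judge_margin(state, trial, a, b)
            val = m if arguing_for_a else -m
            if val > best_val:
                best_idx, best_val = i, val
        mask[best_idx] = 1.0
    return judge_margin(state, mask, a, b)

def shaped_reward(env_reward, state, policy_action, alt_action, lam=0.5):
    """Augment the environment reward with the debate outcome; `lam` is
    a hypothetical weight on the justifiability signal."""
    outcome = np.sign(debate(state, policy_action, alt_action))
    return env_reward + lam * outcome

state = rng.normal(size=N_FEATURES)
print(shaped_reward(1.0, state, policy_action=0, alt_action=2))
```

In the paper itself, the judge is a learned proxy of human preferences and the debaters are trained via multi-agent reinforcement learning; the greedy, linear stand-ins above only illustrate the zero-sum structure of the game and how its outcome can augment the environment reward.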

Authors
  1. Aleksa Sukovic
  2. Goran Radanovic