Learning Optimal Behavior Through Reasoning and Experiences (2403.18185v1)
Abstract: We develop a novel framework of bounded rationality under cognitive frictions in which agents learn about optimal behavior through both deliberative reasoning and accumulated experiences. Using both types of information, agents engage in Bayesian non-parametric estimation of the unknown action value function. Reasoning signals are produced internally through mental deliberation, subject to a cognitive cost; experience signals are the utility outcomes observed at previously taken actions. Agents' subjective estimation uncertainty, which evolves as information accumulates, modulates the two modes of learning in a state- and history-dependent way. We discuss how the model draws on and bridges conceptual, methodological, and empirical insights from the economics and cognitive science literatures on reinforcement learning.
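The estimation step described in the abstract can be illustrated with a minimal numerical sketch, assuming a Gaussian-process prior over the unknown action value function and treating both reasoning and experience signals as noisy observations of that function. The kernel, the action grid, and the signal noise levels below are illustrative assumptions, not the paper's specification or calibration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel over a one-dimensional action grid."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(obs_actions, signals, noise_vars, grid,
                 length_scale=1.0, signal_var=1.0):
    """Exact GP posterior mean and variance of the action value function,
    combining observations with heterogeneous noise (here standing in for
    reasoning vs. experience signals)."""
    K = rbf_kernel(obs_actions, obs_actions, length_scale, signal_var)
    K += np.diag(noise_vars)                       # per-signal noise variance
    K_star = rbf_kernel(grid, obs_actions, length_scale, signal_var)
    K_ss = rbf_kernel(grid, grid, length_scale, signal_var)
    alpha = np.linalg.solve(K, signals)
    mean = K_star @ alpha
    cov = K_ss - K_star @ np.linalg.solve(K, K_star.T)
    return mean, np.diag(cov)

# Illustrative data: two "experience" outcomes observed at past actions
# (low noise) and one costly internal "reasoning" signal (higher noise),
# all informative about the same latent action value function.
obs_actions = np.array([0.2, 0.8, 0.5])
obs_signals = np.array([1.0, 0.4, 0.9])
obs_noise   = np.array([0.01, 0.01, 0.10])   # experience, experience, reasoning

grid = np.linspace(0.0, 1.0, 21)
mean, var = gp_posterior(obs_actions, obs_signals, obs_noise, grid)

# The posterior variance plays the role of the agent's subjective estimation
# uncertainty, which in the model governs how much additional reasoning
# versus experience-based learning is relied on at each state and history.
print("estimated best action:", grid[np.argmax(mean)])
print("maximum posterior variance on grid:", var.max())
```

The two signal types enter the same posterior update but with different noise variances, which is one simple way to capture the abstract's point that deliberation and experience jointly discipline the estimate while uncertainty arbitrates between them.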