Program-Based Strategy Induction for Reinforcement Learning (2402.16668v1)

Published 26 Feb 2024 in cs.LG and cs.AI

Abstract: Typical models of learning assume incremental estimation of continuously-varying decision variables like expected rewards. However, this class of models fails to capture more idiosyncratic, discrete heuristics and strategies that people and animals appear to exhibit. Despite recent advances in strategy discovery using tools like recurrent networks that generalize the classic models, the resulting strategies are often onerous to interpret, making connections to cognition difficult to establish. We use Bayesian program induction to discover strategies implemented by programs, letting the simplicity of strategies trade off against their effectiveness. Focusing on bandit tasks, we find strategies that are difficult to capture or unexpected under classical incremental learning, such as asymmetric learning from rewarded and unrewarded trials, adaptive horizon-dependent random exploration, and discrete state switching.
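
To make the simplicity–effectiveness trade-off concrete, here is a minimal sketch in Python, assuming a toy two-armed Bernoulli bandit and a small, hand-picked set of candidate policy "programs" whose description lengths stand in for a program-induction prior. The policy names, the LAMBDA weight, and the scoring rule are illustrative assumptions, not the paper's actual DSL or inference procedure.

```python
# Toy illustration (not the paper's implementation): score candidate
# bandit strategies by trading off a simplicity prior (penalizing
# description length) against effectiveness (Monte Carlo expected
# reward on a simulated two-armed Bernoulli bandit).
import random

random.seed(0)

def win_stay_lose_shift(history):
    """Repeat the last arm after a reward, switch after a failure."""
    if not history:
        return 0
    arm, reward = history[-1]
    return arm if reward else 1 - arm

def epsilon_greedy(history, eps=0.1):
    """Mostly pick the arm with the higher empirical mean reward."""
    if not history or random.random() < eps:
        return random.randint(0, 1)
    means = []
    for a in (0, 1):
        rewards = [r for arm, r in history if arm == a]
        means.append(sum(rewards) / len(rewards) if rewards else 0.0)
    return 0 if means[0] >= means[1] else 1

def always_first(history):
    """Degenerate baseline: always pull arm 0."""
    return 0

# (policy, description length in arbitrary "tokens") -- the length
# stands in for the complexity term of a program-induction prior.
candidates = [
    (always_first, 1),
    (win_stay_lose_shift, 4),
    (epsilon_greedy, 8),
]

def expected_reward(policy, p=(0.3, 0.7), horizon=50, episodes=200):
    """Monte Carlo estimate of a policy's mean per-trial reward."""
    total = 0.0
    for _ in range(episodes):
        history = []
        for _ in range(horizon):
            arm = policy(history)
            reward = 1 if random.random() < p[arm] else 0
            history.append((arm, reward))
            total += reward
    return total / (episodes * horizon)

LAMBDA = 0.02  # illustrative weight of simplicity relative to reward

for policy, length in candidates:
    score = expected_reward(policy) - LAMBDA * length
    print(f"{policy.__name__:>20}: score = {score:.3f}")
```

The paper searches a much richer program space with Bayesian inference; this sketch only shows the scoring structure, in which a longer program must earn its added complexity through higher expected reward.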

