When in Doubt, Think Slow: Iterative Reasoning with Latent Imagination (2402.15283v1)

Published 23 Feb 2024 in cs.LG and cs.AI

Abstract: In an unfamiliar setting, a model-based reinforcement learning agent can be limited by the accuracy of its world model. In this work, we present a novel, training-free approach to improving the performance of such agents separately from planning and learning. We do so by applying iterative inference at decision time to fine-tune the inferred agent states based on the coherence of future state representations. Our approach achieves a consistent improvement in both reconstruction accuracy and task performance when applied to visual 3D navigation tasks. We go on to show that considering more future states further improves the agent's performance in partially observable environments, but not in a fully observable one. Finally, we demonstrate that agents given less training before evaluation benefit most from our approach.
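To make the decision-time procedure concrete, the sketch below shows one way such iterative inference could look in PyTorch. The stub networks (`encoder`, `dynamics`, `decoder`), the refinement hyperparameters, and the specific coherence objective (reconstruction error plus a drift penalty between consecutive imagined latents) are illustrative assumptions rather than the paper's implementation; what the sketch does capture is the training-free idea that only the inferred latent state is optimized at decision time while the world model stays frozen.

```python
# Minimal sketch of decision-time iterative inference over a latent state.
# All module shapes, the horizon, and the "coherence" loss below are
# hypothetical stand-ins for a Dreamer-style world model.
import torch
import torch.nn as nn

OBS_DIM, LATENT_DIM, HORIZON = 64, 16, 5

encoder = nn.Linear(OBS_DIM, LATENT_DIM)   # amortized inference network
dynamics = nn.Sequential(nn.Linear(LATENT_DIM, LATENT_DIM), nn.Tanh())
decoder = nn.Linear(LATENT_DIM, OBS_DIM)   # observation reconstruction

# Freeze the world model: refinement updates only the inferred state,
# never the model weights (the approach is training-free).
for module in (encoder, dynamics, decoder):
    for p in module.parameters():
        p.requires_grad_(False)

def refine_state(obs: torch.Tensor, n_steps: int = 10, lr: float = 1e-2) -> torch.Tensor:
    """Fine-tune the inferred latent state by gradient descent on an assumed
    coherence objective: reconstruction error plus a penalty on drift
    between consecutive imagined future latents."""
    z = encoder(obs).detach().requires_grad_(True)  # fast amortized guess
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(n_steps):
        recon_loss = (decoder(z) - obs).pow(2).mean()
        h, coherence_loss = z, torch.tensor(0.0)
        for _ in range(HORIZON):  # roll the latent forward in imagination
            h_next = dynamics(h)
            coherence_loss = coherence_loss + (h_next - h).pow(2).mean()
            h = h_next
        loss = recon_loss + 0.1 * coherence_loss
        opt.zero_grad()
        loss.backward()  # gradients flow only into z; the model is frozen
        opt.step()
    return z.detach()

obs = torch.randn(1, OBS_DIM)
z_star = refine_state(obs)  # refined state, handed on to the planner/policy
```

In this reading, increasing `HORIZON` corresponds to considering more future states, which the abstract reports helps in partially observable environments but not in a fully observable one.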
