
InfoBot: Transfer and Exploration via the Information Bottleneck (1901.10902v5)

Published 30 Jan 2019 in stat.ML and cs.LG

Abstract: A central challenge in reinforcement learning is discovering effective policies for tasks where rewards are sparsely distributed. We postulate that in the absence of useful reward signals, an effective exploration strategy should seek out decision states. These states lie at critical junctions in the state space from where the agent can transition to new, potentially unexplored regions. We propose to learn about decision states from prior experience. By training a goal-conditioned policy with an information bottleneck, we can identify decision states by examining where the model actually leverages the goal state. We find that this simple mechanism effectively identifies decision states, even in partially observed settings. In effect, the model learns the sensory cues that correlate with potential subgoals. In new environments, this model can then identify novel subgoals for further exploration, guiding the agent through a sequence of potential decision states and through new regions of the state space.
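To make the mechanism described in the abstract concrete, the sketch below shows one way a goal-conditioned policy with an information bottleneck might be set up in PyTorch. The network sizes, the unit-Gaussian goal-independent prior q(z | s), and the bottleneck weight beta are illustrative assumptions, not the paper's exact architecture or hyperparameters. The KL term between the goal-conditioned encoder and the goal-free prior is the quantity the abstract refers to when "examining where the model actually leverages the goal state": states where this KL is large are candidate decision states.

```python
# Minimal sketch (assumptions noted above) of a goal-conditioned policy with an
# information bottleneck on the goal, in the spirit of the InfoBot abstract.
import torch
import torch.nn as nn
import torch.distributions as D


class InfoBottleneckPolicy(nn.Module):
    def __init__(self, state_dim, goal_dim, latent_dim, num_actions, beta=0.01):
        super().__init__()
        self.beta = beta  # assumed bottleneck weight, not a value from the paper
        # Encoder p(z | s, g): maps state and goal to a Gaussian over latent codes.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + goal_dim, 128), nn.ReLU(),
            nn.Linear(128, 2 * latent_dim),  # mean and log-std
        )
        # Decoder pi(a | s, z): action distribution from state and latent code.
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + latent_dim, 128), nn.ReLU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, state, goal):
        mu, log_std = self.encoder(torch.cat([state, goal], dim=-1)).chunk(2, dim=-1)
        p_z = D.Normal(mu, log_std.exp())
        # Goal-independent prior q(z | s); assumed here to be a unit Gaussian.
        q_z = D.Normal(torch.zeros_like(mu), torch.ones_like(mu))
        z = p_z.rsample()
        logits = self.decoder(torch.cat([state, z], dim=-1))
        pi = D.Categorical(logits=logits)
        # KL(p(z | s, g) || q(z | s)) measures how much the policy relies on the
        # goal at this state; large values flag candidate decision states.
        kl = D.kl_divergence(p_z, q_z).sum(-1)
        return pi, kl

    def loss(self, log_prob, advantage, kl):
        # Standard policy-gradient surrogate plus the bottleneck penalty beta * KL.
        return -(log_prob * advantage).mean() + self.beta * kl.mean()
```

In training, the KL penalty is simply added to whatever policy-gradient loss is used; at exploration time in a new environment, the per-state KL could serve as a signal for which states to treat as subgoals, in line with the abstract's description of guiding the agent through potential decision states.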
