Representation-Driven Reinforcement Learning

Published 31 May 2023 in cs.LG and cs.AI (arXiv:2305.19922v2)

Abstract: We present a representation-driven framework for reinforcement learning. By representing policies as estimates of their expected values, we leverage techniques from contextual bandits to guide exploration and exploitation. In particular, embedding a policy network into a linear feature space allows us to reframe the exploration-exploitation problem as a representation-exploitation problem, in which good policy representations enable optimal exploration. We demonstrate the effectiveness of this framework by applying it to evolutionary and policy gradient-based approaches, where it significantly improves performance over traditional methods. Our framework provides a new perspective on reinforcement learning, highlighting the importance of policy representation in determining optimal exploration-exploitation strategies.
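
The abstract does not spell out an algorithm, so the following is only a minimal sketch of the general idea it describes: if each candidate policy has an embedding in a linear feature space and its expected value is roughly linear in that embedding, a standard linear-bandit rule such as LinUCB can decide which policy to try next. Everything here is an assumption made for illustration: the class name `LinUCBPolicySelector`, the hand-picked embeddings in `candidates`, the synthetic `true_theta`, and the noise level. In particular, the embeddings are fixed by hand rather than learned, and this is not presented as the paper's method.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(M, v):
    return [dot(row, v) for row in M]


class LinUCBPolicySelector:
    """LinUCB over policy embeddings (hypothetical helper, not from the paper)."""

    def __init__(self, dim, alpha=1.0, reg=1.0):
        self.alpha = alpha  # width of the optimism bonus
        # Inverse of the regularized design matrix A = reg * I + sum of x x^T.
        self.A_inv = [[(1.0 / reg if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # sum of observed_return * x

    def theta_hat(self):
        # Ridge-regression estimate of the linear value parameters.
        return mat_vec(self.A_inv, self.b)

    def ucb(self, x):
        # Estimated value of the policy with embedding x plus an uncertainty bonus.
        bonus = self.alpha * math.sqrt(max(dot(x, mat_vec(self.A_inv, x)), 0.0))
        return dot(self.theta_hat(), x) + bonus

    def select(self, candidates):
        # Index of the candidate embedding with the highest upper confidence bound.
        return max(range(len(candidates)), key=lambda i: self.ucb(candidates[i]))

    def update(self, x, observed_return):
        # Sherman-Morrison rank-one update of A_inv (A_inv is symmetric).
        Ax = mat_vec(self.A_inv, x)
        denom = 1.0 + dot(x, Ax)
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
        for i in range(len(x)):
            self.b[i] += observed_return * x[i]


def run_demo(rounds=200, seed=0):
    rng = random.Random(seed)
    true_theta = [1.0, -0.5, 0.25]  # synthetic ground truth (assumption)
    candidates = [                   # hand-picked policy embeddings (assumption)
        [1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
        [0.5, 0.5, 0.5],
    ]
    selector = LinUCBPolicySelector(dim=3)
    counts = [0] * len(candidates)
    for _ in range(rounds):
        i = selector.select(candidates)
        # Stand-in for rolling out policy i and measuring its noisy return.
        ret = dot(true_theta, candidates[i]) + rng.gauss(0.0, 0.1)
        selector.update(candidates[i], ret)
        counts[i] += 1
    return selector, counts
```

Calling `run_demo()` returns the fitted selector and the number of times each candidate was chosen. The bonus term in `ucb` is what ties exploration to the representation: policies whose embeddings point in poorly observed directions receive more optimism, so the quality of the embedding determines how efficiently the search explores.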
