Bilinear value networks (2204.13695v3)

Published 28 Apr 2022 in cs.AI and cs.LG

Abstract: The dominant framework for off-policy multi-goal reinforcement learning involves estimating a goal-conditioned Q-value function. When learning to achieve multiple goals, data efficiency is intimately connected with the generalization of the Q-function to new goals. The de facto paradigm is to approximate Q(s, a, g) using monolithic neural networks. To improve the generalization of the Q-function, we propose a bilinear decomposition that represents the Q-value via a low-rank approximation in the form of a dot product between two vector fields. The first vector field, f(s, a), captures the environment's local dynamics at the state s, whereas the second component, φ(s, g), captures the global relationship between the current state and the goal. We show that our bilinear decomposition scheme substantially improves data efficiency and has superior transfer to out-of-distribution goals compared to prior methods. Empirical evidence is provided on the simulated Fetch robot task-suite and dexterous manipulation with a Shadow hand.
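The decomposition described above can be illustrated with a minimal sketch: two separate networks embed (s, a) and (s, g) into a shared vector space, and the Q-value is their dot product. The network architecture, hidden sizes, embedding dimension, and use of PyTorch below are assumptions for illustration, not the paper's exact implementation.

```python
# Sketch of a bilinear Q-value decomposition, Q(s, a, g) ≈ f(s, a) · φ(s, g).
# All dimensions and layer sizes are illustrative placeholders.
import torch
import torch.nn as nn


def mlp(in_dim: int, out_dim: int, hidden: int = 256) -> nn.Sequential:
    """Small fully connected network used for both vector fields."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


class BilinearQNetwork(nn.Module):
    """Q(s, a, g) represented as a dot product of two learned vector fields."""

    def __init__(self, state_dim: int, action_dim: int, goal_dim: int, embed_dim: int = 64):
        super().__init__()
        # f(s, a): embedding of the state-action pair (local dynamics).
        self.f = mlp(state_dim + action_dim, embed_dim)
        # φ(s, g): embedding of the state-goal pair (global state-goal relationship).
        self.phi = mlp(state_dim + goal_dim, embed_dim)

    def forward(self, state, action, goal):
        f_sa = self.f(torch.cat([state, action], dim=-1))
        phi_sg = self.phi(torch.cat([state, goal], dim=-1))
        # Low-rank bilinear form: sum the elementwise product over the embedding dimension.
        return (f_sa * phi_sg).sum(dim=-1, keepdim=True)


if __name__ == "__main__":
    # Example with random batch data; dimensions are arbitrary.
    q_net = BilinearQNetwork(state_dim=10, action_dim=4, goal_dim=3)
    s, a, g = torch.randn(32, 10), torch.randn(32, 4), torch.randn(32, 3)
    print(q_net(s, a, g).shape)  # torch.Size([32, 1])
```

In practice such a Q-network would be trained with a standard off-policy actor-critic loop (e.g. DDPG/TD3-style TD targets with hindsight goal relabeling); the sketch only shows the factorized value head.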
