Learning Successor Features the Simple Way
Abstract: In deep reinforcement learning (RL), it is challenging to learn representations that do not suffer from catastrophic forgetting or interference in non-stationary environments. Successor Features (SFs) offer a potential solution to this challenge. However, canonical techniques for learning SFs from pixel-level observations often lead to representation collapse, wherein representations degenerate and fail to capture meaningful variations in the data. More recent methods for learning SFs can avoid representation collapse, but they often involve complex losses and multiple learning phases, reducing their efficiency. We introduce a novel, simple method for learning SFs directly from pixels. Our approach combines a temporal-difference (TD) loss with a reward prediction loss, which together capture the basic mathematical definition of SFs. We show that our approach matches or outperforms existing SF learning techniques in both 2D (Minigrid) and 3D (Miniworld) mazes and in Mujoco, for both single-task and continual learning scenarios. Moreover, our technique is efficient and can reach higher levels of performance in less time than other approaches. Our work provides a new, streamlined technique for learning SFs directly from pixel observations, with no pretraining required.
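As a minimal sketch of the loss combination the abstract describes, assuming the standard successor-feature formulation (the symbols φ for features, ψ for successor features, w for reward weights, γ for the discount, and the target network ψ̄ are conventional notation assumed here, not taken from the paper's text):

```latex
% Successor features: expected discounted sum of features under policy \pi
\psi^{\pi}(s,a) \;=\; \mathbb{E}_{\pi}\!\Big[\, \textstyle\sum_{t=0}^{\infty} \gamma^{t}\, \phi(s_t, a_t) \;\Big|\; s_0 = s,\; a_0 = a \Big]

% Rewards are assumed linear in the features
r(s,a) \;=\; \phi(s,a)^{\top} w

% TD loss on the successor features (\bar{\psi} is a frozen target network,
% and a' is selected by the current policy at the next state s')
\mathcal{L}_{\mathrm{SF}} \;=\; \big\lVert \phi(s,a) + \gamma\, \bar{\psi}(s', a') - \psi(s,a) \big\rVert_2^{2}

% Reward-prediction loss that grounds the features \phi in observed rewards
\mathcal{L}_{r} \;=\; \big( r - \phi(s,a)^{\top} w \big)^{2}
```

Under this formulation, the reward loss keeps φ informative about rewards, which plausibly rules out the degenerate solution φ = ψ = 0 that the TD loss alone would admit; this is consistent with the abstract's claim that the two losses together capture the definition of SFs while avoiding representation collapse.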