POLTER: Policy Trajectory Ensemble Regularization for Unsupervised Reinforcement Learning (2205.11357v3)
Abstract: The goal of Unsupervised Reinforcement Learning (URL) is to find a reward-agnostic prior policy on a task domain such that sample efficiency on supervised downstream tasks is improved. Although agents initialized with such a prior policy can achieve significantly higher reward with fewer samples when finetuned on a downstream task, it remains an open question how an optimal pretrained prior policy can be obtained in practice. In this work, we present POLTER (Policy Trajectory Ensemble Regularization), a general method for regularizing pretraining that can be applied to any URL algorithm and is especially useful for data- and knowledge-based URL algorithms. It utilizes an ensemble of policies discovered during pretraining and moves the policy of the URL algorithm closer to its optimal prior. Our method is based on a theoretical framework, and we analyze its practical effects on a white-box benchmark, allowing us to study POLTER with full control. In our main experiments, we evaluate POLTER on the Unsupervised Reinforcement Learning Benchmark (URLB), which consists of 12 tasks in 3 domains. We demonstrate the generality of our approach by improving the performance of a diverse set of data- and knowledge-based URL algorithms by 19% on average and up to 40% in the best case. Under a fair comparison with tuned baselines and tuned POLTER, we establish a new state-of-the-art for model-free methods on the URLB.
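The core idea of the abstract — maintaining an ensemble of policy snapshots collected during pretraining and regularizing the current URL policy toward their average — can be sketched as follows. This is a minimal, hypothetical illustration only: it uses toy categorical policies over a discrete action set and a KL penalty, and the names `polter_regularizer` and `alpha` are illustrative assumptions, not the paper's actual implementation (which targets continuous-control policies).

```python
import numpy as np

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) between two categorical distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def polter_regularizer(current_policy, snapshots, alpha=0.5):
    """POLTER-style penalty sketch: the average of the policy
    snapshots stands in for the (unknown) optimal prior, and the
    current policy is pulled toward it via a weighted KL term
    added to the URL algorithm's pretraining objective."""
    ensemble_prior = np.mean(np.stack(snapshots), axis=0)
    return alpha * kl_divergence(current_policy, ensemble_prior)

# Toy example: three snapshot policies over 3 actions for one state.
snapshots = [np.array([0.7, 0.2, 0.1]),
             np.array([0.1, 0.8, 0.1]),
             np.array([0.2, 0.2, 0.6])]
current = np.array([0.9, 0.05, 0.05])

penalty = polter_regularizer(current, snapshots)
```

In a full pretraining loop this penalty would be added to the URL algorithm's intrinsic objective; the penalty vanishes when the current policy matches the ensemble average and grows as it drifts away.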