Where2Start: Leveraging Initial States for Robust and Sample-Efficient Reinforcement Learning (2311.15089v1)
Abstract: Reinforcement learning algorithms that focus on how to compute the gradient and choose the next actions have effectively improved agent performance. However, these algorithms are environment-agnostic: they do not use the knowledge captured by the trajectories already collected, and therefore must sample many trajectories to train the model. By considering the nature of the environment and how much the agent learns from each scenario in it, the learning strategy can be changed. Such a strategy retrieves more informative trajectories, so the agent can learn from fewer trajectory samples. We propose the Where2Start algorithm, which selects initial states in whose vicinity the agent is most unstable. We show that this kind of selection decreases the number of trajectories that must be sampled before the agent reaches an acceptable reward. Our experiments show that Where2Start can improve sample efficiency by up to a factor of 8. Moreover, Where2Start can be combined with most state-of-the-art algorithms to improve their robustness and sample efficiency significantly.
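The core idea above — pick the initial state where the agent is most unstable — can be sketched as follows. This is a minimal illustration, not the paper's actual method: it assumes "instability" is measured as disagreement (standard deviation) across an ensemble of value estimators, and the names `instability_score` and `select_start_state` are hypothetical.

```python
import numpy as np

def instability_score(q_ensemble, state):
    """Disagreement across an ensemble of value estimators at `state`.
    Used here as a stand-in for the abstract's 'instability' notion
    (an assumption for illustration, not the paper's definition)."""
    values = np.array([q(state) for q in q_ensemble])
    return float(values.std())

def select_start_state(candidate_states, q_ensemble):
    """Return the candidate initial state with the highest instability,
    so the next training episode starts where the agent is least settled."""
    scores = [instability_score(q_ensemble, s) for s in candidate_states]
    return candidate_states[int(np.argmax(scores))]
```

For example, with two value estimators that disagree most at large states, `select_start_state` would pick the state where their estimates diverge the most; a training loop would then reset the environment to that state before each rollout instead of sampling the start state uniformly.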