
Where2Start: Leveraging initial States for Robust and Sample-Efficient Reinforcement Learning (2311.15089v1)

Published 25 Nov 2023 in cs.LG

Abstract: Reinforcement learning algorithms that focus on how to compute the gradient and choose the next actions have effectively improved agent performance. However, these algorithms are environment-agnostic: they do not exploit the knowledge already captured by previously collected trajectories, and therefore must sample many trajectories to train the model. By accounting for the nature of the environment and how much the agent can learn from each scenario within it, the learning strategy can be changed to retrieve more informative trajectories, so the agent learns from fewer trajectory samples. We propose the Where2Start algorithm, which selects initial states in whose vicinity the agent is most unstable. We show that this kind of selection reduces the number of trajectories that must be sampled before the agent reaches an acceptable reward. Our experiments show that Where2Start can improve sample efficiency by up to a factor of 8. Moreover, Where2Start can be combined with most state-of-the-art algorithms to significantly improve their robustness and sample efficiency.
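
The abstract only outlines the idea, so the following is a minimal sketch of how a Where2Start-style start-state selection could look in practice. It assumes a pool of candidate initial states and uses disagreement across an ensemble of value estimators as a hypothetical proxy for "instability in the vicinity of a state"; the paper's exact criterion may differ.

```python
import numpy as np

def instability_score(state, value_ensemble):
    """Score a state by the disagreement of an ensemble of value estimators.
    Higher disagreement is taken as a sign the agent is unstable near the state.
    (Hypothetical proxy; not necessarily the measure used in the paper.)"""
    values = np.array([v(state) for v in value_ensemble])
    return values.std()

def where2start(candidate_states, value_ensemble):
    """Return the candidate initial state with the highest instability score."""
    scores = [instability_score(s, value_ensemble) for s in candidate_states]
    return candidate_states[int(np.argmax(scores))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins: states are 4-dim vectors, the "ensemble" is three
    # randomly initialised linear value functions.
    candidates = [rng.normal(size=4) for _ in range(16)]
    ensemble = [(lambda s, w=rng.normal(size=4): float(w @ s)) for _ in range(3)]
    start_state = where2start(candidates, ensemble)
    print("selected start state:", start_state)
    # A training loop would then reset the environment to start_state
    # (via a resettable simulator) and collect a trajectory from there,
    # rather than always starting from the environment's default reset.
```

In this reading, the sample-efficiency gain comes from spending trajectories where the current policy is least reliable instead of repeatedly traversing well-learned regions from the default start state.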

