Proximal Policy Optimization with Adaptive Exploration (2405.04664v1)
Published 7 May 2024 in cs.LG and cs.AI
Abstract: Proximal Policy Optimization with Adaptive Exploration (axPPO) is introduced as a novel learning algorithm. This paper investigates the exploration-exploitation tradeoff within the context of reinforcement learning and aims to contribute new insights into reinforcement learning algorithm design. The proposed adaptive exploration framework dynamically adjusts the exploration magnitude during training based on the recent performance of the agent. Our proposed method outperforms standard PPO algorithms in learning efficiency, particularly when significant exploratory behavior is needed at the beginning of the learning process.
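To make the idea of performance-dependent exploration concrete, the sketch below scales an entropy-bonus coefficient by the agent's recent average return. It is only an illustration under stated assumptions, not the update rule from the paper: the class name `AdaptiveEntropyCoef`, the parameters `base_coef`, `max_return`, and `window`, and the linear scaling rule (more exploration while recent returns are low) are all hypothetical choices made for this example.

```python
from collections import deque


class AdaptiveEntropyCoef:
    """Hypothetical schedule (not the paper's formula): shrink the
    entropy-bonus coefficient as the recent mean return approaches
    an assumed maximum return."""

    def __init__(self, base_coef=0.01, max_return=500.0, window=10):
        self.base_coef = base_coef          # assumed upper bound on the coefficient
        self.max_return = max_return        # assumed best achievable episode return
        self.returns = deque(maxlen=window) # sliding window of recent episode returns

    def update(self, episode_return):
        """Record the return of a finished episode."""
        self.returns.append(episode_return)

    def value(self):
        """Current coefficient: base_coef with no data, otherwise scaled
        down linearly with recent performance, clipped to [0, base_coef]."""
        if not self.returns:
            return self.base_coef
        recent = sum(self.returns) / len(self.returns)
        progress = min(max(recent / self.max_return, 0.0), 1.0)
        return self.base_coef * (1.0 - progress)


# Usage: the coefficient would multiply the entropy term of a PPO loss.
schedule = AdaptiveEntropyCoef(base_coef=0.01, max_return=100.0, window=2)
schedule.update(0.0)
schedule.update(100.0)
entropy_coef = schedule.value()  # recent mean return is 50 of an assumed max of 100
```

In a training loop, `update` would be called once per finished episode and `value` read before each policy update. Other rules, such as increasing exploration when performance stalls, fit the same interface.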