Reward Bonuses with Gain Scheduling Inspired by Iterative Deepening Search (2212.10765v1)
Abstract: This paper introduces a novel method of adding intrinsic bonuses to a task-oriented reward function in order to make exploration in reinforcement learning more efficient. Although various bonuses have been designed to date, they can be regarded as analogues of the depth-first and breadth-first search algorithms in graph theory. This paper therefore first designs two bonuses, one corresponding to each search style. A heuristic gain scheduling is then applied to the designed bonuses, inspired by iterative deepening search, which is known to inherit the advantages of both search algorithms. The proposed method is expected to allow the agent to efficiently reach the best solution in deeper states by gradually exploring unknown states. Experiments on three locomotion tasks with dense rewards and three simple tasks with sparse rewards show that the two types of bonuses improve performance on different tasks complementarily. In addition, when combined with the proposed gain scheduling, all tasks are accomplished with high performance.
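The abstract does not give the exact bonus definitions or the form of the heuristic schedule, so the following is a minimal Python sketch of the general idea only: a task reward augmented with two intrinsic bonuses whose gains shift over training, loosely mimicking how iterative deepening extends its depth limit. The `ScheduledBonusReward` class, the linear schedule, and the bonus semantics are all illustrative assumptions, not the paper's actual formulation.

```python
class ScheduledBonusReward:
    """Illustrative sketch (assumption, not the paper's exact design):
    a task-oriented reward plus two intrinsic bonuses, one breadth-
    first-like and one depth-first-like, under a simple gain schedule.
    """

    def __init__(self, horizon_steps: int = 1_000_000):
        self.horizon = horizon_steps
        self.t = 0  # global environment-step counter

    def gains(self) -> tuple[float, float]:
        # Early in training, weight the breadth-first-style bonus
        # (wide, shallow exploration of unknown states); later, shift
        # weight toward the depth-first-style bonus (committing to
        # deeper states), echoing iterative deepening's gradually
        # increasing depth limit. A linear schedule is assumed here.
        progress = min(self.t / self.horizon, 1.0)
        return progress, 1.0 - progress  # (k_dfs, k_bfs)

    def reward(self, task_reward: float,
               dfs_bonus: float, bfs_bonus: float) -> float:
        """Task reward plus the two gain-scheduled intrinsic bonuses."""
        k_dfs, k_bfs = self.gains()
        self.t += 1
        return task_reward + k_dfs * dfs_bonus + k_bfs * bfs_bonus
```

In use, `dfs_bonus` and `bfs_bonus` would come from whatever intrinsic signals play the depth-first and breadth-first roles (e.g., hypothetically, a novelty-seeking term and a coverage-seeking term); the shaped reward then replaces the raw task reward in the learner's update.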