No Prior Mask: Eliminate Redundant Action for Deep Reinforcement Learning (2312.06258v1)
Abstract: A large action space is a fundamental obstacle to deploying reinforcement learning methods in the real world. Numerous redundant actions cause agents to make repeated or invalid attempts, and can even lead to task failure. Although existing algorithms make some initial progress on this issue, they either rely on hand-crafted rules or depend on expert demonstrations, which significantly limits their applicability in many real-world settings. In this work, we present a theoretical analysis of which actions can be eliminated during policy optimization and propose a novel redundant-action filtering mechanism. Unlike prior work, our method constructs a similarity factor by estimating the distance between state distributions, which requires no prior knowledge. In addition, we combine it with a modified inverse model to avoid extensive computation in high-dimensional state spaces. Building on these techniques, we reveal the underlying structure of action spaces and propose a simple yet efficient redundant-action filtering mechanism named No Prior Mask (NPM). We demonstrate the superior performance of our method through extensive experiments on high-dimensional, pixel-input, and stochastic problems with varying degrees of action redundancy. Our code is publicly available at https://github.com/zhongdy15/npm.
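The core idea of the abstract — two actions are redundant in a state when they induce (nearly) the same next-state distribution, so one of them can be masked — can be illustrated with a minimal sketch. This is not the paper's actual NPM implementation (which uses a learned similarity factor and a modified inverse model); it is a toy tabular version where next-state distributions are estimated from samples and compared with total-variation distance. All names (`redundancy_mask`, the `threshold` parameter) are illustrative assumptions.

```python
import numpy as np

def empirical_next_state_dist(samples, n_states):
    """Empirical distribution over next-state indices from sampled transitions."""
    counts = np.bincount(np.asarray(samples), minlength=n_states)
    return counts / counts.sum()

def redundancy_mask(next_state_samples, n_states, threshold=0.1):
    """Build an action mask for one state.

    next_state_samples: dict mapping action -> list of sampled next-state
    indices observed after taking that action in the state.
    Returns dict action -> bool (True = keep). For each pair of actions whose
    empirical next-state distributions are closer than `threshold` in total
    variation, the higher-indexed action is masked as redundant.
    """
    actions = sorted(next_state_samples)
    dists = {a: empirical_next_state_dist(next_state_samples[a], n_states)
             for a in actions}
    keep = {a: True for a in actions}
    for i, a in enumerate(actions):
        if not keep[a]:
            continue
        for b in actions[i + 1:]:
            tv = 0.5 * np.abs(dists[a] - dists[b]).sum()
            if tv < threshold:
                keep[b] = False  # b's effect duplicates a's in this state
    return keep

# Actions 0 and 1 lead to the same next states; action 2 is distinct.
samples = {0: [1, 1, 2, 2], 1: [1, 1, 2, 2], 2: [3, 3, 3, 3]}
mask = redundancy_mask(samples, n_states=4)
# mask -> {0: True, 1: False, 2: True}
```

In the paper's setting this pairwise comparison would be intractable for high-dimensional states, which is why NPM replaces the explicit distribution distance with a learned similarity factor and an inverse-model component; the sketch only conveys the masking principle.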