Sample Efficient Reinforcement Learning by Automatically Learning to Compose Subtasks (2401.14226v1)
Abstract: Improving sample efficiency is central to Reinforcement Learning (RL), especially in environments where rewards are sparse. Some recent approaches specify the reward function through manually designed or learned reward structures, whose integration into RL algorithms is claimed to significantly improve learning efficiency. However, manually designed reward structures can be inaccurate, and existing methods that learn them automatically are often computationally intractable for complex tasks. Integrating inaccurate or partial reward structures into RL algorithms can prevent them from learning optimal policies. In this work, we propose an RL algorithm that automatically structures the reward function for sample efficiency, given only a set of labels that signify subtasks. With this minimal knowledge about the task, we train a high-level policy that selects the optimal subtask in each state, together with a low-level policy that efficiently learns to complete each subtask. We evaluate our algorithm in a variety of sparse-reward environments. The experimental results show that our approach significantly outperforms state-of-the-art baselines as task difficulty increases.
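To make the hierarchical idea in the abstract concrete, here is a minimal sketch, not the paper's implementation: a high-level Q-learner picks a subtask label in each state, and a low-level Q-learner, conditioned on that label, is trained with an intrinsic reward to complete it. The toy corridor environment, the labels "key" and "door", and all hyperparameters are illustrative assumptions introduced only for this example.

```python
# Minimal sketch of a two-level policy over subtask labels (illustrative only;
# the environment, labels, and hyperparameters are assumptions, not the paper's).
import random
from collections import defaultdict

# Toy 1-D corridor: the agent must visit the "key" cell, then the "door" cell.
SIZE, KEY, DOOR = 10, 3, 9
SUBTASKS = ["key", "door"]      # the given subtask labels
ACTIONS = [-1, +1]              # move left / right

def label(pos):
    """Return the subtask label emitted at this position, if any."""
    return "key" if pos == KEY else "door" if pos == DOOR else None

def run_episode(q_hi, q_lo, eps=0.1, alpha=0.5, gamma=0.95, max_options=200):
    pos, have_key, total = 0, False, 0.0
    for _ in range(max_options):
        # High-level policy: epsilon-greedy choice of the next subtask.
        hi_state = (pos, have_key)
        goal = (random.choice(SUBTASKS) if random.random() < eps
                else max(SUBTASKS, key=lambda g: q_hi[(hi_state, g)]))
        option_reward, steps = 0.0, 0
        # Low-level policy: pursue the chosen subtask until its label fires.
        while steps < 30:
            lo_state = (pos, goal)
            a = (random.choice(ACTIONS) if random.random() < eps
                 else max(ACTIONS, key=lambda b: q_lo[(lo_state, b)]))
            pos = min(max(pos + a, 0), SIZE - 1)
            steps += 1
            done_subtask = label(pos) == goal
            # Intrinsic reward teaches the low level to complete subtasks.
            r_lo = 1.0 if done_subtask else -0.01
            best_nxt = max(q_lo[((pos, goal), b)] for b in ACTIONS)
            q_lo[(lo_state, a)] += alpha * (r_lo + gamma * best_nxt - q_lo[(lo_state, a)])
            # Sparse environment reward: only for reaching the door with the key.
            r_env = 1.0 if (pos == DOOR and have_key) else 0.0
            option_reward += r_env
            total += r_env
            if pos == KEY:
                have_key = True
            if done_subtask or r_env > 0:
                break
        # High-level update (SMDP-style, discounted by option duration).
        nxt_hi = (pos, have_key)
        best_nxt = max(q_hi[(nxt_hi, g)] for g in SUBTASKS)
        q_hi[(hi_state, goal)] += alpha * (
            option_reward + (gamma ** steps) * best_nxt - q_hi[(hi_state, goal)])
        if total > 0:
            break
    return total

if __name__ == "__main__":
    q_hi, q_lo = defaultdict(float), defaultdict(float)
    returns = [run_episode(q_hi, q_lo) for _ in range(500)]
    print("success rate (last 100 episodes):", sum(returns[-100:]) / 100)
```

The sketch only illustrates the division of labor: the low level is rewarded for triggering the chosen label, while the high level is rewarded with the sparse environment return accumulated during each option, which is how the subtask labels structure an otherwise sparse-reward problem.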
Authors: Shuai Han, Mehdi Dastani, Shihan Wang