Learning a Hierarchical Planner from Humans in Multiple Generations (2310.11614v1)
Abstract: A typical way in which a machine acquires knowledge from humans is by programming. Compared to learning from demonstrations or experiences, programmatic learning lets the machine acquire a novel skill as soon as the program is written, and, by building a library of programs, a machine can quickly learn to perform complex tasks. However, because programs often take their execution contexts for granted, they are brittle when those contexts change, making it difficult to adapt complex programs to new contexts. We present natural programming, a library learning system that combines programmatic learning with a hierarchical planner. Natural programming maintains a library of decompositions, each consisting of a goal, a linguistic description of how that goal decomposes into sub-goals, and a concrete instance of its decomposition into sub-goals. A user teaches the system via curriculum building: identifying a challenging yet not impossible goal, along with linguistic hints on how that goal may be decomposed into sub-goals. The system solves the goal via hierarchical planning, using the linguistic hints to guide its probability distribution over proposed plans, and learns from the interaction by adding the decompositions found during a successful search to its library. Simulated studies and a human experiment (n=360) in a controlled environment demonstrate that natural programming can robustly compose programs learned from different users and contexts, adapting faster and solving more complex tasks than programmatic baselines.
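The abstract describes two mechanisms concretely enough to sketch: the library entry (a goal, a linguistic hint, and a concrete decomposition into sub-goals) and the learn-by-planning loop that grows the library after each successful search. Below is a minimal, self-contained Python sketch of those two pieces. All names (`Decomposition`, `DecompositionLibrary`, `plan`) are hypothetical, and the sketch substitutes exact goal matching for the paper's language-guided probability distribution over plan proposals, so it illustrates the library-growing mechanism rather than the actual search.

```python
from dataclasses import dataclass

# Hypothetical record for one library entry, mirroring the abstract's
# description: a goal, a linguistic hint for how it decomposes, and a
# concrete instance of that decomposition into sub-goals.
@dataclass(frozen=True)
class Decomposition:
    goal: str        # e.g. "make tea"
    hint: str        # linguistic description of the decomposition
    subgoals: tuple  # concrete sub-goals, solved left to right

class DecompositionLibrary:
    """Toy library: store decompositions, retrieve candidates for a
    goal, and grow after each successful search."""
    def __init__(self):
        self._entries: list[Decomposition] = []

    def add(self, entry: Decomposition) -> None:
        # Learning step from the abstract: decompositions found during
        # a successful search are added back into the library.
        if entry not in self._entries:
            self._entries.append(entry)

    def candidates(self, goal: str) -> list[Decomposition]:
        # Simplification: exact goal match. The paper instead uses the
        # linguistic hints to weight which decompositions to propose.
        return [d for d in self._entries if d.goal == goal]

def plan(goal, library, primitives, depth=5):
    """Depth-limited hierarchical planning: a goal is either directly
    executable (a primitive) or is expanded via a stored decomposition
    into sub-goals, each planned recursively."""
    if goal in primitives:
        return [goal]
    if depth == 0:
        return None
    for d in library.candidates(goal):
        steps = []
        for sub in d.subgoals:
            sub_plan = plan(sub, library, primitives, depth - 1)
            if sub_plan is None:
                break  # this decomposition fails; try the next one
            steps.extend(sub_plan)
        else:
            return steps
    return None  # no known decomposition bottoms out in primitives

# Usage: teach one decomposition, then plan for the taught goal.
lib = DecompositionLibrary()
lib.add(Decomposition("make tea", "boil water, then steep the leaves",
                      ("boil water", "steep leaves")))
print(plan("make tea", lib, primitives={"boil water", "steep leaves"}))
# -> ['boil water', 'steep leaves']
```

The property this sketch preserves is compositionality: because each entry reduces a goal to sub-goals rather than to a fixed action sequence, a decomposition taught by one user can be reused as a sub-step of a goal taught by another, which is what the abstract's cross-user, cross-context composition claim rests on.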