Improving the Performance of Learned Controllers in Behavior Trees Using Value Function Estimates at Switching Boundaries (2305.18903v3)
Abstract: Behavior trees represent a modular way to create an overall controller from a set of sub-controllers solving different sub-problems. These sub-controllers can be created in different ways, such as classical model-based control or reinforcement learning (RL). If each sub-controller satisfies the preconditions of the next sub-controller, the overall controller will achieve the overall goal. However, even if every sub-controller is locally optimal, with respect to some performance metric such as completion time, in achieving the preconditions of the next one, the overall controller might be far from optimal with respect to that same metric. In this paper we show how the performance of the overall controller can be improved if we use approximations of value functions to inform the design of each sub-controller of the needs of the next one. We also show how, under certain assumptions, this leads to a globally optimal controller when the process is applied to all sub-controllers. Finally, this result also holds when some of the sub-controllers are already given, i.e., if we are constrained to use some existing sub-controllers, the overall controller will be globally optimal given this constraint.
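The abstract describes the mechanism only at a high level, so here is a minimal sketch of how the boundary-value idea can play out, built entirely on assumptions of our own: a toy 5x9 grid, two hypothetical "door" states where the second sub-controller's preconditions hold, and tabular value iteration standing in for the paper's learned (e.g., PPO-trained) sub-controllers. None of the names, numbers, or dynamics below come from the paper; they only illustrate the stated principle of using a value-function estimate of the next sub-controller as the terminal reward of the current one.

```python
# Illustrative sketch only: grid size, door states, and value iteration
# are stand-ins for the paper's setting, not its actual method or code.
import numpy as np

ROWS, COLS = 5, 9
DOORS = {(0, 4), (4, 4)}     # states where sub-controller 2's preconditions hold
GOAL = (0, COLS - 1)         # overall goal, reached by sub-controller 2
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
STEP_REWARD = -1.0           # per-step cost, so values measure -completion time

def step(s, a):
    """Deterministic grid dynamics, clamped at the borders."""
    return (min(max(s[0] + a[0], 0), ROWS - 1),
            min(max(s[1] + a[1], 0), COLS - 1))

def value_iteration(is_terminal, terminal_value, n_sweeps=200):
    """Tabular VI for a shortest-path sub-task with per-step cost."""
    V = np.zeros((ROWS, COLS))
    for _ in range(n_sweeps):
        for r in range(ROWS):
            for c in range(COLS):
                s = (r, c)
                if is_terminal(s):
                    V[s] = terminal_value(s)
                else:
                    V[s] = STEP_REWARD + max(V[step(s, a)] for a in ACTIONS)
    return V

def greedy_rollout(s, V, is_terminal, max_steps=100):
    """Follow the greedy policy induced by V; return switch state and step count."""
    steps = 0
    while not is_terminal(s) and steps < max_steps:
        s = max((step(s, a) for a in ACTIONS), key=lambda ns: V[ns])
        steps += 1
    return s, steps

# Sub-controller 2: from anywhere, reach GOAL as fast as possible.
V2 = value_iteration(lambda s: s == GOAL, lambda s: 0.0)

# Sub-controller 1, naive: every door is equally good (terminal value 0),
# so it heads for the *nearest* door regardless of what comes next.
V1_naive = value_iteration(lambda s: s in DOORS, lambda s: 0.0)

# Sub-controller 1, informed: the terminal reward at a door is the estimate
# of sub-controller 2's value there -- the boundary-value idea.
V1_informed = value_iteration(lambda s: s in DOORS, lambda s: V2[s])

start = (3, 0)
for name, V1 in [("naive", V1_naive), ("informed", V1_informed)]:
    s_switch, n1 = greedy_rollout(start, V1, lambda s: s in DOORS)
    _, n2 = greedy_rollout(s_switch, V2, lambda s: s == GOAL)
    print(f"{name:8s}: switches at {s_switch}, total steps = {n1 + n2}")
```

With these made-up numbers, the naive composition switches at the nearer door (4, 4) and takes 13 steps in total, while the informed composition accepts a longer first phase (7 steps instead of 5) to hand over at (0, 4), where sub-controller 2's value is higher, and finishes in 11 steps. Chaining this backward over every sub-controller is, as we read the abstract, what yields the paper's global-optimality claim.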