Learning from Symmetry: Meta-Reinforcement Learning with Symmetrical Behaviors and Language Instructions (2209.10656v2)
Abstract: Meta-reinforcement learning (meta-RL) is a promising approach that enables an agent to learn new tasks quickly. However, most meta-RL algorithms generalize poorly in multi-task scenarios because rewards alone provide insufficient task information. Language-conditioned meta-RL improves generalization by matching language instructions with the agent's behaviors. Moreover, both behaviors and language instructions exhibit symmetry, a property that speeds up human learning of new knowledge. Combining symmetry and language instructions in meta-RL can therefore improve the algorithm's generalization and learning efficiency. We propose a dual-MDP meta-reinforcement learning method that learns new tasks efficiently from symmetrical behaviors and language instructions. We evaluate our method on multiple challenging manipulation tasks, and experimental results show that it greatly improves the generalization and learning efficiency of meta-reinforcement learning. Videos are available at https://tumi6robot.wixsite.com/symmetry/.
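The core idea, that symmetric behaviors paired with symmetry-transformed instructions can enrich the task information available to a meta-RL agent, can be illustrated with a toy data-augmentation sketch. The snippet below is a minimal illustration under strong assumptions (a reflection about one coordinate axis and a simple direction-word swap); it is not the paper's actual dual-MDP method, and all function and variable names are hypothetical.

```python
import numpy as np

# Hypothetical sketch: derive a "symmetric" variant of a transition and its
# language instruction by mirroring across one axis. Names are assumptions,
# not the paper's API.

MIRROR_WORDS = {"left": "right", "right": "left"}

def mirror_instruction(instruction: str) -> str:
    """Swap direction words so the instruction matches the mirrored behavior."""
    return " ".join(MIRROR_WORDS.get(w, w) for w in instruction.split())

def mirror_transition(obs, action, next_obs, flip_dims=(0,)):
    """Negate the chosen coordinates to reflect states/actions across a plane."""
    def flip(x):
        x = np.array(x, dtype=float)
        x[list(flip_dims)] *= -1.0
        return x
    return flip(obs), flip(action), flip(next_obs)

# Example: one augmented (transition, instruction) pair at no extra
# environment-interaction cost.
obs, action, next_obs = [0.3, 0.1], [0.5, 0.0], [0.8, 0.1]
m_obs, m_act, m_next = mirror_transition(obs, action, next_obs)
m_instr = mirror_instruction("push the block to the left")
print(m_obs, m_act, m_next, m_instr)
```

In this reading, each collected trajectory yields an extra consistent (behavior, instruction) pair for free, which is one plausible way symmetry could improve sample efficiency and generalization.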
- Xiangtong Yao
- Zhenshan Bing
- Genghang Zhuang
- Kejia Chen
- Hongkuan Zhou
- Kai Huang
- Alois Knoll