Improving Long-Horizon Imitation Through Instruction Prediction (2306.12554v1)

Published 21 Jun 2023 in cs.LG and cs.AI

Abstract: Complex, long-horizon planning and its combinatorial nature pose steep challenges for learning-based agents. Difficulties in such settings are exacerbated in low data regimes where over-fitting stifles generalization and compounding errors hurt accuracy. In this work, we explore the use of an often unused source of auxiliary supervision: language. Inspired by recent advances in transformer-based models, we train agents with an instruction prediction loss that encourages learning temporally extended representations that operate at a high level of abstraction. Concretely, we demonstrate that instruction modeling significantly improves performance in planning environments when training with a limited number of demonstrations on the BabyAI and Crafter benchmarks. In further analysis we find that instruction modeling is most important for tasks that require complex reasoning, while understandably offering smaller gains in environments that require simple plans. More details and code can be found at https://github.com/jhejna/instruction-prediction.
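The auxiliary objective described in the abstract — behavior cloning augmented with an instruction prediction loss over a shared representation — can be sketched as below. This is a minimal illustration, not the authors' implementation: the feature/head shapes, the single-token instruction targets, and the `aux_weight` coefficient are all simplifying assumptions for exposition (the paper uses transformer-based models and full instruction sequences).

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, targets):
    """Mean negative log-likelihood of integer targets under logits."""
    probs = softmax(logits)
    return -np.mean(np.log(probs[np.arange(len(targets)), targets] + 1e-12))

def combined_loss(features, w_policy, w_instr,
                  expert_actions, instr_tokens, aux_weight=1.0):
    """Behavior cloning loss plus weighted instruction-prediction loss.

    features:       (batch, d)  outputs of a shared encoder
    w_policy:       (d, n_actions)  linear policy head
    w_instr:        (d, vocab)      linear instruction-prediction head
    expert_actions: (batch,)  demonstrated action indices
    instr_tokens:   (batch,)  target instruction-token indices
    """
    bc_loss = cross_entropy(features @ w_policy, expert_actions)
    instr_loss = cross_entropy(features @ w_instr, instr_tokens)
    # Gradients from both terms flow into the shared features, which is
    # what pushes the representation toward temporally extended abstractions.
    return bc_loss + aux_weight * instr_loss
```

Setting `aux_weight=0` recovers plain behavior cloning, so the auxiliary term can be ablated with a single scalar — mirroring the paper's comparison between training with and without instruction modeling.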
