ExTraCT -- Explainable Trajectory Corrections from language inputs using Textual description of features (2401.03701v1)
Abstract: Natural language provides an intuitive and expressive way of conveying human intent to robots. Prior works employed end-to-end methods for learning trajectory deformations from language corrections, but such methods do not generalize to new initial trajectories or object configurations. This work presents ExTraCT, a modular framework for trajectory corrections from natural language that combines LLMs for language understanding with trajectory deformation functions. Given a scene, ExTraCT generates trajectory modification features (both scene-specific and scene-independent) and their corresponding natural-language textual descriptions for the objects in the scene online, based on a template. We use LLMs to semantically match user utterances to the textual descriptions of these features. Based on the matched feature, a trajectory modification function is applied to the initial trajectory, allowing generalization to unseen trajectories and object configurations. Through user studies conducted both in simulation and with a physical robot arm, we demonstrate that trajectories deformed using our method were more accurate and were preferred in about 80% of cases, outperforming the baseline. We also showcase the versatility of our system in a manipulation task and an assistive feeding task.
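The pipeline the abstract describes (match a user utterance to a textual feature description, then apply the corresponding deformation function to the trajectory) can be sketched as follows. This is only an illustration, not the paper's implementation: the real system uses an LLM for semantic matching, whereas this sketch substitutes a toy bag-of-words cosine similarity, and the Gaussian-weighted `deform` function is a hypothetical stand-in for the paper's trajectory modification functions.

```python
import numpy as np

def match_feature(utterance, descriptions):
    """Toy stand-in for LLM semantic matching: pick the feature whose
    textual description has the highest bag-of-words cosine similarity
    with the user utterance."""
    vocab = sorted({w for text in descriptions + [utterance]
                    for w in text.lower().split()})

    def vec(text):
        words = text.lower().split()
        return np.array([words.count(w) for w in vocab], dtype=float)

    u = vec(utterance)
    sims = []
    for d in descriptions:
        v = vec(d)
        denom = np.linalg.norm(u) * np.linalg.norm(v)
        sims.append(u @ v / denom if denom > 0 else 0.0)
    return int(np.argmax(sims))

def deform(traj, direction, magnitude=0.1):
    """Hypothetical deformation: offset waypoints along `direction`
    with a bell-shaped weight peaking at the trajectory midpoint,
    so the endpoints stay (almost) fixed."""
    n = len(traj)
    t = np.linspace(-1.0, 1.0, n)
    w = np.exp(-4.0 * t**2)              # Gaussian-like weights in [exp(-4), 1]
    return traj + magnitude * w[:, None] * direction

# Scene-specific feature descriptions generated from a template.
descriptions = ["move higher above the table", "move closer to the cup"]
idx = match_feature("go a bit higher", descriptions)   # matches index 0

traj = np.linspace([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], 11)  # straight-line path
if idx == 0:
    new_traj = deform(traj, np.array([0.0, 0.0, 1.0]))     # lift the midpoint
```

Because matching operates on regenerated textual descriptions rather than on a learned end-to-end mapping, the same deformation functions apply unchanged to any initial trajectory or object layout, which is the source of the generalization the abstract claims.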
Authors:
- J-Anne Yow
- Neha Priyadarshini Garg
- Manoj Ramanathan
- Wei Tech Ang