LINGO-Space: Language-Conditioned Incremental Grounding for Space (2402.01183v1)
Abstract: We aim to solve the problem of spatially localizing composite instructions referring to space: space grounding. Compared to current instance grounding, space grounding is challenging due to the ill-posedness of identifying locations referred to by discrete expressions and the compositional ambiguity of referring expressions. Therefore, we propose a novel probabilistic space-grounding methodology (LINGO-Space) that accurately identifies a probabilistic distribution of the space being referred to and incrementally updates it given subsequent referring expressions, leveraging configurable polar distributions. Our evaluations show that the estimation using polar distributions enables a robot to ground locations successfully through $20$ table-top manipulation benchmark tests. We also show that updating the distribution helps the grounding method accurately narrow the referring space. We finally demonstrate the robustness of the space grounding with simulated manipulation and real quadruped robot navigation tasks. Code and videos are available at https://lingo-space.github.io.