Solving a Dialogue-Grounded Embodied Task in a Simulated Environment using Further Masked Language Modeling
Abstract: Equipping AI systems with communication skills that align with human understanding is crucial for effectively assisting human users. Such systems must take the initiative to recognize specific situations and interact with users appropriately to resolve them. In this work, we address a collaborative building task from the Minecraft dialogue dataset. Our proposed method applies further masked language modeling on top of state-of-the-art (SOTA) pretrained language models to enhance task understanding. These models target grounded multi-modal understanding and task-oriented dialogue comprehension, offering insight into how well they interpret and respond to a variety of inputs and tasks. Our experimental results provide compelling evidence for the superiority of the proposed method, showing a substantial improvement and pointing toward a promising direction for future research in this domain.
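The further masked language modeling referred to above follows the standard BERT-style objective: a fraction of input tokens is selected, and of those, 80% are replaced with a [MASK] token, 10% with a random token, and 10% left unchanged, with the model trained to recover the originals. The sketch below illustrates only this masking step on token IDs; the specific token IDs (`MASK_ID`, `VOCAB_SIZE`) are placeholder assumptions matching BERT-base, not values from the paper.

```python
import random

MASK_ID = 103       # [MASK] token id in the BERT-base vocabulary (assumption)
VOCAB_SIZE = 30522  # BERT-base vocabulary size (assumption)

def mask_tokens(token_ids, mask_prob=0.15, rng=None):
    """BERT-style dynamic masking over a list of token ids.

    Each position is selected with probability `mask_prob`; selected
    positions become [MASK] (80%), a random token (10%), or stay
    unchanged (10%). Returns (masked_ids, labels), where labels hold
    the original token at selected positions and -100 elsewhere (the
    usual "ignore this position in the loss" convention).
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for tid in token_ids:
        if rng.random() < mask_prob:
            labels.append(tid)  # model must predict the original token here
            r = rng.random()
            if r < 0.8:
                masked.append(MASK_ID)          # 80%: replace with [MASK]
            elif r < 0.9:
                masked.append(rng.randrange(VOCAB_SIZE))  # 10%: random token
            else:
                masked.append(tid)              # 10%: keep original token
        else:
            labels.append(-100)  # not selected: excluded from the MLM loss
            masked.append(tid)
    return masked, labels
```

In practice this corruption is applied on the fly each epoch (dynamic masking), so the model sees different masked views of the same domain text during further pretraining.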