Dialogue with Robots: Proposals for Broadening Participation and Research in the SLIVAR Community (2404.01158v1)

Published 1 Apr 2024 in cs.CL and cs.RO

Abstract: The ability to interact with machines using natural human language is becoming not just commonplace but expected. The next step is not just text interfaces but speech interfaces, and not just with computers but with all machines, including robots. In this paper, we chronicle the recent history of the growing field of spoken dialogue with robots and offer the community three proposals: the first focused on education, the second on benchmarks, and the third on the modeling of language for spoken interaction with robots. The three proposals should act as white papers for any researcher to take up and build upon.

