Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
38 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations (2104.08667v2)

Published 18 Apr 2021 in cs.CL and cs.AI

Abstract: Next generation task-oriented dialog systems need to understand conversational contexts with their perceived surroundings, to effectively help users in the real-world multimodal environment. Existing task-oriented dialog datasets aimed towards virtual assistance fall short and do not situate the dialog in the user's multimodal context. To overcome, we present a new dataset for Situated and Interactive Multimodal Conversations, SIMMC 2.0, which includes 11K task-oriented user<->assistant dialogs (117K utterances) in the shopping domain, grounded in immersive and photo-realistic scenes. The dialogs are collected using a two-phase pipeline: (1) A novel multimodal dialog simulator generates simulated dialog flows, with an emphasis on diversity and richness of interactions, (2) Manual paraphrasing of the generated utterances to collect diverse referring expressions. We provide an in-depth analysis of the collected dataset, and describe in detail the four main benchmark tasks we propose. Our baseline model, powered by the state-of-the-art LLM, shows promising results, and highlights new challenges and directions for the community to study.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (4)
  1. Satwik Kottur (19 papers)
  2. Seungwhan Moon (28 papers)
  3. Alborz Geramifard (22 papers)
  4. Babak Damavandi (11 papers)
Citations (82)