Improving Context Modelling in Multimodal Dialogue Generation (1810.11955v1)
Published 20 Oct 2018 in cs.CL
Abstract: In this work, we investigate the task of textual response generation in a multimodal task-oriented dialogue system. Our work is based on the recently released Multimodal Dialogue (MMD) dataset (Saha et al., 2017) in the fashion domain. We introduce a multimodal extension to the Hierarchical Recurrent Encoder-Decoder (HRED) model and show that this extension outperforms strong baselines in terms of text-based similarity metrics. We also showcase the shortcomings of current vision and language models by performing an error analysis on our system's output.
- Shubham Agarwal (34 papers)
- Ioannis Konstas (40 papers)
- Verena Rieser (58 papers)
- Ondrej Dusek (7 papers)