Application of frozen large-scale models to multimodal task-oriented dialogue (2310.00845v1)

Published 2 Oct 2023 in cs.CL and cs.AI

Abstract: In this study, we use the existing LLMs ENhanced to See Framework (LENS Framework) to test the feasibility of multimodal task-oriented dialogues. The LENS Framework has been proposed as a method to solve computer vision tasks without additional training, using pre-trained models with fixed parameters. We used the Multimodal Dialogs (MMD) dataset, a multimodal task-oriented dialogue benchmark from the fashion domain, and for evaluation we used the ChatGPT-based G-EVAL, which accepts only textual input, adapted to handle multimodal data. Compared to Transformer-based models in previous studies, our method demonstrated an absolute lift of 10.8% in fluency, 8.8% in usefulness, and 5.2% in relevance and coherence. These results show that using large-scale models with fixed parameters, rather than models trained from scratch on a dataset, improves performance in multimodal task-oriented dialogues. At the same time, we show that LLMs are effective for multimodal task-oriented dialogues. This is expected to enable efficient application to existing systems.

Authors (5)
  1. Tatsuki Kawamoto (2 papers)
  2. Takuma Suzuki (1 paper)
  3. Ko Miyama (1 paper)
  4. Takumi Meguro (1 paper)
  5. Tomohiro Takagi (8 papers)