
Reactive Multi-Stage Feature Fusion for Multimodal Dialogue Modeling (1908.05067v1)

Published 14 Aug 2019 in cs.CL and cs.CV

Abstract: Visual question answering and visual dialogue tasks have been increasingly studied in the multimodal field as steps toward more practical real-world scenarios. A more challenging task, audio visual scene-aware dialogue (AVSD), has been proposed to further advance technologies that connect audio, vision, and language; it introduces temporal video information and dialogue interactions between a questioner and an answerer. This paper proposes an intuitive mechanism that fuses features and attention in multiple stages to better integrate multimodal features, and the experimental results demonstrate its capability. We also apply several models that are state of the art on other tasks to the AVSD task and analyze how well they generalize across tasks.

Authors (6)
  1. Yi-Ting Yeh (12 papers)
  2. Tzu-Chuan Lin (2 papers)
  3. Hsiao-Hua Cheng (1 paper)
  4. Yu-Hsuan Deng (1 paper)
  5. Shang-Yu Su (20 papers)
  6. Yun-Nung Chen (104 papers)
Citations (16)