Context, Attention and Audio Feature Explorations for Audio Visual Scene-Aware Dialog (1812.08407v1)

Published 20 Dec 2018 in cs.CL

Abstract: With recent advances in AI, Intelligent Virtual Assistants (IVAs) have become a ubiquitous part of every home. Going forward, we are witnessing a confluence of vision, speech and dialog system technologies that enables IVAs to learn audio-visual groundings of utterances and to hold conversations with users about the objects, activities and events surrounding them. As part of the Audio Visual Scene-Aware Dialog (AVSD) track of the 7th Dialog System Technology Challenge (DSTC7), we incorporate the 'topics' of the dialog into the architecture as an important contextual feature, and we also explore multimodal attention. In addition, we incorporate an end-to-end audio classification ConvNet, AclNet, into our models. We present a detailed analysis of the experiments and show that some of our model variations outperform the baseline system presented for this task.
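The abstract does not specify how the attention or the topic feature is wired into the model, so the following is only a generic sketch of the idea, not the authors' architecture. It assumes each modality (here hypothetically named `video` and `audio`) has already been encoded into a fixed-size vector, scores each modality against a dialog query vector with scaled dot-product attention, fuses them as a weighted sum, and then concatenates a hypothetical `topic_vec` (for example, a topic distribution for the dialog) as extra context. All function names and vectors are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend_modalities(
    query: List[float], modality_feats: Dict[str, List[float]]
) -> Tuple[List[float], Dict[str, float]]:
    """Hypothetical scaled dot-product attention over per-modality vectors.

    Assumes every modality vector has the same dimension as the query.
    Returns the attention-weighted fused vector and the per-modality weights.
    """
    names = list(modality_feats)
    scale = math.sqrt(len(query))
    scores = [dot(query, modality_feats[n]) / scale for n in names]
    weights = softmax(scores)
    dim = len(query)
    fused = [
        sum(w * modality_feats[n][i] for w, n in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


def with_topic(fused: List[float], topic_vec: List[float]) -> List[float]:
    # Assumption: the topic feature is simply concatenated as extra context.
    return fused + topic_vec


if __name__ == "__main__":
    query = [1.0, 0.0]  # hypothetical encoding of the current question
    feats = {"video": [1.0, 0.0], "audio": [0.0, 1.0]}
    fused, weights = attend_modalities(query, feats)
    print(weights)  # the video vector aligns with the query, so it gets more weight
    print(with_topic(fused, [0.2, 0.5, 0.3]))
```

Concatenation is the simplest way to inject a contextual feature; a real model might instead feed the topic vector into the attention query or the decoder state.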

Authors (6)
  1. Shachi H Kumar (17 papers)
  2. Eda Okur (20 papers)
  3. Saurav Sahay (34 papers)
  4. Juan Jose Alvarado Leanos (2 papers)
  5. Jonathan Huang (46 papers)
  6. Lama Nachman (27 papers)
Citations (10)