
Dual-modality seq2seq network for audio-visual event localization (1902.07473v2)

Published 20 Feb 2019 in cs.CV

Abstract: Audio-visual event localization requires one to identify the event which is both visible and audible in a video (either at a frame or video level). To address this task, we propose a deep neural network named Audio-Visual sequence-to-sequence dual network (AVSDN). By jointly taking both audio and visual features at each time segment as inputs, our proposed model learns global and local event information in a sequence-to-sequence manner, which can be realized in either fully supervised or weakly supervised settings. Empirical results confirm that our proposed method performs favorably against recent deep learning approaches in both settings.

Authors (3)
  1. Yan-Bo Lin (11 papers)
  2. Yu-Jhe Li (23 papers)
  3. Yu-Chiang Frank Wang (88 papers)
Citations (118)
