
Active Speakers in Context (2005.09812v1)

Published 20 May 2020 in cs.CV, cs.SD, and eess.AS

Abstract: Current methods for active speaker detection focus on modeling short-term audiovisual information from a single speaker. Although this strategy can be enough for addressing single-speaker scenarios, it prevents accurate detection when the task is to identify which of many candidate speakers are talking. This paper introduces the Active Speaker Context, a novel representation that models relationships between multiple speakers over long time horizons. Our Active Speaker Context is designed to learn pairwise and temporal relations from a structured ensemble of audio-visual observations. Our experiments show that a structured feature ensemble already benefits active speaker detection performance. Moreover, we find that the proposed Active Speaker Context improves the state of the art on the AVA-ActiveSpeaker dataset, achieving a mAP of 87.1%. We present ablation studies that verify that this result is a direct consequence of our long-term multi-speaker analysis.

Authors (7)
  1. Fabian Caba Heilbron (34 papers)
  2. Long Mai (32 papers)
  3. Federico Perazzi (22 papers)
  4. Joon-Young Lee (61 papers)
  5. Pablo Arbelaez (79 papers)
  6. Bernard Ghanem (256 papers)
  7. Juan Leon Alcazar (5 papers)
Citations (58)
