
Temporarily-Aware Context Modelling using Generative Adversarial Networks for Speech Activity Detection (2004.01546v1)

Published 2 Apr 2020 in eess.AS, cs.LG, cs.SD, and stat.ML

Abstract: This paper presents a novel framework for Speech Activity Detection (SAD). Inspired by the recent success of multi-task learning approaches in the speech processing domain, we propose a novel joint learning framework for SAD. We utilise generative adversarial networks to automatically learn a loss function for joint prediction of the frame-wise speech/non-speech classifications together with the next audio segment. In order to exploit the temporal relationships within the input signal, we propose a temporal discriminator which aims to ensure that the predicted signal is temporally consistent. We evaluate the proposed framework on multiple public benchmarks, including NIST OpenSAT'17, AMI Meeting and HAVIC, where we demonstrate its capability to outperform state-of-the-art SAD approaches. Furthermore, our cross-database evaluations demonstrate the robustness of the proposed approach across different languages, accents, and acoustic environments.

Authors (6)
  1. Tharindu Fernando (44 papers)
  2. Sridha Sridharan (106 papers)
  3. Mitchell McLaren (11 papers)
  4. Darshana Priyasad (4 papers)
  5. Simon Denman (74 papers)
  6. Clinton Fookes (148 papers)
Citations (5)
