HLT-NUS Submission for NIST 2019 Multimedia Speaker Recognition Evaluation (2010.03905v1)

Published 8 Oct 2020 in eess.AS and cs.SD

Abstract: This work describes the speaker verification system developed by the Human Language Technology Laboratory, National University of Singapore (HLT-NUS) for the 2019 NIST Multimedia Speaker Recognition Evaluation (SRE). Multimedia research has gained attention across a wide range of applications, and speaker recognition is no exception. In contrast to previous NIST SREs, the latest edition focuses on a multimedia track to recognize speakers using both audio and visual information. We developed separate systems for audio and visual inputs, followed by score-level fusion of the systems from the two modalities to use their information jointly. The audio systems are based on x-vector speaker embeddings, whereas the face recognition systems are based on ResNet and InsightFace face embeddings. With post-evaluation studies and refinements, we obtain an equal error rate (EER) of 0.88% and an actual detection cost function (actDCF) of 0.026 on the evaluation set of the 2019 NIST multimedia SRE corpus.
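
To make the terms "score-level fusion" and "equal error rate" in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not state the fusion rule, weights, or score normalization, so the z-normalization, the weighted sum, the `audio_weight` parameter, and all function names below are assumptions chosen for illustration. The scores are made-up toy values, not results from the paper.

```python
from statistics import mean, pstdev


def z_norm(scores):
    """Shift and scale scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores) or 1.0  # fall back to 1.0 if all scores are equal
    return [(s - mu) / sigma for s in scores]


def fuse_scores(audio, visual, audio_weight=0.5):
    """Weighted sum of normalized per-trial scores.

    Assumption: a simple linear fusion; the weighting is hypothetical.
    """
    if len(audio) != len(visual):
        raise ValueError("both modalities must score the same trials")
    a, v = z_norm(audio), z_norm(visual)
    return [audio_weight * x + (1.0 - audio_weight) * y for x, y in zip(a, v)]


def equal_error_rate(scores, labels):
    """Approximate EER by sweeping thresholds over the observed scores.

    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    Returns the average of false-accept and false-reject rates at the
    threshold where the two rates are closest.
    """
    targets = [s for s, l in zip(scores, labels) if l == 1]
    nontargets = [s for s, l in zip(scores, labels) if l == 0]
    if not targets or not nontargets:
        raise ValueError("need both target and non-target trials")
    best_gap, best_eer = None, None
    for t in sorted(set(scores)):
        frr = sum(s < t for s in targets) / len(targets)         # targets rejected
        far = sum(s >= t for s in nontargets) / len(nontargets)  # non-targets accepted
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer


if __name__ == "__main__":
    # Toy per-trial scores from two hypothetical systems.
    audio = [2.1, 0.3, -1.0, 1.5, -0.2, -1.8]
    visual = [0.9, 0.8, 0.1, 0.2, 0.3, 0.05]
    labels = [1, 1, 0, 1, 0, 0]
    fused = fuse_scores(audio, visual, audio_weight=0.5)
    print("fused EER:", equal_error_rate(fused, labels))
```

Normalizing before summing keeps one modality from dominating only because its raw scores span a wider range. In practice the fusion weights are usually tuned on a development set, and the EER is often interpolated between thresholds rather than taken at the closest observed point as done here.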

Authors (6)
  1. Rohan Kumar Das (50 papers)
  2. Ruijie Tao (25 papers)
  3. Jichen Yang (28 papers)
  4. Wei Rao (33 papers)
  5. Cheng Yu (62 papers)
  6. Haizhou Li (286 papers)
Citations (10)
