Analysis of DNN Speech Signal Enhancement for Robust Speaker Recognition (1811.07629v1)

Published 19 Nov 2018 in eess.AS and cs.SD

Abstract: In this work, we present an analysis of a DNN-based autoencoder for speech enhancement, dereverberation and denoising. The target application is a robust speaker verification (SV) system. We start our approach by carefully designing a data augmentation process to cover a wide range of acoustic conditions and obtain rich training data for various components of our SV system. We augment several well-known databases used in SV with artificially noised and reverberated data, and we use them to train a denoising autoencoder (mapping noisy and reverberated speech to its clean version) as well as an x-vector extractor, which is currently considered state of the art in SV. Later, we use the autoencoder as a preprocessing step for a text-independent SV system. We compare results achieved with autoencoder enhancement, multi-condition PLDA training, and their simultaneous use. We present a detailed analysis with various conditions of NIST SRE 2010, 2016, PRISM and with re-transmitted data. We conclude that the proposed preprocessing can significantly improve both i-vector and x-vector baselines and that this technique can be used to build a robust SV system for various target domains.
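The augmentation step described in the abstract (artificially adding noise and reverberation to clean speech, then pairing the degraded signal with its clean version as an autoencoder target) can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not give the noise sources, room impulse responses, or SNR ranges, so the function names `add_noise_at_snr`, `reverberate`, and `make_training_pair` are hypothetical, and the choice to reverberate first and then add noise is an assumption.

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Mix `noise` into `clean` so the mixture has the requested SNR in dB."""
    # Tile or trim the noise so it matches the length of the clean signal.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that clean_power / scaled_noise_power = 10^(snr_db / 10).
    scale = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

def reverberate(clean, rir):
    """Convolve with a room impulse response, truncated to the input length."""
    return np.convolve(clean, rir)[: len(clean)]

def make_training_pair(clean, noise, rir, snr_db):
    """Return (degraded input, clean target) for a denoising autoencoder.

    Assumption: reverberation is applied first, then noise is added, with
    the SNR measured against the reverberated signal.
    """
    degraded = add_noise_at_snr(reverberate(clean, rir), noise, snr_db)
    return degraded, clean
```

In a setup like this, each clean utterance would typically be paired with several randomly drawn noise recordings, impulse responses, and SNR values, which is one way to cover a wide range of acoustic conditions.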

Authors (5)

Ondrej Novotny (30 papers)
Oldrich Plchot (80 papers)
Ondrej Glembek (8 papers)
Jan "Honza" Cernocky (24 papers)
Lukas Burget (164 papers)

Citations (40)
