Asymmetric and trial-dependent modeling: the contribution of LIA to SdSV Challenge Task 2 (2403.19634v1)
Published 28 Mar 2024 in cs.SD, cs.CL, and eess.AS
Abstract: The SdSV Challenge Task 2 provided an opportunity to assess the efficiency and robustness of modern text-independent speaker verification systems. It also made it possible to test new approaches capable of addressing the main issues of this challenge (duration, language, ...). This paper describes the contributions of our laboratory to the speaker recognition field. These contributions highlight two further challenges in addition to short duration and language: the mismatch between enrollment and test data, and the mismatch between subsets of the evaluation trial dataset. The proposed approaches experimentally demonstrate their relevance and efficiency on the SdSV evaluation, and could be of interest in many real-life applications.
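For context on the back-end modeling the title refers to: text-independent speaker verification systems in this line of work typically score each trial with a PLDA back-end over speaker embeddings. Below is a minimal sketch of a standard two-covariance PLDA log-likelihood-ratio scorer, not the authors' asymmetric, trial-dependent variant; the parameter names (`mu`, `B` for between-speaker covariance, `W` for within-speaker covariance) and the toy data are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def plda_llr(x_enroll, x_test, mu, B, W):
    """Two-covariance PLDA log-likelihood ratio for one verification trial.

    x_enroll, x_test : (d,) speaker embeddings (e.g. x-vectors)
    mu               : (d,) global mean of the embeddings
    B, W             : (d, d) between- and within-speaker covariances
    """
    d = mu.shape[0]
    z = np.concatenate([x_enroll, x_test])
    m = np.concatenate([mu, mu])
    T = B + W  # total covariance of a single embedding
    # Same-speaker hypothesis: both embeddings share one latent speaker
    # variable, so they are correlated through B.
    sigma_tar = np.block([[T, B], [B, T]])
    # Different-speaker hypothesis: the two embeddings are independent.
    zero = np.zeros((d, d))
    sigma_non = np.block([[T, zero], [zero, T]])
    return (multivariate_normal.logpdf(z, m, sigma_tar)
            - multivariate_normal.logpdf(z, m, sigma_non))

# Toy usage with random (hypothetical) model parameters and embeddings.
rng = np.random.default_rng(0)
d = 8
A = rng.standard_normal((d, d))
B = A @ A.T + np.eye(d)   # symmetric positive-definite between-speaker covariance
W = np.eye(d)             # within-speaker covariance
mu = np.zeros(d)
speaker = rng.standard_normal(d)
x_e = speaker + 0.5 * rng.standard_normal(d)  # enrollment embedding
x_t = speaker + 0.5 * rng.standard_normal(d)  # test embedding
print(plda_llr(x_e, x_t, mu, B, W))           # positive score favors same-speaker
```

With multiple enrollment utterances, a common symmetric baseline simply averages the enrollment embeddings before scoring; asymmetric and trial-dependent modeling, as the title suggests, departs from treating the enrollment and test sides identically.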