Multi-Domain Adaptation by Self-Supervised Learning for Speaker Verification (2309.14149v1)

Published 25 Sep 2023 in cs.SD and eess.AS

Abstract: In real-world applications, speaker recognition models often face various domain-mismatch challenges, leading to a significant drop in performance. Although numerous domain adaptation techniques have been developed to address this issue, almost all existing methods focus on a simple configuration where the model is trained in one domain and deployed in another. However, real-world environments are often complex and may contain multiple domains, making methods designed for one-to-one adaptation suboptimal. In this paper, we propose a self-supervised learning method to tackle this multi-domain adaptation problem. Building upon the basic self-supervised adaptation algorithm, we design three strategies to make it suitable for multi-domain adaptation: an in-domain negative sampling strategy, a MoCo-like memory bank scheme, and a CORAL-like distribution alignment. We conduct experiments using VoxCeleb2 as the source-domain dataset and CN-Celeb1 as the multi-domain target dataset. Our results show that our method clearly outperforms the basic self-supervised adaptation method, which simply treats the CN-Celeb1 data as a single domain. Importantly, the improvement is consistent across nearly all in-domain and cross-domain tests, demonstrating the effectiveness of the proposed method.
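The three strategies in the abstract lend themselves to a compact illustration. Below is a minimal PyTorch sketch, not the authors' implementation, of how in-domain negative sampling and a CORAL-like alignment term could sit on top of a MoCo-style memory bank: negatives for each anchor are restricted to bank entries from the same target genre, and a covariance-matching penalty pulls source- and target-domain embedding statistics together. All function names, tensor shapes, the genre-label bookkeeping, and the loss weight are assumptions made for illustration.

```python
# Hypothetical sketch of the adaptation losses described in the abstract;
# names, shapes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F


def in_domain_info_nce(anchors, positives, bank, bank_domains,
                       anchor_domains, tau=0.07):
    """InfoNCE loss whose negatives come only from the anchor's own genre.

    anchors, positives: [B, D] L2-normalised embeddings of two augmented
    views of the same target-domain utterance.
    bank: [N, D] MoCo-style queue of momentum-encoder embeddings.
    bank_domains, anchor_domains: integer genre labels per embedding.
    """
    pos = (anchors * positives).sum(dim=-1, keepdim=True)             # [B, 1]
    neg = anchors @ bank.t()                                          # [B, N]
    same_genre = anchor_domains.unsqueeze(1) == bank_domains.unsqueeze(0)
    neg = neg.masked_fill(~same_genre, float("-inf"))                 # discard cross-genre negatives
    logits = torch.cat([pos, neg], dim=1) / tau
    targets = torch.zeros(anchors.size(0), dtype=torch.long,
                          device=anchors.device)                      # positive is class 0
    return F.cross_entropy(logits, targets)


def coral_alignment(source_emb, target_emb):
    """Deep-CORAL-style penalty: match the second-order statistics of the
    source- and target-domain embedding batches."""
    d = source_emb.size(1)
    cs = torch.cov(source_emb.t())                                    # [D, D]
    ct = torch.cov(target_emb.t())
    return ((cs - ct) ** 2).sum() / (4 * d * d)


# Toy usage with random tensors standing in for encoder outputs.
B, N, D, num_genres = 8, 128, 192, 4
anchors   = F.normalize(torch.randn(B, D), dim=-1)
positives = F.normalize(torch.randn(B, D), dim=-1)
bank      = F.normalize(torch.randn(N, D), dim=-1)
source    = F.normalize(torch.randn(B, D), dim=-1)

loss = in_domain_info_nce(
    anchors, positives, bank,
    bank_domains=torch.randint(0, num_genres, (N,)),
    anchor_domains=torch.randint(0, num_genres, (B,)),
) + 0.1 * coral_alignment(source, anchors)                            # 0.1 is an assumed weight
```

In an actual training loop, the bank would be refreshed as a FIFO queue from a momentum encoder as in MoCo, and the alignment weight would be tuned on held-out data; both details are omitted in this sketch.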
