iPhonMatchNet: Zero-Shot User-Defined Keyword Spotting Using Implicit Acoustic Echo Cancellation (2309.06096v3)

Published 12 Sep 2023 in eess.AS and eess.SP

Abstract: In response to the increasing interest in human-machine communication across various domains, this paper introduces a novel approach called iPhonMatchNet, which addresses the challenge of barge-in scenarios, wherein user speech overlaps with device playback audio, thereby creating a self-referencing problem. The proposed model leverages implicit acoustic echo cancellation (iAEC) techniques to increase the efficiency of user-defined keyword spotting models, achieving a remarkable 95% reduction in mean absolute error with a minimal increase in model size (0.13%) compared to the baseline model, PhonMatchNet. We also present an efficient model structure and demonstrate its capability to learn iAEC functionality without requiring a clean signal. The findings of our study indicate that the proposed model achieves competitive performance in real-world deployment conditions of smart devices.
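
The core idea of implicit acoustic echo cancellation is that, instead of removing the device playback echo with a separate echo canceller, the keyword spotting model receives the playback (far-end reference) signal as an additional input and learns to suppress the echo internally while matching the user-defined keyword. The following PyTorch sketch illustrates that general pattern only; the shared GRU encoder, the concatenation-based fusion, the keyword-embedding head, and all dimensions are illustrative assumptions and do not reproduce the iPhonMatchNet architecture described in the paper.

# Minimal sketch of implicit acoustic echo cancellation (iAEC) for
# user-defined keyword spotting. Module names, dimensions, and the fusion
# scheme are illustrative assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class IAECKeywordSpotter(nn.Module):
    def __init__(self, n_mels=40, enc_dim=128, text_dim=128):
        super().__init__()
        # Shared front-end encoder applied to both the microphone signal
        # and the device-playback (far-end reference) features.
        self.encoder = nn.GRU(n_mels, enc_dim, batch_first=True)
        # Fusion layer: the model sees mic and reference features jointly,
        # so it can learn to ignore playback echo without an explicit
        # echo-cancellation stage or a clean target signal.
        self.fuse = nn.Linear(2 * enc_dim, enc_dim)
        # Keyword head: compares the fused acoustic embedding with an
        # embedding of the user-defined keyword (e.g., from phonemes/text).
        self.text_proj = nn.Linear(text_dim, enc_dim)
        self.classifier = nn.Linear(enc_dim, 1)

    def forward(self, mic_feats, ref_feats, keyword_emb):
        # mic_feats, ref_feats: (batch, time, n_mels) log-mel features
        # keyword_emb:          (batch, text_dim) keyword embedding
        mic_h, _ = self.encoder(mic_feats)
        ref_h, _ = self.encoder(ref_feats)
        fused = torch.tanh(self.fuse(torch.cat([mic_h, ref_h], dim=-1)))
        pooled = fused.mean(dim=1)            # pool over time
        kw = self.text_proj(keyword_emb)
        score = self.classifier(pooled * kw)  # (batch, 1) detection logit
        return score.squeeze(-1)

if __name__ == "__main__":
    model = IAECKeywordSpotter()
    mic = torch.randn(2, 100, 40)     # mic input containing playback echo
    ref = torch.randn(2, 100, 40)     # known device-playback reference
    kw = torch.randn(2, 128)          # user-defined keyword embedding
    print(model(mic, ref, kw).shape)  # torch.Size([2])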
