Meta-AF Echo Cancellation for Improved Keyword Spotting (2312.10605v1)
Abstract: Adaptive filters (AFs) are vital for enhancing the performance of downstream tasks, such as speech recognition, sound event detection, and keyword spotting. However, traditional AF design prioritizes isolated signal-level objectives and often overlooks downstream task performance, which can lead to suboptimal results on those tasks. Recent research has leveraged meta-learning to automatically learn AF update rules from data, alleviating the need for manual tuning when using simple signal-level objectives. This paper improves the Meta-AF framework by expanding it to support end-to-end training for arbitrary downstream tasks. We focus on classification tasks, where we introduce a novel training methodology that harnesses self-supervision and classifier feedback. We evaluate our approach on the combined task of acoustic echo cancellation and keyword spotting. Our findings demonstrate consistent performance improvements with both pre-trained and joint-trained keyword spotting models on both synthetic and real playback. Notably, these improvements come without requiring additional tuning, increased inference-time complexity, or reliance on oracle signal-level training data.
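For context, the kind of hand-derived adaptive filter with a signal-level objective that the abstract contrasts with learned update rules might look like the sketch below. It is a minimal time-domain NLMS echo canceller, not the Meta-AF method and not the authors' code; the function name `nlms_echo_canceller`, the filter length, the step size, and the synthetic signals are all illustrative assumptions.

```python
import numpy as np


def nlms_echo_canceller(far_end, mic, num_taps=32, step_size=0.5, eps=1e-8):
    """Remove the echo of `far_end` from `mic` with a normalized LMS filter.

    Hypothetical helper for illustration only. The update minimizes the
    instantaneous squared error, a purely signal-level objective.
    """
    weights = np.zeros(num_taps)   # current estimate of the echo path
    history = np.zeros(num_taps)   # recent far-end samples, newest first
    error = np.zeros(len(mic))     # echo-cancelled output signal
    for n in range(len(mic)):
        # Shift the newest far-end sample into the delay line.
        history = np.roll(history, 1)
        history[0] = far_end[n]
        # Predict the echo and subtract it from the microphone sample.
        error[n] = mic[n] - weights @ history
        # Fixed, hand-derived update rule (NLMS).
        weights += step_size * error[n] * history / (history @ history + eps)
    return error, weights


# Synthetic scenario (assumed): white-noise playback through a short,
# exponentially decaying echo path, plus weak near-end noise.
rng = np.random.default_rng(0)
far_end = rng.standard_normal(8000)
true_path = rng.standard_normal(32) * np.exp(-0.2 * np.arange(32))
echo = np.convolve(far_end, true_path)[: len(far_end)]
mic = echo + 0.01 * rng.standard_normal(len(far_end))

error, weights = nlms_echo_canceller(far_end, mic)

# Echo return loss enhancement (ERLE) over the last 2000 samples, in dB.
erle_db = 10 * np.log10(np.sum(mic[-2000:] ** 2) / np.sum(error[-2000:] ** 2))
print(f"ERLE: {erle_db:.1f} dB")
```

In this sketch the update rule and step size are fixed by hand and judged only by echo reduction. The abstract's proposal is instead to learn the update rule from data and to train it with feedback from the downstream classifier.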