Modified Parametric Multichannel Wiener Filter for Low-latency Enhancement of Speech Mixtures with Unknown Number of Speakers (2306.17317v1)
Abstract: This paper introduces a novel low-latency online beamforming (BF) algorithm, the Modified Parametric Multichannel Wiener Filter (Mod-PMWF), for enhancing speech mixtures with an unknown and varying number of speakers. Conventional BFs such as the linearly constrained minimum variance (LCMV) BF can enhance a speech mixture, but they typically require attributes of the mixture such as the number of speakers and the acoustic transfer functions (ATFs) from the speakers to the microphones. When these attributes are unavailable, estimating them with low-latency processing is challenging, which hinders the application of such BFs to this problem. In this paper, we overcome the problem by modifying a conventional Parametric Multichannel Wiener Filter (PMWF). The proposed Mod-PMWF adaptively forms a directivity pattern that enhances all the speakers in the mixture without explicitly estimating these attributes. Our experiments show the proposed BF's effectiveness in terms of interference reduction ratios and subjective listening tests.
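For orientation, the conventional PMWF mentioned in the abstract is commonly written, for a single desired source at one frequency bin, in the form sketched below. This is the widely used textbook expression and not the proposed Mod-PMWF, whose modification is not specified in this abstract. The symbols are assumptions introduced here for illustration: \Phi_y is the spatial covariance of the observed mixture, \Phi_n that of the undesired component, \Phi_s = \Phi_y - \Phi_n that of the desired speech, u a one-hot vector selecting the reference microphone, and \beta \ge 0 a trade-off parameter.

```latex
% Conventional PMWF (sketch under the assumptions stated above).
% \beta = 0 gives the MVDR beamformer when \Phi_s is rank one;
% \beta = 1 gives the multichannel Wiener filter.
\begin{equation}
  \mathbf{w}_{\beta}
  = \frac{\boldsymbol{\Phi}_n^{-1}\,\boldsymbol{\Phi}_s}
         {\beta + \operatorname{tr}\!\left(\boldsymbol{\Phi}_n^{-1}\,\boldsymbol{\Phi}_s\right)}\,
    \mathbf{u},
  \qquad
  \hat{s} = \mathbf{w}_{\beta}^{\mathsf{H}}\,\mathbf{y}
\end{equation}
```

Larger values of \beta suppress more of the undesired component at the cost of more speech distortion. Note that this form needs only covariance estimates, not explicit ATFs.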