IANS: Intelligibility-aware Null-steering Beamforming for Dual-Microphone Arrays (2307.04179v1)
Abstract: Beamforming techniques are popular in speech-related applications due to their effective spatial filtering capabilities. Nonetheless, conventional beamforming techniques generally depend heavily on the target's direction-of-arrival (DOA), the relative transfer function (RTF), or the covariance matrix. This paper presents a new approach, the intelligibility-aware null-steering (IANS) beamforming framework, which uses the STOI-Net intelligibility prediction model to improve speech intelligibility without prior knowledge of the speech signal parameters mentioned above. The IANS framework combines a null-steering beamformer (NSBF), which generates a set of beamformed outputs, with STOI-Net, which selects the optimal result. Experimental results indicate that IANS can produce intelligibility-enhanced signals using a small dual-microphone array, and that the results are comparable to those obtained by null-steering beamformers with prior knowledge of the DOAs.
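The following is a minimal sketch of the generate-and-select idea the abstract describes: sweep the null direction of a dual-microphone null-steering beamformer over a grid of candidate angles, score each output with a non-intrusive intelligibility predictor, and keep the best one. The free-field far-field steering model, the 2 cm microphone spacing, and the function names (`null_steer`, `ians_select`) are assumptions made for illustration, not the authors' implementation; the STOI-Net model itself is left as a pluggable `score_fn`.

```python
import numpy as np
from scipy.signal import stft, istft

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def null_steer(x_left, x_right, null_deg, fs, mic_dist, nperseg=512):
    """Dual-mic null-steering beamformer: place a spatial null at `null_deg`.

    Assumes a free-field, far-field steering model (a simplification of
    whatever steering model the paper actually uses).
    """
    freqs, _, X1 = stft(x_left, fs=fs, nperseg=nperseg)
    _, _, X2 = stft(x_right, fs=fs, nperseg=nperseg)
    # Inter-microphone delay of a far-field source arriving from `null_deg`.
    tau = mic_dist * np.cos(np.deg2rad(null_deg)) / SPEED_OF_SOUND
    align = np.exp(1j * 2.0 * np.pi * freqs * tau)[:, None]
    # Align mic 2 to mic 1 for that direction and subtract, cancelling
    # the component arriving from the null direction.
    Y = 0.5 * (X1 - align * X2)
    _, y = istft(Y, fs=fs, nperseg=nperseg)
    return y


def ians_select(x_left, x_right, fs, score_fn, mic_dist=0.02,
                candidate_degs=range(0, 181, 10)):
    """Sweep candidate null directions and keep the highest-scoring output.

    `score_fn(signal, fs)` is any non-intrusive intelligibility predictor
    (the paper uses STOI-Net); it is injected here rather than hard-coded.
    Returns the selected waveform, its null angle, and its predicted score.
    """
    best_score, best_out, best_deg = -np.inf, None, None
    for deg in candidate_degs:
        y = null_steer(x_left, x_right, deg, fs, mic_dist)
        s = score_fn(y, fs)
        if s > best_score:
            best_score, best_out, best_deg = s, y, deg
    return best_out, best_deg, best_score
```

The selection loop is deliberately model-agnostic: any predictor that maps a single-channel waveform to an intelligibility score can be plugged in as `score_fn`, which is what lets the framework operate without DOA, RTF, or covariance estimates.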