
IANS: Intelligibility-aware Null-steering Beamforming for Dual-Microphone Arrays (2307.04179v1)

Published 9 Jul 2023 in eess.AS and eess.SP

Abstract: Beamforming techniques are popular in speech-related applications because of their effective spatial filtering capabilities. Conventional beamformers, however, generally depend heavily on prior knowledge of the target's direction-of-arrival (DOA), relative transfer function (RTF), or covariance matrix. This paper presents a new approach, the intelligibility-aware null-steering (IANS) beamforming framework, which uses the STOI-Net intelligibility prediction model to improve speech intelligibility without prior knowledge of any of these signal parameters. The IANS framework combines a null-steering beamformer (NSBF), which generates a set of beamformed outputs, with STOI-Net, which selects the optimal result. Experimental results indicate that IANS produces intelligibility-enhanced signals using a small dual-microphone array, with performance comparable to that of null-steering beamformers supplied with the true DOAs.
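The abstract outlines the full pipeline, so a minimal Python sketch may help make it concrete: a dual-microphone delay-and-subtract null-steering beamformer is swept over a grid of candidate null directions, and a non-intrusive intelligibility predictor keeps the output it scores highest. This is an assumption-laden illustration rather than the paper's implementation: `predict_intelligibility` stands in for a STOI-Net forward pass, and the beamformer form, microphone spacing, and angular grid are placeholder choices.

```python
import numpy as np
from scipy.signal import stft, istft

def null_steer(x1, x2, theta, fs, d=0.02, c=343.0, nperseg=512):
    """Delay-and-subtract null-steering beamformer for a 2-mic array.

    Cancels a plane wave arriving from angle `theta` (radians, 0 = broadside)
    for microphone spacing `d` (metres) and speed of sound `c` (m/s).
    Geometry and parameter values are illustrative assumptions.
    """
    tau = d * np.sin(theta) / c               # inter-mic delay for direction theta
    f, _, X1 = stft(x1, fs, nperseg=nperseg)  # STFT of channel 1
    _, _, X2 = stft(x2, fs, nperseg=nperseg)  # STFT of channel 2
    # Advance mic 2 by tau to align it with mic 1 for direction theta,
    # then subtract: a plane wave from theta cancels exactly.
    Y = X1 - np.exp(2j * np.pi * f[:, None] * tau) * X2
    _, y = istft(Y, fs, nperseg=nperseg)
    return y

def ians(x1, x2, fs, predict_intelligibility, n_angles=37):
    """Sweep candidate null directions and return the beamformed output
    that the (non-intrusive) intelligibility predictor scores highest.
    `predict_intelligibility` is a hypothetical stand-in for STOI-Net."""
    best_y, best_score = None, -np.inf
    for theta in np.linspace(-np.pi / 2, np.pi / 2, n_angles):
        y = null_steer(x1, x2, theta, fs)
        score = predict_intelligibility(y, fs)  # e.g. a STOI-Net forward pass
        if score > best_score:
            best_y, best_score = y, score
    return best_y, best_score
```

The design point mirrored here is the one the abstract emphasizes: candidate selection requires no DOA, RTF, or covariance estimate, only a learned intelligibility score computed on each candidate output.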

