Efficient Multi-Channel Speech Enhancement with Spherical Harmonics Injection for Directional Encoding (2309.10832v1)

Published 19 Sep 2023 in cs.SD and eess.AS

Abstract: Multi-channel speech enhancement extracts clean speech from multi-microphone recordings that capture spatial cues, and effectively exploiting this directional information is key to performance. Deep learning shows great potential for multi-channel speech enhancement and often takes short-time Fourier transform (STFT) features directly as input. To leverage the spatial information more fully, we introduce spherical harmonics transform (SHT) coefficients as auxiliary model inputs; these coefficients concisely represent spatial sound-field distributions. Specifically, our model has two encoders, one for the STFT and one for the SHT. By fusing both encoders in the decoder to estimate the enhanced STFT, we effectively incorporate spatial context. Evaluations on TIMIT under varying noise and reverberation conditions show that our model outperforms established benchmarks, and it does so with fewer computations and parameters. By leveraging spherical harmonics to encode directional cues, our model efficiently improves multi-channel speech enhancement.
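To make the SHT-as-auxiliary-input idea concrete, here is a minimal sketch of how order-1 SHT coefficients can be computed from one STFT bin sampled across a spherical microphone array. The octahedral 6-microphone layout, the equal quadrature weights, and the truncation at order 1 are illustrative assumptions, not the paper's actual array geometry or SH order; a real pipeline would stack such coefficients per time-frequency bin and feed them to the SHT encoder alongside the raw STFT.

```python
import cmath
import math

def sh_basis_order1(theta, phi):
    """Complex spherical harmonics Y_nm up to order n=1.

    theta: azimuth in [0, 2*pi), phi: polar angle in [0, pi].
    Returned ordering: (n,m) = (0,0), (1,-1), (1,0), (1,1).
    """
    y00 = 0.5 / math.sqrt(math.pi)
    c = 0.5 * math.sqrt(3.0 / (2.0 * math.pi))
    y1m1 = c * math.sin(phi) * cmath.exp(-1j * theta)
    y10 = 0.5 * math.sqrt(3.0 / math.pi) * math.cos(phi)
    y11 = -c * math.sin(phi) * cmath.exp(1j * theta)
    return [y00, y1m1, y10, y11]

def sht_coefficients(samples, directions):
    """Order-1 SHT of one STFT bin sampled at Q microphone directions.

    samples:    complex STFT values, one per microphone
    directions: (theta, phi) per microphone on the sphere
    Uses equal quadrature weights 4*pi/Q, which is exact for
    sufficiently symmetric layouts such as the octahedron below.
    """
    q = len(samples)
    w = 4.0 * math.pi / q
    coeffs = [0j, 0j, 0j, 0j]
    for s, (theta, phi) in zip(samples, directions):
        for i, y in enumerate(sh_basis_order1(theta, phi)):
            coeffs[i] += w * s * y.conjugate()  # p_nm = sum_q w_q s_q Y_nm*
    return coeffs

# Hypothetical octahedral layout: two poles plus four equatorial mics.
octa = [(0.0, 0.0), (0.0, math.pi),
        (0.0, math.pi / 2), (math.pi / 2, math.pi / 2),
        (math.pi, math.pi / 2), (3 * math.pi / 2, math.pi / 2)]

# Sanity check: an omnidirectional (constant) field excites only the
# n=0 coefficient; the three n=1 (directional) coefficients vanish.
p = sht_coefficients([1.0 + 0j] * 6, octa)
```

The n=1 coefficients are what carry the directional cues the paper exploits: they vanish for a diffuse field and grow with the directivity of the incident sound, which is why even a low-order SHT is a compact spatial descriptor.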

Authors (4)
  1. Jiahui Pan
  2. Pengjie Shen
  3. Hui Zhang
  4. Xueliang Zhang