STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model (2011.04292v1)

Published 9 Nov 2020 in cs.SD, cs.LG, and eess.AS

Abstract: The calculation of most objective speech intelligibility assessment metrics requires clean speech as a reference. Such a requirement may limit the applicability of these metrics in real-world scenarios. To overcome this limitation, we propose a deep learning-based non-intrusive speech intelligibility assessment model, namely STOI-Net. The input and output of STOI-Net are speech spectral features and predicted STOI scores, respectively. The model is formed by the combination of a convolutional neural network and bidirectional long short-term memory (CNN-BLSTM) architecture with a multiplicative attention mechanism. Experimental results show that the STOI score estimated by STOI-Net has a good correlation with the actual STOI score when tested with noisy and enhanced speech utterances. The correlation values are 0.97 and 0.83, respectively, for the seen test condition (the test speakers and noise types are involved in the training set) and the unseen test condition (the test speakers and noise types are not involved in the training set). The results confirm the capability of STOI-Net to accurately predict the STOI scores without referring to clean speech.
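The 0.97 and 0.83 figures in the abstract are correlations between predicted and actual STOI scores over a test set. The abstract does not say which correlation coefficient is used, so the sketch below assumes the linear (Pearson) coefficient. The function name `pearson_correlation` and the example scores are hypothetical and are not taken from the paper.

```python
import math


def pearson_correlation(predicted, reference):
    """Linear (Pearson) correlation between two equal-length score lists.

    Assumption: the abstract only says "correlation"; Pearson is used
    here purely for illustration.
    """
    if len(predicted) != len(reference):
        raise ValueError("score lists must have the same length")
    n = len(predicted)
    if n < 2:
        raise ValueError("need at least two utterances")

    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n

    # Sums of centered products and centered squares.
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)

    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_p * var_r)


# Hypothetical per-utterance scores, made up for illustration only.
actual_stoi = [0.62, 0.71, 0.80, 0.88, 0.93]     # computed with a clean reference
predicted_stoi = [0.60, 0.74, 0.78, 0.90, 0.91]  # estimated without a reference

print(f"correlation: {pearson_correlation(predicted_stoi, actual_stoi):.3f}")
```

A value close to 1 means the predictions rise and fall with the reference-based scores, which is the sense in which the abstract reports good correlation in the seen condition and lower correlation in the unseen one.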

Authors (5)
  1. Ryandhimas E. Zezario (16 papers)
  2. Szu-Wei Fu (46 papers)
  3. Chiou-Shann Fuh (11 papers)
  4. Yu Tsao (200 papers)
  5. Hsin-Min Wang (97 papers)
Citations (38)
