Gammatonegram Representation for End-to-End Dysarthric Speech Processing Tasks: Speech Recognition, Speaker Identification, and Intelligibility Assessment (2307.03296v2)
Abstract: Dysarthria is a disability that disturbs the human speech production system and reduces the quality and intelligibility of a person's speech. Because of this effect, normal speech processing systems cannot work properly on impaired speech. Dysarthria is usually accompanied by physical disabilities, so a system that can perform tasks in a smart home by receiving voice commands would be a significant achievement. In this work, we introduce the gammatonegram as an effective method for representing audio files with discriminative details and use it as the input to a convolutional neural network (CNN). In other words, we convert each speech file into an image and propose an image recognition system to classify speech in different scenarios. The proposed CNN is based on transfer learning from the pre-trained AlexNet. In this research, the efficiency of the proposed system is evaluated for speech recognition, speaker identification, and intelligibility assessment. According to the results on the UA speech dataset, the proposed speech recognition system achieved 91.29% accuracy in speaker-dependent mode, the speaker identification system achieved 87.74% accuracy in text-dependent mode, and the intelligibility assessment system achieved 96.47% accuracy in two-class mode. Finally, we propose a fully automatic multi-network speech recognition system: it is arranged in cascade with the two-class intelligibility assessment system, whose output activates one of the speech recognition networks. This architecture achieves a word recognition rate (WRR) of 92.3%. The source code of this paper is available.
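The abstract does not spell out how a gammatonegram is computed, but the standard construction is an ERB-spaced bank of gammatone filters followed by framewise log energies, yielding a 2-D image per utterance. The sketch below is a minimal stdlib-only illustration of that idea, not the authors' implementation; all parameter values (band count, frame sizes, filter length) are assumptions for demonstration.

```python
import math

def erb(f_hz):
    # Equivalent rectangular bandwidth of the auditory filter at f_hz
    # (Glasberg & Moore approximation)
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def hz_to_erb_rate(f_hz):
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_rate_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def gammatone_fir(fc, fs, taps):
    # Truncated impulse response of a 4th-order gammatone filter:
    # g(t) = t^3 * exp(-2*pi*b*t) * cos(2*pi*fc*t), b = 1.019 * ERB(fc)
    b = 1.019 * erb(fc)
    h = [ (n / fs) ** 3
          * math.exp(-2.0 * math.pi * b * n / fs)
          * math.cos(2.0 * math.pi * fc * n / fs)
          for n in range(taps) ]
    peak = max(abs(v) for v in h) or 1.0
    return [v / peak for v in h]

def gammatonegram(x, fs, n_bands=8, f_lo=100.0, f_hi=3500.0,
                  win=200, hop=100, taps=256):
    # Center frequencies equally spaced on the ERB-rate scale
    e_lo, e_hi = hz_to_erb_rate(f_lo), hz_to_erb_rate(f_hi)
    fcs = [erb_rate_to_hz(e_lo + (e_hi - e_lo) * k / (n_bands - 1))
           for k in range(n_bands)]
    image = []
    for fc in fcs:
        h = gammatone_fir(fc, fs, taps)
        # Causal FIR filtering by direct convolution
        y = [sum(h[j] * x[i - j] for j in range(min(taps, i + 1)))
             for i in range(len(x))]
        # Framewise log energy: one row of the gammatonegram image
        row = []
        for start in range(0, len(y) - win + 1, hop):
            energy = sum(v * v for v in y[start:start + win]) / win
            row.append(10.0 * math.log10(energy + 1e-10))
        image.append(row)
    return image, fcs
```

Feeding a pure 1 kHz tone through this sketch concentrates energy in the band whose center frequency is nearest 1 kHz, which is the discriminative detail the CNN input relies on; a real pipeline would use many more bands and resize the image to the network's input resolution.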
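The cascade described at the end of the abstract can be sketched as simple routing logic: the two-class intelligibility classifier picks which recognition network processes the utterance. The function and network names below are hypothetical placeholders, not the paper's code.

```python
from typing import Callable, Sequence

def cascaded_asr(image: Sequence[Sequence[float]],
                 intelligibility_net: Callable,
                 asr_low: Callable,
                 asr_high: Callable) -> str:
    """Route a gammatonegram image to the recognizer matching the
    predicted intelligibility class (hypothetical two-class cascade)."""
    label = intelligibility_net(image)  # assumed: 0 = low, 1 = high intelligibility
    chosen = asr_low if label == 0 else asr_high
    return chosen(image)
```

Because only one recognizer runs per utterance, the cascade stays fully automatic: no manual selection of the severity-matched network is needed.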
- “Comparing humans and automatic speech recognition systems in recognizing dysarthric speech,” in Advances in Artificial Intelligence: 24th Canadian Conference on Artificial Intelligence, Canadian AI 2011, St. John’s, Canada, May 25-27, 2011. Proceedings 24. Springer, 2011, pp. 291–300.
- “A review of speaker diarization: Recent advances with deep learning,” Computer Speech & Language, vol. 72, p. 101317, 2022.
- “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.
- “Imagenet classification with deep convolutional neural networks,” Advances in neural information processing systems, vol. 25, 2012.
- “Decentralizing feature extraction with quantum convolutional neural network for automatic speech recognition,” in ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021, pp. 6523–6527.
- “Dysarthric speech database for universal access research,” in Ninth Annual Conference of the International Speech Communication Association, 2008.
- “Waste image classification based on transfer learning and convolutional neural network,” Waste Management, vol. 135, pp. 150–157, 2021.
- “A survey of technologies for automatic dysarthric speech recognition,” EURASIP Journal on Audio, Speech, and Music Processing, vol. 2023, no. 1, p. 48, 2023.
- “Acoustic analysis of speech,” The handbook of clinical linguistics, pp. 360–380, 2008.
- “The HTK book,” Cambridge University Engineering Department, vol. 3, no. 175, p. 12, 2002.
- “Dysarthric speech recognition using convolutional LSTM neural network,” in INTERSPEECH, 2018, pp. 2948–2952.
- “On the use of pitch features for disordered speech recognition,” in INTERSPEECH, 2019, pp. 4130–4134.
- “Dysarthric speech recognition using time-delay neural network based denoising autoencoder,” in INTERSPEECH, 2018, pp. 451–455.
- Seyed Reza Shahamiri, “Speech vision: An end-to-end deep learning-based dysarthric automatic speech recognition system,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 29, pp. 852–861, 2021.
- “Improved end-to-end dysarthric speech recognition via meta-learning based model re-initialization,” in 2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP). IEEE, 2021, pp. 1–5.
- “Recent progress in the cuhk dysarthric speech recognition system,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 2267–2281, 2021.
- “Raw source and filter modelling for dysarthric speech recognition,” in ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022, pp. 7377–7381.
- “End-to-end dysarthric speech recognition using multiple databases,” in ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019, pp. 6395–6399.
- “Dysarthric speech transformer: A sequence-to-sequence dysarthric speech recognition system,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2023.
- “E2E-DASR: End-to-end deep learning-based dysarthric automatic speech recognition,” Expert Systems with Applications, vol. 222, p. 119797, 2023.
- “Multi-stage audio-visual fusion for dysarthric speech recognition with pre-trained models,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 31, pp. 1912–1921, 2023.
- “Transfer learning using whisper for dysarthric automatic speech recognition,” in International Conference on Speech and Computer. Springer, 2023, pp. 579–589.
- “Dysarthric speaker identification with different degrees of dysarthria severity using deep belief networks,” Etri Journal, vol. 40, no. 5, pp. 643–652, 2018.
- “Fully automated speaker identification and intelligibility assessment in dysarthria disease using auditory knowledge,” Biocybernetics and Biomedical Engineering, vol. 36, no. 1, pp. 233–247, 2016.
- “Constant q cepstral coefficients for automatic speaker verification system for dysarthria patients,” Circuits, Systems, and Signal Processing, pp. 1–18, 2023.
- “Automatic speaker verification system for dysarthria patients,” in INTERSPEECH, 2022, pp. 5070–5074.
- “Residual neural network precisely quantifies dysarthria severity-level based on short-duration speech segments,” Neural Networks, vol. 139, pp. 105–117, 2021.
- “Classification of dysarthric speech according to the severity of impairment: an analysis of acoustic features,” IEEE Access, vol. 9, pp. 18183–18194, 2021.
- “Automated dysarthria severity classification: A study on acoustic features and deep learning techniques,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 30, pp. 1147–1157, 2022.
- “An investigation to identify optimal setup for automated assessment of dysarthric intelligibility using deep learning technologies,” Cognitive Computation, vol. 15, no. 1, pp. 146–158, 2023.
- “A few-shot approach to dysarthric speech intelligibility level classification using transformers,” arXiv e-prints, 2023.
- “Speech intelligibility classifiers from 550k disordered speech samples,” in ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1–5.
- “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009, pp. 248–255.
- “Gammatonegram based speaker identification,” in 2014 4th International Conference on Computer and Knowledge Engineering (ICCKE). IEEE, 2014, pp. 52–55.
- Theory and applications of digital speech processing, Prentice Hall Press, 2010.
- “Semi-supervised speech activity detection with an application to automatic speaker verification,” Computer Speech & Language, vol. 47, pp. 132–156, 2018.
- “Nevisa, a persian continuous speech recognition system,” in Advances in Computer Science and Engineering: 13th International CSI Computer Conference, CSICC 2008 Kish Island, Iran, March 9-11, 2008 Revised Selected Papers. Springer, 2009, pp. 485–492.
- “An overview of speech recognition using hmm,” International Journal of Computer Science and Mobile Computing, vol. 2, no. 6, pp. 233–238, 2013.
- Kevin Murphy, “Hidden markov model (hmm) toolbox for matlab,” https://www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html, 1998.
- Seyed Reza Shahamiri and Siti Salwah Binti Salim, “Real-time frequency-based noise-robust automatic speech recognition using multi-nets artificial neural networks: A multi-views multi-learners approach,” Neurocomputing, vol. 129, pp. 199–207, 2014.
- Aref Farhadipour
- Hadi Veisi