
Fine-tuning convergence model in Bengali speech recognition (2311.04122v1)

Published 7 Nov 2023 in eess.AS

Abstract: Speech recognition research has attracted considerable interest because segmenting uninterrupted speech is a difficult task. Among the world's languages, Bengali has distinctive rhythmic patterns and tones that make it particularly hard to recognize, and it still lacks an effective commercial recognition system. To improve automatic speech recognition for Bengali, our team fine-tuned a converged wav2vec 2.0 pre-trained model. The learning rate and dropout parameters were tuned against Word Error Rate (WER), and once training was stable the proportion of data used for training was enlarged, which further improved performance. As a result, the WER on the test set of the publicly released official dataset improved from 0.508 to 0.437. Finally, merging the training and validation sets into a single comprehensive training set yielded a WER of 0.436.
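To make the evaluation and tuning loop concrete, here is a minimal Python sketch: a from-scratch WER implementation (the metric the abstract reports), followed by a hypothetical wav2vec 2.0 fine-tuning configuration using Hugging Face Transformers. The checkpoint name (facebook/wav2vec2-xls-r-300m) and every hyperparameter value are illustrative assumptions; the paper does not specify them.

from transformers import TrainingArguments, Wav2Vec2ForCTC

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return float(bool(hyp))
    # Dynamic-programming table for the Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i][j] = min(
                d[i - 1][j] + 1,                               # deletion
                d[i][j - 1] + 1,                               # insertion
                d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),  # substitution or match
            )
    return d[-1][-1] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words ~ 0.167

# Hypothetical multilingual checkpoint; the paper only says it fine-tunes
# a converged wav2vec 2.0 pre-trained model, without naming it.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",
    hidden_dropout=0.1,         # dropout values like these were tuned
    attention_dropout=0.1,      # against WER; the abstract does not
    feat_proj_dropout=0.1,      # report the optimal settings
    ctc_loss_reduction="mean",
)

args = TrainingArguments(
    output_dir="wav2vec2-bengali",
    learning_rate=3e-4,         # tuned alongside dropout (placeholder value)
    per_device_train_batch_size=8,
    num_train_epochs=30,
)

In practice the randomly initialized CTC head would also need a Bengali character vocabulary built from the dataset transcripts before training; the sketch stops at the configuration the abstract describes.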
