Speaker Independent Continuous Speech to Text Converter for Mobile Application

Published 19 Jul 2013 in cs.CL, cs.NE, and cs.SD | (1307.5736v1)

Abstract: An efficient speech to text converter for mobile application is presented in this work. The prime motive is to formulate a system which would give optimum performance in terms of complexity, accuracy, delay and memory requirements for mobile environment. The speech to text converter consists of two stages namely front-end analysis and pattern recognition. The front end analysis involves preprocessing and feature extraction. The traditional voice activity detection algorithms which track only energy cannot successfully identify potential speech from input because the unwanted part of the speech also has some energy and appears to be speech. In the proposed system, VAD that calculates energy of high frequency part separately as zero crossing rate to differentiate noise from speech is used. Mel Frequency Cepstral Coefficient (MFCC) is used as feature extraction method and Generalized Regression Neural Network is used as recognizer. MFCC provides low word error rate and better feature extraction. Neural Network improves the accuracy. Thus a small database containing all possible syllable pronunciation of the user is sufficient to give recognition accuracy closer to 100%. Thus the proposed technique entertains realization of real time speaker independent applications like mobile phones, PDAs etc.