Precise Detection of Speech Endpoints Dynamically: A Wavelet Convolution based approach (1804.06159v1)

Published 17 Apr 2018 in eess.AS

Abstract: Precise detection of speech endpoints is an important factor in the performance of systems that must extract speech utterances from a speech signal, such as Automatic Speech Recognition (ASR) systems. Existing endpoint detection (EPD) methods mostly use Short-Term Energy (STE) and Zero-Crossing Rate (ZCR) based approaches and their variants. However, STE- and ZCR-based EPD algorithms often fail in the presence of Non-speech Sound Artifacts (NSAs) produced by speakers. Algorithms based on pattern recognition and classification techniques have also been proposed, but they require labeled data for training. This article proposes a new algorithm, termed Wavelet Convolution based Speech Endpoint Detection (WCSEPD), to extract speech endpoints. WCSEPD decomposes the speech signal into high-frequency and low-frequency components using wavelet convolution and computes entropy-based thresholds for the two frequency components. The low-frequency thresholds are used to extract voiced speech segments, whereas the high-frequency thresholds are used to extract unvoiced speech segments by filtering out the NSAs. WCSEPD requires no labeled training data and extracts speech segments automatically. Experimental results show that the proposed algorithm precisely extracts speech endpoints in the presence of NSAs.
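The decompose-then-threshold idea described in the abstract can be illustrated with a small sketch. It assumes a single-level Haar wavelet (the abstract names neither the wavelet nor the decomposition depth), non-overlapping frames, and a hypothetical entropy-weighted threshold rule invented purely for illustration; it is not the WCSEPD threshold formula and it omits the paper's NSA filtering. The function names `haar_decompose`, `frame_energies`, `entropy_threshold` and `detect_endpoints` are likewise hypothetical.

```python
import math
from typing import List, Optional, Tuple


def haar_decompose(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of Haar wavelet analysis: convolve with the Haar low-pass
    and high-pass filters, then keep every second output (downsample by 2)."""
    s = 1.0 / math.sqrt(2.0)
    n = len(x) - len(x) % 2  # ignore a trailing odd sample
    low = [s * (x[i] + x[i + 1]) for i in range(0, n, 2)]   # approximation (low band)
    high = [s * (x[i] - x[i + 1]) for i in range(0, n, 2)]  # detail (high band)
    return low, high


def frame_energies(band: List[float], frame_len: int) -> List[float]:
    """Sum of squares over non-overlapping frames; a partial last frame is dropped."""
    return [sum(v * v for v in band[i:i + frame_len])
            for i in range(0, len(band) - frame_len + 1, frame_len)]


def entropy_threshold(energies: List[float]) -> float:
    """HYPOTHETICAL rule (not the paper's): threshold = min + h * (max - min),
    where h is the Shannon entropy of the normalized frame-energy distribution
    divided by log(number of frames). Energy concentrated in a few frames gives
    a low h and a threshold near the minimum; evenly spread energy pushes it
    toward the maximum."""
    total = sum(energies)
    if total == 0.0 or len(energies) < 2:
        return math.inf  # silent or too short: nothing can exceed this
    probs = [e / total for e in energies if e > 0.0]
    h = -sum(p * math.log(p) for p in probs) / math.log(len(energies))
    lo, hi = min(energies), max(energies)
    return lo + h * (hi - lo)


def detect_endpoints(x: List[float], frame_len: int = 64) -> Optional[Tuple[int, int]]:
    """Return (start, end) input-sample indices of detected activity, or None.
    A frame is active if its low-band energy exceeds the low-band threshold
    (voiced-like) or its high-band energy exceeds the high-band threshold
    (unvoiced-like)."""
    low, high = haar_decompose(x)
    e_low = frame_energies(low, frame_len)
    e_high = frame_energies(high, frame_len)
    t_low, t_high = entropy_threshold(e_low), entropy_threshold(e_high)
    active = [el > t_low or eh > t_high for el, eh in zip(e_low, e_high)]
    if not any(active):
        return None
    first = active.index(True)
    last = len(active) - 1 - active[::-1].index(True)
    step = 2 * frame_len  # each band sample spans two input samples
    return first * step, (last + 1) * step
```

Called on 1024 zero samples, followed by 2048 samples of a 200 Hz sine at an 8 kHz sampling rate and then 1024 more zeros, this sketch returns `(1024, 3072)`. Combining the two bands with a simple OR is a simplification; per the abstract, WCSEPD uses the high-frequency thresholds specifically to separate unvoiced speech from NSAs.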

Citations (16)
