Detection of Glottal Closure Instants from Speech Signals: a Quantitative Review (2001.00473v1)

Published 28 Dec 2019 in cs.SD, cs.CL, and eess.AS

Abstract: The pseudo-periodicity of voiced speech can be exploited in several speech processing applications. This requires however that the precise locations of the Glottal Closure Instants (GCIs) are available. The focus of this paper is the evaluation of automatic methods for the detection of GCIs directly from the speech waveform. Five state-of-the-art GCI detection algorithms are compared using six different databases with contemporaneous electroglottographic recordings as ground truth, and containing many hours of speech by multiple speakers. The five techniques compared are the Hilbert Envelope-based detection (HE), the Zero Frequency Resonator-based method (ZFR), the Dynamic Programming Phase Slope Algorithm (DYPSA), the Speech Event Detection using the Residual Excitation And a Mean-based Signal (SEDREAMS) and the Yet Another GCI Algorithm (YAGA). The efficacy of these methods is first evaluated on clean speech, both in terms of reliability and accuracy. Their robustness to additive noise and to reverberation is also assessed. A further contribution of the paper is the evaluation of their performance on a concrete application of speech processing: the causal-anticausal decomposition of speech. It is shown that for clean speech, SEDREAMS and YAGA are the best performing techniques, both in terms of identification rate and accuracy. ZFR and SEDREAMS also show a superior robustness to additive noise and reverberation.

Summary

The paper quantitatively compares five GCI detection algorithms using multiple speech databases and reliable EGG references.
It demonstrates that SEDREAMS and YAGA achieve high detection rates (>98%) with precise timing, while also discussing performance variations under noise and reverberation.
The study emphasizes computational efficiency, highlighting Fast SEDREAMS as a resource-effective solution for real-time speech processing applications.

Detection of Glottal Closure Instants from Speech Signals: A Critical Review

The paper "Detection of Glottal Closure Instants from Speech Signals: a Quantitative Review" provides a quantitative assessment of state-of-the-art methods for detecting Glottal Closure Instants (GCIs) from speech waveforms. GCIs mark the instants at which the glottis closes during voiced speech production, and locating them precisely matters for speech processing applications such as synthesis, modification, and dereverberation. The review covers five prominent algorithms: Hilbert Envelope-based detection (HE), the Zero Frequency Resonator-based method (ZFR), the Dynamic Programming Phase Slope Algorithm (DYPSA), Speech Event Detection using the Residual Excitation And a Mean-based Signal (SEDREAMS), and Yet Another GCI Algorithm (YAGA). Performance is assessed both on clean speech and under additive noise and reverberation.

Methodology and Evaluation

The paper utilizes six diverse speech databases containing several hours of recordings. The reference GCIs are taken from contemporaneous electroglottographic (EGG) recordings, ensuring a reliable ground truth. The algorithms' performance on clean speech is evaluated through identification rate, miss rate, false alarm rate, and accuracy, with strong emphasis on timing precision.
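The four clean-speech measures named above can be sketched in a few lines. This is a minimal illustration, not the paper's evaluation code. It assumes the larynx-cycle convention common in the GCI literature: each reference GCI owns the interval between the midpoints to its neighbours, a cycle with exactly one detection counts as identified, a cycle with none counts as a miss, and a cycle with several counts as a false alarm. The function name `gci_metrics`, its arguments, and the dictionary keys are hypothetical.

```python
from bisect import bisect_left
from statistics import pstdev


def gci_metrics(reference, detected, tol=0.25e-3):
    """Score detected GCIs (seconds) against reference GCIs (seconds).

    Assumption: the larynx cycle of a reference GCI spans from the midpoint
    with the previous reference GCI to the midpoint with the next one.
    """
    ref = sorted(reference)
    det = sorted(detected)
    if len(ref) < 3:
        raise ValueError("need at least three reference GCIs")

    hits = misses = false_alarms = 0
    errors = []
    # Skip the first and last reference GCIs: their cycles are open-ended.
    for k in range(1, len(ref) - 1):
        lo = 0.5 * (ref[k - 1] + ref[k])
        hi = 0.5 * (ref[k] + ref[k + 1])
        # Detections falling inside the half-open cycle [lo, hi).
        inside = det[bisect_left(det, lo):bisect_left(det, hi)]
        if len(inside) == 1:
            hits += 1
            errors.append(inside[0] - ref[k])
        elif not inside:
            misses += 1
        else:
            false_alarms += 1

    cycles = len(ref) - 2
    within = sum(1 for e in errors if abs(e) <= tol)
    return {
        "identification_rate": hits / cycles,
        "miss_rate": misses / cycles,
        "false_alarm_rate": false_alarms / cycles,
        # Spread of the timing error over correctly identified cycles.
        "timing_error_std": pstdev(errors) if errors else float("nan"),
        # Share of identified GCIs within +/- tol of the reference.
        "within_tol_rate": within / len(errors) if errors else float("nan"),
    }
```

With this convention the three rates always sum to one, while the two timing measures are computed only over identified cycles, which is why reliability and accuracy are reported separately.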

Key Results for Clean Speech:

SEDREAMS and YAGA consistently achieve the highest identification rates, exceeding 98%. They are also the most accurate, with more than 80% of detected GCIs falling within an error margin of 0.25 ms.
ZFR, while demonstrating comparable performance on certain datasets, shows variability in accuracy across different speakers.
DYPSA and HE trail behind, and DYPSA's performance is sensitive to differences between datasets.

Noise and Reverberation Robustness

The paper also investigates the impact of additive noise (both white and babble) and reverberation on these techniques. The results show substantial performance degradation under reverberation, especially for linear prediction (LP)-based approaches such as DYPSA and YAGA.

Additional Observations:

ZFR and SEDREAMS exhibit robust noise resistance, maintaining accuracy and reliability across a range of signal-to-noise ratios.
The ZFR method shows slight superiority in handling reverberation, indicating its potential effectiveness in acoustic environments typically adverse to waveform-based detection methods.
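The additive-noise condition discussed in this section can be reproduced in outline by scaling a noise signal to a target signal-to-noise ratio before running a detector. The sketch below uses white Gaussian noise only. The helper name `add_white_noise`, the fixed seed, and the choice of SNR values are assumptions for illustration and do not come from the paper.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Return signal plus white Gaussian noise scaled to the requested SNR (dB)."""
    if not signal:
        raise ValueError("signal must not be empty")
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    if signal_power == 0.0 or noise_power == 0.0:
        raise ValueError("signal and noise must have non-zero power")
    # SNR_dB = 10 * log10(signal_power / scaled_noise_power); solve for the gain.
    gain = math.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [x + gain * n for x, n in zip(signal, noise)]
```

Sweeping `snr_db` over a range of values and scoring each detector's output at every step gives the kind of robustness curve on which such comparisons are based. Babble noise would follow the same scaling, with a recorded noise segment in place of the generated samples.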

Computational Complexity

In addition to accuracy and robustness, computational efficiency is crucial for practical applications. Here, SEDREAMS emerges as notably efficient, especially in its accelerated form. By optimizing the computation of the mean-based signal, Fast SEDREAMS becomes the least resource-intensive of the evaluated techniques while retaining its detection performance.
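For orientation, the mean-based signal that gives SEDREAMS its name can be thought of as a windowed sliding average of the speech waveform, with a window length tied to the speaker's mean pitch period. The sketch below is a direct, unoptimized computation under that reading and is not the accelerated variant. The Blackman window, the default `window_factor` of 1.75, the zero-padding at the edges, and the function names are all assumptions rather than details taken from the summary above.

```python
import math


def blackman(length):
    """Symmetric Blackman window of the given length."""
    if length == 1:
        return [1.0]
    return [
        0.42
        - 0.5 * math.cos(2.0 * math.pi * i / (length - 1))
        + 0.08 * math.cos(4.0 * math.pi * i / (length - 1))
        for i in range(length)
    ]


def mean_based_signal(speech, fs, mean_f0, window_factor=1.75):
    """Windowed sliding mean of the speech signal.

    window_factor scales the window length relative to the mean pitch period;
    1.75 is an assumed default, not a value taken from the summary above.
    """
    half = max(1, int(round(window_factor * fs / mean_f0)) // 2)
    window = blackman(2 * half + 1)
    n_samples = len(speech)
    out = []
    for n in range(n_samples):
        acc = 0.0
        for m in range(-half, half + 1):
            idx = n + m
            if 0 <= idx < n_samples:  # samples outside the signal count as zero
                acc += window[m + half] * speech[idx]
        out.append(acc / (2 * half + 1))
    return out
```

The double loop costs on the order of the signal length times the window length, which makes clear why the computation of this signal is a natural target for optimization.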

Implications and Future Directions

A clearer picture of how reliably and accurately GCIs can be retrieved supports advances in speech technologies, including more natural-sounding synthesis and improved speech recognition in challenging environments. The paper makes no sweeping claims, but its comparative results can inform the selection and development of GCI detection algorithms for specific applications and environments. Future research could explore the integration of machine learning methods to further improve the robustness and adaptivity of GCI detection.

In summary, this paper serves as a crucial resource for researchers exploring glottal-synchronous speech processing, providing both evaluative benchmarks and evidence-based recommendations tailored to varied application needs.