Speaker dependent acoustic-to-articulatory inversion using real-time MRI of the vocal tract (2008.02098v1)

Published 4 Aug 2020 in eess.AS and cs.SD

Abstract: Acoustic-to-articulatory inversion (AAI) methods estimate articulatory movements from the acoustic speech signal, which can be useful in several tasks such as speech recognition, synthesis, talking heads, and language tutoring. Most earlier inversion studies are based on point-tracking articulatory techniques (e.g. EMA or XRMB). The advantage of real-time MRI (rtMRI) is that it provides dynamic information about the full midsagittal plane of the upper airway, with a high 'relative' spatial resolution. In this work, we estimated midsagittal rtMRI images of the vocal tract for speaker-dependent AAI, using MGC-LSP spectral features as input. We applied FC-DNNs, CNNs, and recurrent neural networks, and have shown that LSTMs are the most suitable for this task. For objective evaluation, we measured normalized MSE, the Structural Similarity Index (SSIM), and its complex wavelet version (CW-SSIM). The results indicate that the combination of FC-DNNs and LSTMs can produce smooth generated MR images of the vocal tract that are similar to the original MRI recordings (average CW-SSIM: 0.94).
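
To make the objective metrics named in the abstract more concrete, the following is a minimal sketch of how a normalized MSE and a simplified SSIM could be computed between an original and a generated image. It is not the authors' evaluation code. The abstract does not state how the MSE is normalized, so dividing by the mean squared value of the reference is an assumption, and the function names `normalized_mse` and `global_ssim` are hypothetical. The SSIM here uses a single window covering the whole image, whereas the standard SSIM averages the same expression over local windows; CW-SSIM is not covered.

```python
import numpy as np


def normalized_mse(reference, estimate):
    """MSE divided by the mean squared value of the reference.

    The normalization is an assumption; the abstract does not define it.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    return np.mean((reference - estimate) ** 2) / np.mean(reference ** 2)


def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM evaluated once over the whole image (a simplification).

    Standard SSIM computes this expression in local windows and averages.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    # Stabilizing constants from the usual SSIM definition.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    # Synthetic stand-ins for an original and a generated image in [0, 1];
    # the image size is arbitrary.
    rng = np.random.default_rng(0)
    original = rng.random((64, 64))
    generated = np.clip(original + 0.05 * rng.standard_normal((64, 64)), 0.0, 1.0)
    print("NMSE:", normalized_mse(original, generated))
    print("Global SSIM:", global_ssim(original, generated))
```

Lower normalized MSE and an SSIM closer to 1 both indicate a generated image closer to the reference. For a windowed SSIM, an established implementation such as the one in scikit-image would be the more reliable choice.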

Citations (6)
