
Predicting tongue motion in unlabeled ultrasound videos using convolutional LSTM neural network (1902.06927v1)

Published 19 Feb 2019 in cs.CV, cs.LG, and cs.MM

Abstract: A challenge in speech production research is to predict future tongue movements based on a short period of past tongue movements. This study tackles the speaker-dependent tongue motion prediction problem in unlabeled ultrasound videos with convolutional long short-term memory (ConvLSTM) networks. The model has been tested on two different ultrasound corpora. ConvLSTM outperforms a 3-dimensional convolutional neural network (3DCNN) in predicting the 9th frame based on the 8 preceding frames, and also demonstrates good capacity to predict only the tongue contours in future frames. Further tests reveal that ConvLSTM can also learn to predict tongue movements in more distant frames beyond the immediately following frame. Our code is available at: https://github.com/shuiliwanwu/ConvLstm-ultrasound-videos.
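
The prediction setup described in the abstract (8 input frames, with the 9th or a more distant frame as the target) can be illustrated with a small sliding-window sketch. This is not the authors' code: the function name `make_windows` and the parameters `n_input` and `horizon` are hypothetical, and frames are represented here by plain integers instead of ultrasound images.

```python
def make_windows(frames, n_input=8, horizon=1):
    """Split a frame sequence into (input_window, target_frame) pairs.

    Hypothetical helper for illustration only.
    horizon=1 targets the frame right after the window (the 9th frame
    when n_input=8); larger values target more distant frames.
    """
    if n_input < 1 or horizon < 1:
        raise ValueError("n_input and horizon must be positive")
    pairs = []
    # Last start index for which both the window and its target exist.
    last_start = len(frames) - n_input - horizon
    for start in range(last_start + 1):
        window = frames[start:start + n_input]
        target = frames[start + n_input + horizon - 1]
        pairs.append((window, target))
    return pairs


# Toy "video": 12 frames, each stood in for by its index.
video = list(range(12))
next_frame_pairs = make_windows(video)           # 9th-frame prediction
distant_pairs = make_windows(video, horizon=3)   # target 3 frames ahead
print(len(next_frame_pairs), next_frame_pairs[0])
print(len(distant_pairs), distant_pairs[0])
```

Each pair would then serve as one training example for a sequence model such as a ConvLSTM or 3DCNN; because the videos are unlabeled, the target is simply a later frame from the same recording.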

Citations (23)
