Visual Speech Language Models (1809.06800v1)

Published 14 Sep 2018 in eess.AS and cs.SD

Abstract: Language models (LMs) are very powerful in lipreading systems. Language models built upon the ground truth utterances of datasets learn the grammar and structure rules of words and sentences (the latter in the case of continuous speech). However, visual co-articulation effects in visual speech signals damage the performance of visual speech LMs because, visually, people do not utter what the language model expects. These models are commonplace, but while higher-order N-gram LMs may improve classification rates, the cost of such a model is disproportionate to the common goal of developing more accurate classifiers. So we compare which unit would best optimize a lipreading (visual speech) LM, in order to observe their limitations. We compare three units: visemes (visual speech units) \cite{lan2010improving}, phonemes (audible speech units), and words.
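As a rough illustration of the three units compared in the abstract, the sketch below counts bigrams over the same utterance expressed as words, phonemes, and visemes. The lexicon, the phoneme-to-viseme map, and the helper names `to_units` and `ngram_counts` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy lexicon (word -> phonemes); illustrative only, not from the paper.
PHONEMES = {
    "bat": ["b", "ae", "t"],
    "pat": ["p", "ae", "t"],
    "mat": ["m", "ae", "t"],
}

# Hypothetical phoneme-to-viseme map: /b/, /p/ and /m/ look alike on the lips,
# so they collapse into a single visual unit.
VISEMES = {
    "b": "V_bilabial", "p": "V_bilabial", "m": "V_bilabial",
    "ae": "V_open", "t": "V_alveolar",
}


def to_units(words, unit):
    """Express a word sequence as a sequence of words, phonemes, or visemes."""
    if unit == "word":
        return list(words)
    phones = [p for w in words for p in PHONEMES[w]]
    if unit == "phoneme":
        return phones
    if unit == "viseme":
        return [VISEMES[p] for p in phones]
    raise ValueError(f"unknown unit: {unit}")


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


utterance = ["bat", "pat", "mat"]
for unit in ("word", "phoneme", "viseme"):
    tokens = to_units(utterance, unit)
    vocab = len(set(tokens))
    bigrams = ngram_counts(tokens, 2)
    # vocab ** 2 is the number of possible bigrams; it grows as vocab ** n
    # for an n-gram model, which is one source of higher-order model cost.
    print(unit, "vocab:", vocab, "observed bigrams:", len(bigrams),
          "possible bigrams:", vocab ** 2)
```

In this toy setup the three words become indistinguishable at the viseme level, which hints at why a viseme LM has less information to work with than a phoneme or word LM, while the word LM needs the largest vocabulary on realistic data.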

Authors (1)

Helen L Bear
