Resolution limits on visual speech recognition (1710.01073v1)

Published 3 Oct 2017 in cs.CV and eess.IV

Abstract: Visual-only speech recognition is dependent upon a number of factors that can be difficult to control, such as: lighting; identity; motion; emotion and expression. But some factors, such as video resolution, are controllable, so it is surprising that there is not yet a systematic study of the effect of resolution on lip-reading. Here we use a new data set, the Rosetta Raven data, to train and test recognizers so we can measure the effect of video resolution on recognition accuracy. We conclude that, contrary to common practice, resolution need not be that great for automatic lip-reading. However, it is highly unlikely that automatic lip-reading can work reliably when the distance between the bottom of the lower lip and the top of the upper lip is less than four pixels at rest.

Authors (4)
  1. Helen L. Bear (23 papers)
  2. Richard Harvey (9 papers)
  3. Barry-John Theobald (34 papers)
  4. Yuxuan Lan (3 papers)
Citations (20)
