
Lip Reading Using Convolutional Auto Encoders as Feature Extractor (1805.12371v1)

Published 31 May 2018 in cs.CV

Abstract: Visual recognition of speech from lip movement is called lip-reading. Recent work in this nascent field uses different neural networks as feature extractors, whose outputs serve as input to a model that maps the temporal relationships and classifies the sequence. Although end-to-end sentence-level lip-reading is the current trend, we propose a new model that employs word-level classification and surpasses the set benchmarks on standard datasets. In our model, convolutional autoencoders act as feature extractors, and the extracted features are fed to a long short-term memory (LSTM) model. We tested the proposed model on BBC's LRW dataset, MIRACL-VC1, and the GRID dataset. It achieves a classification accuracy of 98% on MIRACL-VC1, compared with 93.4% for the set benchmark (Rekik et al., 2014). On BBC's LRW, the proposed model performed better than the baseline model combining convolutional neural networks with an LSTM (Garg et al., 2016). By showing the features learned by the models, we indicate how the proposed model works better than the baseline model. The same model can also be extended to end-to-end sentence-level classification.
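The data flow described in the abstract (per-frame feature extraction followed by an LSTM that classifies the sequence) can be sketched roughly as below. This is a minimal, untrained illustration and not the authors' implementation: `encode_frame` uses average pooling as a stand-in for the encoder half of a trained convolutional autoencoder, the LSTM weights are random with biases omitted, and every name and size (`TinyLSTMClassifier`, `classify_clip`, 8x8 frames, 10 classes) is hypothetical.

```python
import math
import random


def encode_frame(frame, pool=2):
    """Stand-in for the encoder half of a convolutional autoencoder.

    Assumption: average pooling replaces the learned convolution layers,
    only to show each frame being reduced to a compact feature vector.
    """
    height, width = len(frame), len(frame[0])
    features = []
    for i in range(0, height - pool + 1, pool):
        for j in range(0, width - pool + 1, pool):
            block = [frame[i + di][j + dj]
                     for di in range(pool) for dj in range(pool)]
            features.append(sum(block) / len(block))
    return features


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class TinyLSTMClassifier:
    """Single-layer LSTM with a softmax output (random weights, no biases)."""

    def __init__(self, input_size, hidden_size, num_classes, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One weight matrix per gate: input, forget, output, candidate.
        self.gates = {name: make_matrix(hidden_size, input_size + hidden_size, rng)
                      for name in "ifog"}
        self.out = make_matrix(num_classes, hidden_size, rng)

    def forward(self, sequence):
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            z = x + h  # concatenate current features with previous hidden state
            i = [sigmoid(v) for v in matvec(self.gates["i"], z)]
            f = [sigmoid(v) for v in matvec(self.gates["f"], z)]
            o = [sigmoid(v) for v in matvec(self.gates["o"], z)]
            g = [math.tanh(v) for v in matvec(self.gates["g"], z)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        # Softmax over class scores computed from the final hidden state.
        logits = matvec(self.out, h)
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


def classify_clip(frames, model):
    """Encode each frame, then classify the whole feature sequence."""
    features = [encode_frame(frame) for frame in frames]
    probs = model.forward(features)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


if __name__ == "__main__":
    rng = random.Random(1)
    # Five hypothetical 8x8 grayscale mouth-region frames.
    clip = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(5)]
    model = TinyLSTMClassifier(input_size=16, hidden_size=8, num_classes=10)
    print(classify_clip(clip, model)[0])
```

With untrained weights the predicted label is meaningless. The sketch only shows how a clip of frames becomes a sequence of feature vectors and then a single word-level class distribution.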

Authors (5)
  1. Dharin Parekh (1 paper)
  2. Ankitesh Gupta (2 papers)
  3. Shharrnam Chhatpar (1 paper)
  4. Anmol Yash Kumar (1 paper)
  5. Manasi Kulkarni (2 papers)
Citations (10)
