Lip-reading with Hierarchical Pyramidal Convolution and Self-Attention (2012.14360v1)

Published 28 Dec 2020 in cs.CV and eess.IV

Abstract: In this paper, we propose a novel deep learning architecture to improve word-level lip-reading. First, we introduce multi-scale processing into the spatial feature extraction for lip-reading. Specifically, we propose hierarchical pyramidal convolution (HPConv) to replace the standard convolution in the original module, improving the model's ability to discover fine-grained lip movements. Second, we merge information across all time steps of the sequence using self-attention, so that the model pays more attention to the relevant frames. Combining these two components further enhances the model's classification power. Experiments on the Lip Reading in the Wild (LRW) dataset show that the proposed model achieves 86.83% accuracy, a 1.53% absolute improvement over the current state of the art. We also conduct extensive experiments to better understand the behavior of the proposed model.
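
The temporal half of the abstract (merging information across all time steps with self-attention) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain scaled dot-product self-attention with identity query/key/value projections (a real model would presumably learn these), and the names `self_attention` and `pool_over_time` are hypothetical, introduced only for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention over a sequence of frame features.

    frames: list of T feature vectors, each a list of d floats.
    Assumption: queries, keys, and values are the frames themselves
    (identity projections), which keeps the sketch dependency-free.
    Returns (attended_frames, attention_weights).
    """
    d = len(frames[0])
    scale = math.sqrt(d)
    attended, weights = [], []
    for query in frames:
        # Similarity of this frame to every frame in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in frames]
        w = softmax(scores)
        weights.append(w)
        # Each output frame is a weighted mix of all time steps.
        attended.append(
            [sum(w[t] * frames[t][j] for t in range(len(frames))) for j in range(d)]
        )
    return attended, weights


def pool_over_time(frames):
    """Average the attended frames into one sequence-level feature vector."""
    attended, _ = self_attention(frames)
    steps = len(attended)
    return [sum(f[j] for f in attended) / steps for j in range(len(attended[0]))]
```

Calling `pool_over_time` on a list of per-frame feature vectors yields a single vector that a word classifier could consume. Each row of the returned attention weights sums to 1, so a frame that resembles many others contributes more to the pooled result.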

Authors (6)
  1. Hang Chen (77 papers)
  2. Jun Du (130 papers)
  3. Yu Hu (75 papers)
  4. Li-Rong Dai (26 papers)
  5. Chin-Hui Lee (52 papers)
  6. Bao-Cai Yin (2 papers)
Citations (5)