Title: Enhancing Lip Reading with Multi-Scale Video and Multi-Encoder

Apr 08, 2024

He Wang, Pengcheng Guo, Xucheng Wan, Huan Zhou, Lei Xie

Abstract: Automatic lip-reading (ALR) aims to transcribe spoken content from a speaker's silent lip motion captured in video. Current mainstream lip-reading approaches use only a single visual encoder to model input video at a single scale. In this paper, we propose to enhance lip-reading by incorporating multi-scale video data and multiple encoders. Specifically, we first propose a novel multi-scale lip extraction algorithm based on the size of the speaker's face and an enhanced ResNet3D visual front-end (VFE) to extract lip features at different scales. For the multi-encoder, in addition to the mainstream Transformer and Conformer, we also incorporate the recently proposed Branchformer and E-Branchformer as visual encoders. In the experiments, we explore the influence of different video data scales and encoders on ALR system performance and fuse the texts transcribed by all ALR systems using recognizer output voting error reduction (ROVER). Our proposed approach placed second in the ICME 2024 ChatCLR Challenge Task 2, with a 21.52% reduction in character error rate (CER) compared to the official baseline on the evaluation set.

* 6 pages, 3 figures, submitted to ICME2024 GC-ChatCLR
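
The abstract reports its result as a reduction in character error rate (CER). As a minimal sketch of how that metric is conventionally computed, the Python below takes the character-level Levenshtein distance between a reference and a hypothesis and divides by the reference length. The function names and the sample strings are illustrative assumptions and do not come from the paper or the challenge toolkit. The abstract does not say whether the 21.52% figure is an absolute or a relative reduction, so the sketch includes a helper for each.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalised by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def absolute_reduction(baseline_cer: float, system_cer: float) -> float:
    """Difference between two CER values, in the same units as the inputs."""
    return baseline_cer - system_cer


def relative_reduction(baseline_cer: float, system_cer: float) -> float:
    """Fraction of the baseline CER removed by the system."""
    return (baseline_cer - system_cer) / baseline_cer


if __name__ == "__main__":
    # Made-up transcripts, for illustration only.
    reference = "今天天气很好"
    hypothesis = "今天天汽好"
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

Because CER is normalised by the reference length, it can exceed 1.0 when the hypothesis contains many insertions.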
