Title: AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition

Oct 21, 2024

Zehua Liu, Xiaolou Li, Chen Chen, Li Guo, Lantian Li, Dong Wang

Abstract: Visual Speech Recognition (VSR) aims to recognize corresponding text by analyzing visual information from lip movements. Due to the high variability and weak information of lip movements, VSR tasks require effectively utilizing any information from any source and at any level. In this paper, we propose a VSR method based on audio-visual cross-modal alignment, named AlignVSR. The method leverages the audio modality as an auxiliary information source and utilizes the global and local correspondence between the audio and visual modalities to improve visual-to-text inference. Specifically, the method first captures global alignment between video and audio through a cross-modal attention mechanism from video frames to a bank of audio units. Then, based on the temporal correspondence between audio and video, a frame-level local alignment loss is introduced to refine the global alignment, improving the utility of the audio information. Experimental results on the LRS2 and CNVSRC.Single datasets consistently show that AlignVSR outperforms several mainstream VSR methods, demonstrating its superior and robust performance.
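The two alignment steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no architectural details, so the scaled dot-product attention, the cross-entropy form of the local loss, the per-frame unit labels `unit_ids`, and all function names and dimensions below are assumptions chosen only to make the idea concrete.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend_to_audio_units(video_feats, unit_bank):
    """Global alignment (assumed form): each video frame queries the unit bank.

    video_feats: (T, d) array, one feature vector per video frame.
    unit_bank:   (K, d) array, one embedding per audio unit.
    Returns the attended audio features (T, d) and attention weights (T, K).
    """
    d = video_feats.shape[-1]
    scores = video_feats @ unit_bank.T / np.sqrt(d)  # (T, K)
    weights = softmax(scores, axis=-1)               # each row sums to 1
    attended = weights @ unit_bank                   # (T, d)
    return attended, weights


def local_alignment_loss(weights, unit_ids, eps=1e-9):
    """Local alignment (assumed form): frame-level cross-entropy.

    unit_ids: (T,) integer array, the audio unit assumed to be time-aligned
    with each video frame. The loss pushes each frame's attention toward it.
    """
    frames = np.arange(weights.shape[0])
    picked = weights[frames, unit_ids]
    return float(-np.log(picked + eps).mean())


# Toy data: 6 video frames, a bank of 10 audio units, 16-dim features.
rng = np.random.default_rng(0)
T, K, d = 6, 10, 16
video_feats = rng.normal(size=(T, d))
unit_bank = rng.normal(size=(K, d))
unit_ids = rng.integers(0, K, size=T)  # hypothetical frame-level unit labels

attended, weights = attend_to_audio_units(video_feats, unit_bank)
loss = local_alignment_loss(weights, unit_ids)
```

In this sketch the attention weights serve both purposes: they produce the audio-informed features passed on to recognition, and they are the quantity the frame-level loss supervises. How the paper builds its unit bank and derives frame-level targets is not stated in the abstract.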
