Visual Speech Recognition


asr_eval: Algorithms and tools for multi-reference and streaming speech recognition evaluation

Jan 28, 2026
Via arXiv

MA-LipNet: Multi-Dimensional Attention Networks for Robust Lipreading

Jan 27, 2026
Via arXiv

OCR-Enhanced Multimodal ASR Can Read While Listening

Jan 26, 2026
Via arXiv

Noise-Robust AV-ASR Using Visual Features Both in the Whisper Encoder and Decoder

Jan 26, 2026
Via arXiv

Purification Before Fusion: Toward Mask-Free Speech Enhancement for Robust Audio-Visual Speech Recognition

Jan 18, 2026
Via arXiv

HoverAI: An Embodied Aerial Agent for Natural Human-Drone Interaction

Jan 20, 2026
Via arXiv

AI-based System for Transforming text and sound to Educational Videos

Jan 16, 2026
Via arXiv

Robust CAPTCHA Using Audio Illusions in the Era of Large Language Models: from Evaluation to Advances

Jan 13, 2026
Via arXiv

MCGA: A Multi-task Classical Chinese Literary Genre Audio Corpus

Jan 14, 2026
Via arXiv

VALLR-Pin: Uncertainty-Factorized Visual Speech Recognition for Mandarin with Pinyin Guidance

Dec 29, 2025
Via arXiv