Visual Speech Recognition


Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides

Apr 21, 2025

Visual-Aware Speech Recognition for Noisy Scenarios

Apr 09, 2025

A Human Digital Twin Architecture for Knowledge-based Interactions and Context-Aware Conversations

Apr 04, 2025

VALLR: Visual ASR Language Model for Lip Reading

Mar 27, 2025

SocialGesture: Delving into Multi-person Gesture Understanding

Apr 03, 2025

Deep Learning for Speech Emotion Recognition: A CNN Approach Utilizing Mel Spectrograms

Mar 25, 2025

MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens

Mar 14, 2025

Leveraging LLMs with Iterative Loop Structure for Enhanced Social Intelligence in Video Question Answering

Mar 27, 2025

Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs

Mar 09, 2025

Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations

Mar 08, 2025