Speaker Recognition


Speaker recognition is the task of determining who is speaking (identification) or confirming a claimed identity (verification) from the characteristics of a person's voice.
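The two tasks named above can be illustrated with a minimal sketch. It assumes each utterance has already been turned into a fixed-length embedding vector by some speaker model (not shown here), and it scores pairs with cosine similarity, a common but not the only choice. The names `verify`, `identify`, the toy vectors, and the 0.7 threshold are all hypothetical and chosen only for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(claimed_embedding, test_embedding, threshold=0.7):
    """Verification: accept the claimed identity if the score clears a threshold.

    The threshold value is an assumption; in practice it is tuned on held-out data.
    """
    return cosine_similarity(claimed_embedding, test_embedding) >= threshold

def identify(enrolled, test_embedding):
    """Identification: return the enrolled speaker whose embedding scores highest."""
    return max(enrolled, key=lambda name: cosine_similarity(enrolled[name], test_embedding))

# Toy 3-dimensional embeddings (real systems use hundreds of dimensions).
enrolled = {
    "alice": [0.9, 0.1, 0.2],
    "bob": [0.1, 0.8, 0.3],
}
test = [0.85, 0.15, 0.25]

print(identify(enrolled, test))
print(verify(enrolled["alice"], test))
print(verify(enrolled["bob"], test))
```

Identification picks the best match among enrolled speakers, while verification makes a yes/no decision about a single claimed speaker, which is why only the latter needs a threshold.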

Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides

Apr 21, 2025
Via arXiv

Acoustic to Articulatory Inversion of Speech; Data Driven Approaches, Challenges, Applications, and Future Scope

Apr 17, 2025
Via arXiv

Real-Time Word-Level Temporal Segmentation in Streaming Speech Recognition

Apr 15, 2025
Via arXiv

Visual-Aware Speech Recognition for Noisy Scenarios

Apr 09, 2025
Via arXiv

F5R-TTS: Improving Flow Matching based Text-to-Speech with Group Relative Policy Optimization

Apr 03, 2025
Via arXiv

LinTO Audio and Textual Datasets to Train and Evaluate Automatic Speech Recognition in Tunisian Arabic Dialect

Apr 03, 2025
Via arXiv

BeMERC: Behavior-Aware MLLM-based Framework for Multimodal Emotion Recognition in Conversation

Mar 31, 2025
Via arXiv

VALLR: Visual ASR Language Model for Lip Reading

Mar 27, 2025
Via arXiv

SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors

Mar 20, 2025
Via arXiv

GatedxLSTM: A Multimodal Affective Computing Approach for Emotion Recognition in Conversations

Mar 26, 2025
Via arXiv