
Stavros Petridis

Contextual Speech Extraction: Leveraging Textual History as an Implicit Cue for Target Speech Extraction

Mar 11, 2025

Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs

Mar 09, 2025

Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations

Mar 08, 2025

KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation

Mar 03, 2025

Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs

Nov 04, 2024

Full-Rank No More: Low-Rank Weight Training for Modern Speech Recognition Models

Oct 10, 2024

RT-LA-VocE: Real-Time Low-SNR Audio-Visual Speech Enhancement

Jul 10, 2024

Dynamic Data Pruning for Automatic Speech Recognition

Jun 26, 2024

MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization

Jun 25, 2024

EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars

Apr 29, 2024