Samuel Thomas

mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition

Feb 03, 2025

A Non-autoregressive Model for Joint STT and TTS

Jan 15, 2025

Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation

Jun 14, 2024

Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages

May 21, 2023

FisHook -- An Optimized Approach to Marine Specie Classification using MobileNetV2

Apr 04, 2023

What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions

Mar 29, 2023

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

Oct 07, 2022

Extending RNN-T-based speech recognition systems with emotion and language classification

Jul 28, 2022

Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems

Apr 11, 2022

Towards End-to-End Integration of Dialog History for Improved Spoken Language Understanding

Apr 11, 2022