Viet Anh Trinh

CUNY Graduate Center

Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy?

Sep 13, 2024

Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing

Jun 04, 2024

Two-pass Endpoint Detection for Speech Recognition

Jan 17, 2024

Adaptive Endpointing with Deep Contextual Multi-armed Bandits

Mar 23, 2023

Reducing Geographic Disparities in Automatic Speech Recognition via Elastic Weight Consolidation

Jul 16, 2022

ImportantAug: a data augmentation agent for speech

Dec 14, 2021

Unsupervised Speech Enhancement with speech recognition embedding and disentanglement losses

Nov 16, 2021

Combining Spatial Clustering with LSTM Speech Models for Multichannel Speech Enhancement

Dec 02, 2020

Improved MVDR Beamforming Using LSTM Speech Models to Clean Spatial Clustering Masks

Dec 02, 2020

Enhancement of Spatial Clustering-Based Time-Frequency Masks using LSTM Neural Networks

Dec 02, 2020