Otavio Braga

Google Inc

On Robustness to Missing Video for Audiovisual Speech Recognition

Dec 19, 2023

Audio-visual fine-tuning of audio-only ASR models

Dec 14, 2023

End-to-End Multi-Person Audio/Visual Automatic Speech Recognition

May 11, 2022

A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection

May 11, 2022

Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection

May 10, 2022

Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition

Jan 25, 2022

Audio-Visual Speech Recognition is Worth 32×32×8 Voxels

Sep 20, 2021

Recurrent Neural Network Transducer for Audio-Visual Speech Recognition

Nov 08, 2019