Olivier Siohan

Google Inc

On Robustness to Missing Video for Audiovisual Speech Recognition

Dec 19, 2023

Revisiting the Entropy Semiring for Neural Speech Recognition

Dec 19, 2023

Audio-visual fine-tuning of audio-only ASR models

Dec 14, 2023

Cascaded encoders for fine-tuning ASR models on overlapped speech

Jun 28, 2023

Conformers are All You Need for Visual Speech Recognition

Feb 17, 2023

End-to-End Multi-Person Audio/Visual Automatic Speech Recognition

May 11, 2022

A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection

May 11, 2022

Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection

May 10, 2022

End-to-end multi-talker audio-visual ASR using an active speaker attention module

Apr 01, 2022

Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition

Jan 25, 2022