Guoli Ye

Efficient Long-Form Speech Recognition for General Speech In-Context Learning
Sep 29, 2024

Hybrid Attention-based Encoder-decoder Model for Efficient Language Model Adaptation
Sep 14, 2023

Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition
Aug 03, 2023

Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding
Oct 16, 2022

Have best of both worlds: two-pass hybrid and E2E cascading framework for speech recognition
Oct 10, 2021

Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition
Jun 04, 2021

Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone
Apr 12, 2021

End-to-End Speaker-Attributed ASR with Transformer
Apr 05, 2021

Semantic Mask for Transformer based End-to-End Speech Recognition
Dec 06, 2019

Advancing Acoustic-to-Word CTC Model with Attention and Mixed-Units
Dec 31, 2018