Shiyu Zhou

Token-level Speaker Change Detection Using Speaker Difference and Speech Content via Continuous Integrate-and-fire

Nov 17, 2022

Improving End-to-End Contextual Speech Recognition with Fine-grained Contextual Knowledge Selection

Jan 30, 2022

OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation

Jul 06, 2021

Long-Running Speech Recognizer: An End-to-End Multi-Task Learning Framework for Online ASR and VAD

Mar 02, 2021

Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-resource Speech Recognition

Jan 24, 2021

Applying Wav2vec2.0 to Speech Recognition in Various Low-resource Languages

Jan 17, 2021

Exploring wav2vec 2.0 on speaker verification and language identification

Jan 14, 2021

CIF-based collaborative decoding for end-to-end contextual speech recognition

Dec 17, 2020

Multi-output Gaussian Process Modulated Poisson Processes for Event Prediction

Nov 06, 2020

A Comparison of Label-Synchronous and Frame-Synchronous End-to-End Models for Speech Recognition

May 25, 2020