
Linhao Dong

NEST-RQ: Next Token Prediction for Speech Self-Supervised Pre-Training

Sep 13, 2024
Via arXiv

Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

Jul 05, 2024
Via arXiv

SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR

Mar 04, 2024
Via arXiv

CIF-PT: Bridging Speech and Text Representations for Spoken Language Understanding via Continuous Integrate-and-Fire Pre-Training

May 27, 2023
Via arXiv

Token-level Speaker Change Detection Using Speaker Difference and Speech Content via Continuous Integrate-and-fire

Nov 17, 2022
Via arXiv

Sequence-level Speaker Change Detection with Difference-based Continuous Integrate-and-fire

Jun 27, 2022
Via arXiv

Improving End-to-End Contextual Speech Recognition with Fine-grained Contextual Knowledge Selection

Jan 30, 2022
Via arXiv

CIF-based Collaborative Decoding for End-to-End Contextual Speech Recognition

Dec 17, 2020
Via arXiv

A Comparison of Label-Synchronous and Frame-Synchronous End-to-End Models for Speech Recognition

May 25, 2020
Via arXiv

CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition

May 27, 2019
Via arXiv