Minglun Han

NEST-RQ: Next Token Prediction for Speech Self-Supervised Pre-Training

Sep 13, 2024

Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

Jul 05, 2024

ViLaS: Integrating Vision and Language into Automatic Speech Recognition

May 31, 2023

X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages

May 10, 2023

Matching-based Term Semantics Pre-training for Spoken Patient Query Understanding

Mar 02, 2023

Complex Dynamic Neurons Improved Spiking Transformer Network for Efficient Automatic Speech Recognition

Feb 02, 2023

Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation

Jan 30, 2023

VLP: A Survey on Vision-Language Pre-training

Feb 21, 2022

Improving End-to-End Contextual Speech Recognition with Fine-grained Contextual Knowledge Selection

Jan 30, 2022

CIF-based Collaborative Decoding for End-to-End Contextual Speech Recognition

Dec 17, 2020