Bingshen Mu

LLM-ForcedAligner: A Non-Autoregressive and Accurate LLM-Based Forced Aligner for Multilingual and Long-Form Speech

Jan 26, 2026
Via arXiv

dLLM-ASR: A Faster Diffusion LLM-based Framework for Speech Recognition

Jan 25, 2026
Via arXiv

WenetSpeech-Wu: Datasets, Benchmarks, and Models for a Unified Chinese Wu Dialect Speech Processing Ecosystem

Jan 16, 2026
Via arXiv

Efficient Scaling for LLM-based ASR

Aug 06, 2025
Via arXiv

Weakly Supervised Data Refinement and Flexible Sequence Compression for Efficient Thai LLM-based ASR

May 28, 2025
Via arXiv

HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models

Sep 30, 2024
Via arXiv

Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets

May 06, 2024
Via arXiv

MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition

May 06, 2024
Via arXiv

E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models

Jan 06, 2024
Via arXiv

Automatic channel selection and spatial feature integration for multi-channel speech recognition across various array topologies

Dec 15, 2023
Via arXiv