Yuchen Hu

Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition

Sep 17, 2024
Via arXiv

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis

Sep 11, 2024
Via arXiv

MaxMind: A Memory Loop Network to Enhance Software Productivity based on Large Language Models

Aug 07, 2024
Via arXiv

Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization

Jul 02, 2024
Via arXiv

Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback

Jun 02, 2024
Via arXiv

Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models

May 23, 2024
Via arXiv

Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models

May 16, 2024
Via arXiv

Relevant or Random: Can LLMs Truly Perform Analogical Reasoning?

Apr 19, 2024
Via arXiv

GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators

Feb 10, 2024
Via arXiv

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

Feb 08, 2024
Via arXiv