Yuan Gong

A Closer Look at Neural Codec Resynthesis: Bridging the Gap between Codec and Waveform Generation

Oct 29, 2024

AER-LLM: Ambiguity-aware Emotion Recognition Leveraging Large Language Models

Sep 26, 2024

Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition

Sep 17, 2024

DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners

Jul 04, 2024

Automatic Prediction of Amyotrophic Lateral Sclerosis Progression using Longitudinal Speech Transformer

Jun 26, 2024

Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation

Jun 14, 2024

Generic Knowledge Boosted Pre-training For Remote Sensing Images

Jan 21, 2024

Joint Audio and Speech Understanding

Oct 02, 2023

Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning

Sep 19, 2023

ToonTalker: Cross-Domain Face Reenactment

Aug 24, 2023