Yuan Gong

State-Space Large Audio Language Models

Nov 24, 2024

A Closer Look at Neural Codec Resynthesis: Bridging the Gap between Codec and Waveform Generation

Oct 29, 2024

AER-LLM: Ambiguity-aware Emotion Recognition Leveraging Large Language Models

Sep 26, 2024

Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition

Sep 17, 2024

DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners

Jul 04, 2024

Automatic Prediction of Amyotrophic Lateral Sclerosis Progression using Longitudinal Speech Transformer

Jun 26, 2024

Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation

Jun 14, 2024

Generic Knowledge Boosted Pre-training For Remote Sensing Images

Jan 21, 2024

Joint Audio and Speech Understanding

Oct 02, 2023

Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning

Sep 19, 2023