Picture for Wenyi Yu

Wenyi Yu

Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation

Add code
Sep 25, 2024
Viaarxiv icon

Extract and Diffuse: Latent Integration for Improved Diffusion-based Speech and Vocal Enhancement

Add code
Sep 15, 2024
Viaarxiv icon

HMDN: Hierarchical Multi-Distribution Network for Click-Through Rate Prediction

Add code
Aug 02, 2024
Viaarxiv icon

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

Add code
Jun 22, 2024
Viaarxiv icon

Can Large Language Models Understand Spatial Audio?

Add code
Jun 12, 2024
Viaarxiv icon

An Improved Empirical Fisher Approximation for Natural Gradient Descent

Add code
Jun 10, 2024
Viaarxiv icon

M$^3$AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset

Add code
Mar 21, 2024
Viaarxiv icon

SALMONN: Towards Generic Hearing Abilities for Large Language Models

Add code
Oct 20, 2023
Viaarxiv icon

Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models

Add code
Oct 10, 2023
Viaarxiv icon

Connecting Speech Encoder and Large Language Model for ASR

Add code
Sep 26, 2023
Viaarxiv icon