Guangzhi Sun

SkillAggregation: Reference-free LLM-Dependent Aggregation

Oct 14, 2024
Via arXiv

Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization

Oct 09, 2024
Via arXiv

Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation

Sep 25, 2024
Via arXiv

Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models

Sep 17, 2024
Via arXiv

Extract and Diffuse: Latent Integration for Improved Diffusion-based Speech and Vocal Enhancement

Sep 15, 2024
Via arXiv

Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models

Aug 28, 2024
Via arXiv

Speaker Adaptation for Quantised End-to-End ASR Models

Aug 07, 2024
Via arXiv

SOT Triggered Neural Clustering for Speaker Attributed ASR

Jul 02, 2024
Via arXiv

SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR

Jun 28, 2024
Via arXiv

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

Jun 22, 2024
Via arXiv