Jimin Zhuang

video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Feb 17, 2025
Via arXiv

Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization

Oct 09, 2024
Via arXiv

Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation

Sep 25, 2024
Via arXiv