Changli Tang

Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization

Oct 09, 2024

Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation

Sep 25, 2024

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

Jun 22, 2024

Can Large Language Models Understand Spatial Audio?

Jun 12, 2024

SALMONN: Towards Generic Hearing Abilities for Large Language Models

Oct 20, 2023

Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models

Oct 10, 2023

Connecting Speech Encoder and Large Language Model for ASR

Sep 26, 2023

Front-End Adapter: Adapting Front-End Input of Speech based Self-Supervised Learning for Speech Recognition

Feb 18, 2023

Exploring Effective Fusion Algorithms for Speech Based Self-Supervised Learning Models

Dec 20, 2022

MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets

Nov 14, 2022