Qirui Jiao

HumanVBench: Exploring Human-Centric Video Understanding Capabilities of MLLMs with Synthetic Benchmark Data

Dec 23, 2024
Via arXiv

Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models

Aug 09, 2024
Via arXiv

Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study

Jan 31, 2024
Via arXiv