Yuanhan Zhang

Video Instruction Tuning With Synthetic Data

Oct 03, 2024

LLaVA-OneVision: Easy Visual Task Transfer

Aug 06, 2024

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

Jul 17, 2024

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

Jul 10, 2024

Long Context Transfer from Language to Vision

Jun 24, 2024

WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning

May 06, 2024

Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

Apr 02, 2024

VBench: Comprehensive Benchmark Suite for Video Generative Models

Nov 29, 2023

OtterHD: A High-Resolution Multi-modality Model

Nov 07, 2023

Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images

Nov 02, 2023