Chunyuan Li

Video Instruction Tuning With Synthetic Data

Oct 03, 2024
Via arXiv

LLaVA-Critic: Learning to Evaluate Multimodal Models

Oct 03, 2024
Via arXiv

SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners

Aug 29, 2024
Via arXiv

LLaVA-OneVision: Easy Visual Task Transfer

Aug 06, 2024
Via arXiv

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

Jul 17, 2024
Via arXiv

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

Jul 10, 2024
Via arXiv

Long Context Transfer from Language to Vision

Jun 24, 2024
Via arXiv

Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model

Jun 15, 2024
Via arXiv

MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

Jun 13, 2024
Via arXiv

Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment

May 28, 2024
Via arXiv