Mu Cai

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

Oct 15, 2024

Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos

Oct 03, 2024

Removing Distributional Discrepancies in Captions Improves Image-Text Alignment

Oct 01, 2024

Cross-Modal Self-Supervised Learning with Effective Contrastive Units for LiDAR Point Clouds

Sep 10, 2024

VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation

Jul 15, 2024

LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

Jun 28, 2024

Yo'LLaVA: Your Personalized Language and Vision Assistant

Jun 13, 2024

Matryoshka Multimodal Models

May 27, 2024

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

Apr 01, 2024

CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples

Feb 20, 2024