Yutong Bai

AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?

Dec 03, 2024

Analyzing The Language of Visual Tokens

Nov 07, 2024

Evaluating Multiview Object Consistency in Humans and Image Models

Sep 10, 2024

KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models

Jul 25, 2024

LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning

Jun 17, 2024

Finding Visual Task Vectors

Apr 08, 2024

Sequential Modeling Enables Scalable Learning for Large Vision Models

Dec 01, 2023

Understanding Pan-Sharpening via Generalized Inverse

Oct 04, 2023

Intriguing Properties of Text-guided Diffusion Models

Jun 18, 2023

Delving into Masked Autoencoders for Multi-Label Thorax Disease Classification

Oct 23, 2022