Yongshuo Zong

Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning
Jun 18, 2024

VL-ICL Bench: The Devil in the Details of Benchmarking Multimodal In-Context Learning
Mar 19, 2024

Safety Fine-Tuning at No Cost: A Baseline for Vision Large Language Models
Feb 03, 2024

What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
Oct 10, 2023

Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations
Oct 02, 2023

Meta Omnium: A Benchmark for General-Purpose Learning-to-Learn
May 12, 2023

Self-Supervised Multimodal Learning: A Survey
Mar 31, 2023

MEDFAIR: Benchmarking Fairness for Medical Imaging
Oct 04, 2022