
Yubo Wang

MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale

Dec 06, 2024

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

Oct 14, 2024

Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models

Oct 09, 2024

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

Sep 04, 2024

PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

Jun 20, 2024

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Jun 04, 2024

KGLink: A column type annotation method that combines knowledge graph and pre-trained language model

Jun 01, 2024

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

May 29, 2024

The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

May 14, 2024

UniGarmentManip: A Unified Framework for Category-Level Garment Manipulation via Dense Visual Correspondence

May 11, 2024