Title: GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation

Feb 24, 2024

Yi Zong, Xipeng Qiu

Abstract: Large Vision-Language Models (LVLMs) have demonstrated great abilities in image perception and language understanding. However, existing multimodal benchmarks focus on primary perception abilities and commonsense knowledge, which are insufficient to reflect the comprehensive capabilities of LVLMs. We propose GAOKAO-MM, a multimodal benchmark based on the Chinese College Entrance Examination (GAOKAO), comprising 8 subjects and 12 types of images, such as diagrams, function graphs, maps and photos. GAOKAO-MM derives from a native Chinese context and sets human-level requirements for a model's abilities, including perception, understanding, knowledge and reasoning. We evaluate 10 LVLMs and find that all of them score below 50% accuracy, with GPT-4-Vision (48.1%), Qwen-VL-Plus (41.2%) and Gemini-Pro-Vision (35.1%) ranking in the top three positions. The results of our multi-dimensional analysis indicate that LVLMs remain a moderate distance from Artificial General Intelligence (AGI) and provide insights to facilitate the development of multilingual LVLMs.
