Title: CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning

Jan 26, 2024

Zheqi He, Xinya Wu, Pengfei Zhou, Richeng Xuan, Guang Liu, Xi Yang, Qiannan Zhu, Hua Huang

Abstract: Multi-modal large language models (MLLMs) have made remarkable progress and demonstrated strong knowledge comprehension and reasoning abilities. However, mastery of domain-specific knowledge, which is essential for evaluating the intelligence of MLLMs, remains a challenge. Current multi-modal benchmarks for domain-specific knowledge concentrate on multiple-choice questions and are predominantly available in English, which limits the comprehensiveness of the evaluation. To this end, we introduce CMMU, a novel benchmark for multi-modal and multi-type question understanding and reasoning in Chinese. CMMU consists of 3,603 questions in 7 subjects, covering knowledge from primary to high school. The questions fall into 3 types: multiple-choice, multiple-response, and fill-in-the-blank, which pose greater challenges to MLLMs. In addition, we propose a rigorous evaluation strategy called ShiftCheck for assessing multiple-choice questions. The strategy aims to reduce position bias, minimize the influence of randomness on correctness, and quantitatively analyze position bias. We evaluate seven open-source MLLMs along with GPT-4V, Gemini-Pro, and Qwen-VL-Plus. The results demonstrate that CMMU poses a significant challenge to recent MLLMs.
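The abstract does not spell out how ShiftCheck works. The Python sketch below shows one plausible reading, and several of its choices are assumptions:

- Each multiple-choice question is re-asked once per cyclic shift of its options.
- A question counts as correct only if the model picks the gold option under every shift.
- The positions the model picks are tallied as a crude position-bias signal.

The names `rotate`, `shift_check` and `position_histogram` are hypothetical, as is the "model as a callable returning an option index" interface. None of them is the authors' code or their bias metric.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface (assumption): given a question and an ordered
# list of options, return the index of the option the model chooses.
Model = Callable[[str, Sequence[str]], int]


def rotate(options: Sequence[str], shift: int) -> List[str]:
    """Cyclically shift the options left by `shift` positions."""
    return list(options[shift:]) + list(options[:shift])


def shift_check(model: Model, question: str, options: Sequence[str],
                answer_idx: int) -> Tuple[bool, List[int]]:
    """Ask the question under every cyclic shift of its options.

    Assumption: the question is scored correct only if the model picks the
    gold option in all len(options) rotations. Also returns the position
    the model picked in each rotation, for a position-bias tally.
    """
    n = len(options)
    picked: List[int] = []
    all_correct = True
    for s in range(n):
        shifted = rotate(options, s)
        # After a left shift by s, the gold option sits at (answer_idx - s) mod n.
        gold_pos = (answer_idx - s) % n
        pred = model(question, shifted)
        picked.append(pred)
        if pred != gold_pos:
            all_correct = False
    return all_correct, picked


def position_histogram(picked: Sequence[int], n: int) -> List[float]:
    """Fraction of picks landing on each position.

    This is a simple stand-in for a bias measure, not the paper's metric.
    Because full rotation spreads the gold answer evenly over all positions,
    an unbiased model should come out roughly uniform.
    """
    counts = Counter(picked)
    return [counts.get(i, 0) / len(picked) for i in range(n)]


if __name__ == "__main__":
    q = "Which number is prime?"
    opts = ["4", "6", "7", "9"]
    gold = 2  # "7"

    def always_first(question: str, options: Sequence[str]) -> int:
        return 0  # maximally position-biased toy model

    def oracle(question: str, options: Sequence[str]) -> int:
        return list(options).index("7")  # always finds the right text

    ok, picks = shift_check(always_first, q, opts, gold)
    print(ok, picks, position_histogram(picks, len(opts)))
    # False [0, 0, 0, 0] [1.0, 0.0, 0.0, 0.0]
    ok, picks = shift_check(oracle, q, opts, gold)
    print(ok, picks, position_histogram(picks, len(opts)))
    # True [2, 1, 0, 3] [0.25, 0.25, 0.25, 0.25]
```

The always-first toy model is right in one of the four rotations by luck, but it fails the all-rotations check. Its histogram also exposes the bias directly. The oracle passes, and its picks are spread evenly across positions.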