
Shuang Dong

CMATH: Can Your Language Model Pass Chinese Elementary School Math Test?

Jun 29, 2023

Tianwen Wei, Jian Luan, Wei Liu, Shuang Dong, Bin Wang


Abstract: We present the Chinese Elementary School Math Word Problems (CMATH) dataset, comprising 1.7k elementary school-level math word problems with detailed annotations, sourced from actual Chinese workbooks and exams. This dataset aims to provide a benchmark for answering the following question: to what grade level of elementary school math do the abilities of popular large language models (LLMs) correspond? We evaluate a variety of popular LLMs, including both commercial and open-source options, and discover that only GPT-4 achieves success (accuracy $\geq 60\%$) across all six elementary school grades, while other models falter at different grade levels. Furthermore, we assess the robustness of several top-performing LLMs by augmenting the original problems in the CMATH dataset with distracting information. Our findings reveal that GPT-4 maintains robustness, while the other models fail. We anticipate that our study will expose limitations in LLMs' arithmetic and reasoning capabilities, and promote their ongoing development and advancement.
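The success criterion in the abstract (accuracy of at least 60% in every one of the six grades) can be illustrated with a minimal Python sketch. The function names, the `(grade, answered_correctly)` record layout, and the toy data are assumptions made for illustration; they are not taken from the paper or its evaluation code.

```python
from collections import defaultdict

PASS_THRESHOLD = 0.60  # the abstract's success criterion: accuracy >= 60%
GRADES = range(1, 7)   # six elementary school grades


def accuracy_by_grade(records):
    """Map each grade to the fraction of its problems answered correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for grade, is_correct in records:
        total[grade] += 1
        correct[grade] += int(is_correct)
    return {g: correct[g] / total[g] for g in total}


def passes_all_grades(records, threshold=PASS_THRESHOLD):
    """True only if every grade 1-6 has data and meets the threshold."""
    acc = accuracy_by_grade(records)
    return all(g in acc and acc[g] >= threshold for g in GRADES)


# Made-up toy records (hypothetical layout); not results from the paper.
toy_records = []
for g in GRADES:
    toy_records += [(g, True)] * 4 + [(g, False)] * 1  # 4 of 5 correct per grade

if __name__ == "__main__":
    print(accuracy_by_grade(toy_records))
    print(passes_all_grades(toy_records))
```

Under this reading, a model "falters" at a grade as soon as its accuracy on that grade's problems drops below the threshold, so a single weak grade is enough to fail the all-grades check.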
