Adriana Caraeni

Evaluating GPT-4 at Grading Handwritten Solutions in Math Exams

Nov 07, 2024

Adriana Caraeni, Alexander Scarlatos, Andrew Lan

Abstract: Recent advances in generative artificial intelligence (AI) have shown promise in accurately grading open-ended student responses. However, few prior works have explored grading handwritten responses due to a lack of data and the challenge of combining visual and textual information. In this work, we leverage state-of-the-art multi-modal AI models, in particular GPT-4o, to automatically grade handwritten responses to college-level math exams. Using real student responses to questions in a probability theory exam, we evaluate GPT-4o's alignment with ground-truth scores from human graders using various prompting techniques. We find that while providing rubrics improves alignment, the model's overall accuracy is still too low for real-world settings, showing there is significant room for growth in this task.
