Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Darnell Granberry

A Dataset and Benchmark for Automatically Answering and Generating Machine Learning Final Exams

Jun 11, 2022

Sarah Zhang, Reece Shuttleworth, Derek Austin, Yann Hicke, Leonard Tang, Sathwik Karnik, Darnell Granberry, Iddo Drori

Figure 1 for A Dataset and Benchmark for Automatically Answering and Generating Machine Learning Final Exams

Figure 2 for A Dataset and Benchmark for Automatically Answering and Generating Machine Learning Final Exams

Figure 3 for A Dataset and Benchmark for Automatically Answering and Generating Machine Learning Final Exams

Figure 4 for A Dataset and Benchmark for Automatically Answering and Generating Machine Learning Final Exams

Abstract:Can a machine learn machine learning? We propose to answer this question using the same criteria we use to answer a similar question: can a human learn machine learning? We automatically answer MIT final exams in Introduction to Machine Learning at a human level. The course is a large undergraduate class with around five hundred students each semester. Recently, program synthesis and few-shot learning solved university-level problem set questions in mathematics and STEM courses at a human level. In this work, we solve questions from final exams that differ from problem sets in several ways: the questions are longer, have multiple parts, are more complicated, and span a broader set of topics. We provide a new dataset and benchmark of questions from eight MIT Introduction to Machine Learning final exams between Fall 2017 and Spring 2022 and provide code for automatically answering these questions and generating new questions. We perform ablation studies comparing zero-shot learning with few-shot learning, chain-of-thought prompting, GPT-3 pre-trained on text and Codex fine-tuned on code on a range of machine learning topics and find that few-shot learning methods perform best. We make our data and code publicly available for the machine learning community.

* 17 pages

Via

Access Paper or Ask Questions