Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ashish Gurung

How Can I Get It Right? Using GPT to Rephrase Incorrect Trainee Responses

May 02, 2024

Jionghao Lin, Zifei Han, Danielle R. Thomas, Ashish Gurung, Shivang Gupta, Vincent Aleven, Kenneth R. Koedinger

Figure 1 for How Can I Get It Right? Using GPT to Rephrase Incorrect Trainee Responses

Figure 2 for How Can I Get It Right? Using GPT to Rephrase Incorrect Trainee Responses

Figure 3 for How Can I Get It Right? Using GPT to Rephrase Incorrect Trainee Responses

Figure 4 for How Can I Get It Right? Using GPT to Rephrase Incorrect Trainee Responses

Abstract:One-on-one tutoring is widely acknowledged as an effective instructional method, conditioned on qualified tutors. However, the high demand for qualified tutors remains a challenge, often necessitating the training of novice tutors (i.e., trainees) to ensure effective tutoring. Research suggests that providing timely explanatory feedback can facilitate the training process for trainees. However, it presents challenges due to the time-consuming nature of assessing trainee performance by human experts. Inspired by the recent advancements of large language models (LLMs), our study employed the GPT-4 model to build an explanatory feedback system. This system identifies trainees' responses in binary form (i.e., correct/incorrect) and automatically provides template-based feedback with responses appropriately rephrased by the GPT-4 model. We conducted our study on 410 responses from trainees across three training lessons: Giving Effective Praise, Reacting to Errors, and Determining What Students Know. Our findings indicate that: 1) using a few-shot approach, the GPT-4 model effectively identifies correct/incorrect trainees' responses from three training lessons with an average F1 score of 0.84 and an AUC score of 0.85; and 2) using the few-shot approach, the GPT-4 model adeptly rephrases incorrect trainees' responses into desired responses, achieving performance comparable to that of human experts.

* International Journal of Artificial Intelligence in Education

Via

Access Paper or Ask Questions

How Can I Improve? Using GPT to Highlight the Desired and Undesired Parts of Open-ended Responses

May 01, 2024

Jionghao Lin, Eason Chen, Zeifei Han, Ashish Gurung, Danielle R. Thomas, Wei Tan, Ngoc Dang Nguyen, Kenneth R. Koedinger

Figure 1 for How Can I Improve? Using GPT to Highlight the Desired and Undesired Parts of Open-ended Responses

Figure 2 for How Can I Improve? Using GPT to Highlight the Desired and Undesired Parts of Open-ended Responses

Figure 3 for How Can I Improve? Using GPT to Highlight the Desired and Undesired Parts of Open-ended Responses

Figure 4 for How Can I Improve? Using GPT to Highlight the Desired and Undesired Parts of Open-ended Responses

Abstract:Automated explanatory feedback systems play a crucial role in facilitating learning for a large cohort of learners by offering feedback that incorporates explanations, significantly enhancing the learning process. However, delivering such explanatory feedback in real-time poses challenges, particularly when high classification accuracy for domain-specific, nuanced responses is essential. Our study leverages the capabilities of large language models, specifically Generative Pre-Trained Transformers (GPT), to explore a sequence labeling approach focused on identifying components of desired and less desired praise for providing explanatory feedback within a tutor training dataset. Our aim is to equip tutors with actionable, explanatory feedback during online training lessons. To investigate the potential of GPT models for providing the explanatory feedback, we employed two commonly-used approaches: prompting and fine-tuning. To quantify the quality of highlighted praise components identified by GPT models, we introduced a Modified Intersection over Union (M-IoU) score. Our findings demonstrate that: (1) the M-IoU score effectively correlates with human judgment in evaluating sequence quality; (2) using two-shot prompting on GPT-3.5 resulted in decent performance in recognizing effort-based (M-IoU of 0.46) and outcome-based praise (M-IoU of 0.68); and (3) our optimally fine-tuned GPT-3.5 model achieved M-IoU scores of 0.64 for effort-based praise and 0.84 for outcome-based praise, aligning with the satisfaction levels evaluated by human coders. Our results show promise for using GPT models to provide feedback that focuses on specific elements in their open-ended responses that are desirable or could use improvement.

* 11 pages, full research paper, EDM 2024

Via

Access Paper or Ask Questions