Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Paul Denny

University of Auckland

From Prompts to Propositions: A Logic-Based Lens on Student-LLM Interactions

Apr 25, 2025

Ali Alfageeh, Sadegh AlMahdi Kazemi Zarkouei, Daye Nam, Daniel Prol, Matin Amoozadeh, Souti Chattopadhyay, James Prather, Paul Denny, Juho Leinonen, Michael Hilton(+2 more)

Abstract:Background and Context. The increasing integration of large language models (LLMs) in computing education presents an emerging challenge in understanding how students use LLMs and craft prompts to solve computational tasks. Prior research has used both qualitative and quantitative methods to analyze prompting behavior, but these approaches lack scalability or fail to effectively capture the semantic evolution of prompts. Objective. In this paper, we investigate whether students prompts can be systematically analyzed using propositional logic constraints. We examine whether this approach can identify patterns in prompt evolution, detect struggling students, and provide insights into effective and ineffective strategies. Method. We introduce Prompt2Constraints, a novel method that translates students prompts into logical constraints. The constraints are able to represent the intent of the prompts in succinct and quantifiable ways. We used this approach to analyze a dataset of 1,872 prompts from 203 students solving introductory programming tasks. Findings. We find that while successful and unsuccessful attempts tend to use a similar number of constraints overall, when students fail, they often modify their prompts more significantly, shifting problem-solving strategies midway. We also identify points where specific interventions could be most helpful to students for refining their prompts. Implications. This work offers a new and scalable way to detect students who struggle in solving natural language programming tasks. This work could be extended to investigate more complex tasks and integrated into programming tools to provide real-time support.

Via

Access Paper or Ask Questions

Prompt Programming: A Platform for Dialogue-based Computational Problem Solving with Generative AI Models

Mar 06, 2025

Victor-Alexandru Pădurean, Paul Denny, Alkis Gotovos, Adish Singla

Abstract:Computing students increasingly rely on generative AI tools for programming assistance, often without formal instruction or guidance. This highlights a need to teach students how to effectively interact with AI models, particularly through natural language prompts, to generate and critically evaluate code for solving computational tasks. To address this, we developed a novel platform for prompt programming that enables authentic dialogue-based interactions, supports problems involving multiple interdependent functions, and offers on-request execution of generated code. Data analysis from over 900 students in an introductory programming course revealed high engagement, with the majority of prompts occurring within multi-turn dialogues. Problems with multiple interdependent functions encouraged iterative refinement, with progression graphs highlighting several common strategies. Students were highly selective about the code they chose to test, suggesting that on-request execution of generated code promoted critical thinking. Given the growing importance of learning dialogue-based programming with AI, we provide this tool as a publicly accessible resource, accompanied by a corpus of programming problems for educational use.

* Preprint of the ITiCSE'25 paper

Via

Access Paper or Ask Questions

Breaking the Programming Language Barrier: Multilingual Prompting to Empower Non-Native English Learners

Dec 17, 2024

James Prather, Brent N. Reeves, Paul Denny, Juho Leinonen, Stephen MacNeil, Andrew Luxton-Reilly, João Orvalho, Amin Alipour, Ali Alfageeh, Thezyrie Amarouche(+4 more)

Figure 1 for Breaking the Programming Language Barrier: Multilingual Prompting to Empower Non-Native English Learners

Figure 2 for Breaking the Programming Language Barrier: Multilingual Prompting to Empower Non-Native English Learners

Figure 3 for Breaking the Programming Language Barrier: Multilingual Prompting to Empower Non-Native English Learners

Abstract:Non-native English speakers (NNES) face multiple barriers to learning programming. These barriers can be obvious, such as the fact that programming language syntax and instruction are often in English, or more subtle, such as being afraid to ask for help in a classroom full of native English speakers. However, these barriers are frustrating because many NNES students know more about programming than they can articulate in English. Advances in generative AI (GenAI) have the potential to break down these barriers because state of the art models can support interactions in multiple languages. Moreover, recent work has shown that GenAI can be highly accurate at code generation and explanation. In this paper, we provide the first exploration of NNES students prompting in their native languages (Arabic, Chinese, and Portuguese) to generate code to solve programming problems. Our results show that students are able to successfully use their native language to solve programming problems, but not without some difficulty specifying programming terminology and concepts. We discuss the challenges they faced, the implications for practice in the short term, and how this might transform computing education globally in the long term.

* 10 pages, 3 tables. Accepted for publication at the 27th Australasian Computing Education Conference (ACE 2025)

Via

Access Paper or Ask Questions

Seeing the Forest and the Trees: Solving Visual Graph and Tree Based Data Structure Problems using Large Multimodal Models

Dec 15, 2024

Sebastian Gutierrez, Irene Hou, Jihye Lee, Kenneth Angelikas, Owen Man, Sophia Mettille, James Prather, Paul Denny, Stephen MacNeil

Abstract:Recent advancements in generative AI systems have raised concerns about academic integrity among educators. Beyond excelling at solving programming problems and text-based multiple-choice questions, recent research has also found that large multimodal models (LMMs) can solve Parsons problems based only on an image. However, such problems are still inherently text-based and rely on the capabilities of the models to convert the images of code blocks to their corresponding text. In this paper, we further investigate the capabilities of LMMs to solve graph and tree data structure problems based only on images. To achieve this, we computationally construct and evaluate a novel benchmark dataset comprising 9,072 samples of diverse graph and tree data structure tasks to assess the performance of the GPT-4o, GPT-4v, Gemini 1.5 Pro, Gemini 1.5 Flash, Gemini 1.0 Pro Vision, and Claude 3 model families. GPT-4o and Gemini 1.5 Flash performed best on trees and graphs respectively. GPT-4o achieved 87.6% accuracy on tree samples, while Gemini 1.5 Flash, achieved 56.2% accuracy on graph samples. Our findings highlight the influence of structural and visual variations on model performance. This research not only introduces an LMM benchmark to facilitate replication and further exploration but also underscores the potential of LMMs in solving complex computing problems, with important implications for pedagogy and assessment practices.

* 14 pages, 4 figures, to be published in ACE 2025

Via

Access Paper or Ask Questions

BugSpotter: Automated Generation of Code Debugging Exercises

Nov 25, 2024

Victor-Alexandru Pădurean, Paul Denny, Adish Singla

Figure 1 for BugSpotter: Automated Generation of Code Debugging Exercises

Figure 2 for BugSpotter: Automated Generation of Code Debugging Exercises

Figure 3 for BugSpotter: Automated Generation of Code Debugging Exercises

Figure 4 for BugSpotter: Automated Generation of Code Debugging Exercises

Abstract:Debugging is an essential skill when learning to program, yet its instruction and emphasis often vary widely across introductory courses. In the era of code-generating large language models (LLMs), the ability for students to reason about code and identify errors is increasingly important. However, students frequently resort to trial-and-error methods to resolve bugs without fully understanding the underlying issues. Developing the ability to identify and hypothesize the cause of bugs is crucial but can be time-consuming to teach effectively through traditional means. This paper introduces BugSpotter, an innovative tool that leverages an LLM to generate buggy code from a problem description and verify the synthesized bugs via a test suite. Students interact with BugSpotter by designing failing test cases, where the buggy code's output differs from the expected result as defined by the problem specification. This not only provides opportunities for students to enhance their debugging skills, but also to practice reading and understanding problem specifications. We deployed BugSpotter in a large classroom setting and compared the debugging exercises it generated to exercises hand-crafted by an instructor for the same problems. We found that the LLM-generated exercises produced by BugSpotter varied in difficulty and were well-matched to the problem specifications. Importantly, the LLM-generated exercises were comparable to those manually created by instructors with respect to student performance, suggesting that BugSpotter could be an effective and efficient aid for learning debugging.

* Preprint of the SIGCSE'25 paper

Via

Access Paper or Ask Questions

Automated Generation of Code Debugging Exercises

Nov 21, 2024

Victor-Alexandru Pădurean, Paul Denny, Adish Singla

Figure 1 for Automated Generation of Code Debugging Exercises

Figure 2 for Automated Generation of Code Debugging Exercises

Figure 3 for Automated Generation of Code Debugging Exercises

Figure 4 for Automated Generation of Code Debugging Exercises

* Preprint of the SIGCSE'25 paper

Via

Access Paper or Ask Questions

Automating Autograding: Large Language Models as Test Suite Generators for Introductory Programming

Nov 14, 2024

Umar Alkafaween, Ibrahim Albluwi, Paul Denny

Figure 1 for Automating Autograding: Large Language Models as Test Suite Generators for Introductory Programming

Figure 2 for Automating Autograding: Large Language Models as Test Suite Generators for Introductory Programming

Figure 3 for Automating Autograding: Large Language Models as Test Suite Generators for Introductory Programming

Figure 4 for Automating Autograding: Large Language Models as Test Suite Generators for Introductory Programming

Abstract:Automatically graded programming assignments provide instant feedback to students and significantly reduce manual grading time for instructors. However, creating comprehensive suites of test cases for programming problems within automatic graders can be time-consuming and complex. The effort needed to define test suites may deter some instructors from creating additional problems or lead to inadequate test coverage, potentially resulting in misleading feedback on student solutions. Such limitations may reduce student access to the well-documented benefits of timely feedback when learning programming. In this work, we evaluate the effectiveness of using Large Language Models (LLMs), as part of a larger workflow, to automatically generate test suites for CS1-level programming problems. Each problem's statement and reference solution are provided to GPT-4 to produce a test suite that can be used by an autograder. We evaluate our proposed approach using a sample of 26 problems, and more than 25,000 attempted solutions to those problems, submitted by students in an introductory programming course. We compare the performance of the LLM-generated test suites against the instructor-created test suites for each problem. Our findings reveal that LLM-generated test suites can correctly identify most valid solutions, and for most problems are at least as comprehensive as the instructor test suites. Additionally, the LLM-generated test suites exposed ambiguities in some problem statements, underscoring their potential to improve both autograding and instructional design.

* Submitted to Journal of Computer Assisted Learning

Via

Access Paper or Ask Questions

An Eye for an AI: Evaluating GPT-4o's Visual Perception Skills and Geometric Reasoning Skills Using Computer Graphics Questions

Oct 22, 2024

Tony Haoran Feng, Paul Denny, Burkhard C. Wünsche, Andrew Luxton-Reilly, Jacqueline Whalley

Abstract:CG (Computer Graphics) is a popular field of CS (Computer Science), but many students find this topic difficult due to it requiring a large number of skills, such as mathematics, programming, geometric reasoning, and creativity. Over the past few years, researchers have investigated ways to harness the power of GenAI (Generative Artificial Intelligence) to improve teaching. In CS, much of the research has focused on introductory computing. A recent study evaluating the performance of an LLM (Large Language Model), GPT-4 (text-only), on CG questions, indicated poor performance and reliance on detailed descriptions of image content, which often required considerable insight from the user to return reasonable results. So far, no studies have investigated the abilities of LMMs (Large Multimodal Models), or multimodal LLMs, to solve CG questions and how these abilities can be used to improve teaching. In this study, we construct two datasets of CG questions requiring varying degrees of visual perception skills and geometric reasoning skills, and evaluate the current state-of-the-art LMM, GPT-4o, on the two datasets. We find that although GPT-4o exhibits great potential in solving questions with visual information independently, major limitations still exist to the accuracy and quality of the generated results. We propose several novel approaches for CG educators to incorporate GenAI into CG teaching despite these limitations. We hope that our guidelines further encourage learning and engagement in CG classrooms.

* 8 pages, 8 figures, 1 table, to be published in SIGGRAPH Asia 2024 Educator's Forum

Via

Access Paper or Ask Questions

Synthetic Students: A Comparative Study of Bug Distribution Between Large Language Models and Computing Students

Oct 11, 2024

Stephen MacNeil, Magdalena Rogalska, Juho Leinonen, Paul Denny, Arto Hellas, Xandria Crosland

Figure 1 for Synthetic Students: A Comparative Study of Bug Distribution Between Large Language Models and Computing Students

Figure 2 for Synthetic Students: A Comparative Study of Bug Distribution Between Large Language Models and Computing Students

Figure 3 for Synthetic Students: A Comparative Study of Bug Distribution Between Large Language Models and Computing Students

Figure 4 for Synthetic Students: A Comparative Study of Bug Distribution Between Large Language Models and Computing Students

Abstract:Large language models (LLMs) present an exciting opportunity for generating synthetic classroom data. Such data could include code containing a typical distribution of errors, simulated student behaviour to address the cold start problem when developing education tools, and synthetic user data when access to authentic data is restricted due to privacy reasons. In this research paper, we conduct a comparative study examining the distribution of bugs generated by LLMs in contrast to those produced by computing students. Leveraging data from two previous large-scale analyses of student-generated bugs, we investigate whether LLMs can be coaxed to exhibit bug patterns that are similar to authentic student bugs when prompted to inject errors into code. The results suggest that unguided, LLMs do not generate plausible error distributions, and many of the generated errors are unlikely to be generated by real students. However, with guidance including descriptions of common errors and typical frequencies, LLMs can be shepherded to generate realistic distributions of errors in synthetic code.

Via

Access Paper or Ask Questions

Integrating Natural Language Prompting Tasks in Introductory Programming Courses

Oct 04, 2024

Chris Kerslake, Paul Denny, David H Smith IV, James Prather, Juho Leinonen, Andrew Luxton-Reilly, Stephen MacNeil

Figure 1 for Integrating Natural Language Prompting Tasks in Introductory Programming Courses

Figure 2 for Integrating Natural Language Prompting Tasks in Introductory Programming Courses

Figure 3 for Integrating Natural Language Prompting Tasks in Introductory Programming Courses

Figure 4 for Integrating Natural Language Prompting Tasks in Introductory Programming Courses

Abstract:Introductory programming courses often emphasize mastering syntax and basic constructs before progressing to more complex and interesting programs. This bottom-up approach can be frustrating for novices, shifting the focus away from problem solving and potentially making computing less appealing to a broad range of students. The rise of generative AI for code production could partially address these issues by fostering new skills via interaction with AI models, including constructing high-level prompts and evaluating code that is automatically generated. In this experience report, we explore the inclusion of two prompt-focused activities in an introductory course, implemented across four labs in a six-week module. The first requires students to solve computational problems by writing natural language prompts, emphasizing problem-solving over syntax. The second involves students crafting prompts to generate code equivalent to provided fragments, to foster an understanding of the relationship between prompts and code. Most of the students in the course had reported finding programming difficult to learn, often citing frustrations with syntax and debugging. We found that self-reported difficulty with learning programming had a strong inverse relationship with performance on traditional programming assessments such as tests and projects, as expected. However, performance on the natural language tasks was less strongly related to self-reported difficulty, suggesting they may target different skills. Learning how to communicate with AI coding models is becoming an important skill, and natural language prompting tasks may appeal to a broad range of students.

* 7 pages, 6 figures. Accepted for publication at SIGCSE Virtual 2024

Via

Access Paper or Ask Questions