Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tarushi Gandhi

System Test Case Design from Requirements Specifications: Insights and Challenges of Using ChatGPT

Dec 04, 2024

Shreya Bhatia, Tarushi Gandhi, Dhruv Kumar, Pankaj Jalote

Abstract:System testing is essential in any software development project to ensure that the final products meet the requirements. Creating comprehensive test cases for system testing from requirements is often challenging and time-consuming. This paper explores the effectiveness of using Large Language Models (LLMs) to generate test case designs from Software Requirements Specification (SRS) documents. In this study, we collected the SRS documents of five software engineering projects containing functional and non-functional requirements, which were implemented, tested, and delivered by respective developer teams. For generating test case designs, we used ChatGPT-4o Turbo model. We employed prompt-chaining, starting with an initial context-setting prompt, followed by prompts to generate test cases for each use case. We assessed the quality of the generated test case designs through feedback from the same developer teams as mentioned above. Our experiments show that about 87 percent of the generated test cases were valid, with the remaining 13 percent either not applicable or redundant. Notably, 15 percent of the valid test cases were previously not considered by developers in their testing. We also tasked ChatGPT with identifying redundant test cases, which were subsequently validated by the respective developers to identify false positives and to uncover any redundant test cases that may have been missed by the developers themselves. This study highlights the potential of leveraging LLMs for test generation from the Requirements Specification document and also for assisting developers in quickly identifying and addressing redundancies, ultimately improving test suite quality and efficiency of the testing procedure.

Via

Access Paper or Ask Questions

Unit Test Generation using Generative AI : A Comparative Performance Analysis of Autogeneration Tools

Dec 17, 2023

Shreya Bhatia, Tarushi Gandhi, Dhruv Kumar, Pankaj Jalote

Figure 1 for Unit Test Generation using Generative AI : A Comparative Performance Analysis of Autogeneration Tools

Figure 2 for Unit Test Generation using Generative AI : A Comparative Performance Analysis of Autogeneration Tools

Figure 3 for Unit Test Generation using Generative AI : A Comparative Performance Analysis of Autogeneration Tools

Figure 4 for Unit Test Generation using Generative AI : A Comparative Performance Analysis of Autogeneration Tools

Abstract:Generating unit tests is a crucial task in software development, demanding substantial time and effort from programmers. The advent of Large Language Models (LLMs) introduces a novel avenue for unit test script generation. This research aims to experimentally investigate the effectiveness of LLMs, specifically exemplified by ChatGPT, for generating unit test scripts for Python programs, and how the generated test cases compare with those generated by an existing unit test generator (Pynguin). For experiments, we consider three types of code units: 1) Procedural scripts, 2) Function-based modular code, and 3) Class-based code. The generated test cases are evaluated based on criteria such as coverage, correctness, and readability. Our results show that ChatGPT's performance is comparable with Pynguin in terms of coverage. At the same time, ChatGPT's ability to generate tests is superior to Pynguin, as the latter is not able to generate test cases for Category 1. We also find that about 39% and 28% of assertions generated by ChatGPT for Category 2 and 3, respectively, were incorrect. Our results also show that there is minimal overlap in missed statements between ChatGPT and Pynguin, thus, suggesting that a combination of both tools may enhance unit test generation performance. Finally, prompt engineering improved ChatGPT's performance, achieving an average 28% coverage improvement in Category 2 and 15% improvement in Category 3 after about 4 iterations.

* Under Review

Via

Access Paper or Ask Questions