Title: CodeCoT and Beyond: Learning to Program and Test like a Developer

Aug 17, 2023

Dong Huang, Qingwen Bu, Heming Cui

Abstract: In natural language processing, transformer-based large language models (LLMs) such as the GPT-x models developed by OpenAI have revolutionized the landscape. Despite their impressive capabilities, these models often struggle with tasks that differ from their training data, which degrades performance. To address this, few-shot learning has emerged as a valuable technique, allowing LLMs to adapt with minimal task-specific data. One such strategy, Chain-of-Thought (CoT) prompting, guides LLMs to expose their intermediate reasoning during multi-step problems. In this paper, we propose Code Chain-of-Thought (CodeCoT), which consists of two components: Vanilla CodeCoT and Self-exam CodeCoT. The latter incorporates self-examination, enabling the model to iteratively generate code, formulate test cases, and refine its outputs. Specifically, the model generates test examples for the code it is asked to implement. If the code fails on these test examples, the model regenerates it based on the erroneous code and the associated error types. Through comprehensive experiments, we observed that both techniques significantly enhance code generation accuracy across various LLM variants. Our evaluation results show that CodeCoT improves code generation effectiveness, including an unprecedented pass@1 accuracy of 79.27% using the Self-exam CodeCoT approach with the gpt-3.5-turbo-0613 model on the HumanEval dataset.
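The self-examination loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `run_tests`, `self_exam`, and `toy_model` are hypothetical, the `max_rounds` limit is an assumption, and the "model" is a hard-coded stand-in for an LLM call. The stand-in returns a buggy solution first and a fixed one once it is shown the failing code and error type.

```python
from typing import Callable, Optional, Tuple

# Hypothetical model interface: (task, previous_code, error_type) -> (code, tests)
Model = Callable[[str, Optional[str], Optional[str]], Tuple[str, str]]


def run_tests(code: str, tests: str) -> Optional[str]:
    """Run the candidate code, then its tests, in a fresh namespace.

    Returns None if everything passes, otherwise the name of the
    exception type (e.g. "AssertionError", "SyntaxError").
    """
    namespace: dict = {}
    try:
        exec(code, namespace)   # define the candidate function(s)
        exec(tests, namespace)  # run the model-written assertions
    except Exception as exc:
        return type(exc).__name__
    return None


def self_exam(task: str, model: Model, max_rounds: int = 3) -> Tuple[str, Optional[str]]:
    """Generate code and tests, then regenerate while the tests fail.

    Returns the last candidate and its error type (None means it passed).
    """
    code, tests = model(task, None, None)
    error = run_tests(code, tests)
    rounds = 0
    while error is not None and rounds < max_rounds:
        # Feed the erroneous code and the error type back to the model.
        code, tests = model(task, code, error)
        error = run_tests(code, tests)
        rounds += 1
    return code, error


def toy_model(task: str, previous_code: Optional[str], error: Optional[str]) -> Tuple[str, str]:
    """Stand-in for an LLM: the first attempt is buggy, the retry is correct."""
    tests = "assert add(2, 3) == 5"
    if previous_code is None:
        return "def add(a, b):\n    return a - b", tests
    return "def add(a, b):\n    return a + b", tests


final_code, final_error = self_exam("Write add(a, b).", toy_model)
```

In this toy run the first candidate fails its own test with an `AssertionError`, and the second candidate passes, so `final_error` is `None`. Executing model-generated code with `exec` is only acceptable in a sketch; a real setup would need sandboxing.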
