Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Abhik Roychoudhury

Unified Software Engineering agent as AI Software Engineer

Jun 17, 2025

Leonhard Applis, Yuntong Zhang, Shanchao Liang, Nan Jiang, Lin Tan, Abhik Roychoudhury

Abstract:The growth of Large Language Model (LLM) technology has raised expectations for automated coding. However, software engineering is more than coding and is concerned with activities including maintenance and evolution of a project. In this context, the concept of LLM agents has gained traction, which utilize LLMs as reasoning engines to invoke external tools autonomously. But is an LLM agent the same as an AI software engineer? In this paper, we seek to understand this question by developing a Unified Software Engineering agent or USEagent. Unlike existing work which builds specialized agents for specific software tasks such as testing, debugging, and repair, our goal is to build a unified agent which can orchestrate and handle multiple capabilities. This gives the agent the promise of handling complex scenarios in software development such as fixing an incomplete patch, adding new features, or taking over code written by others. We envision USEagent as the first draft of a future AI Software Engineer which can be a team member in future software development teams involving both AI and humans. To evaluate the efficacy of USEagent, we build a Unified Software Engineering bench (USEbench) comprising of myriad tasks such as coding, testing, and patching. USEbench is a judicious mixture of tasks from existing benchmarks such as SWE-bench, SWT-bench, and REPOCOD. In an evaluation on USEbench consisting of 1,271 repository-level software engineering tasks, USEagent shows improved efficacy compared to existing general agents such as OpenHands CodeActAgent. There exist gaps in the capabilities of USEagent for certain coding tasks, which provides hints on further developing the AI Software Engineer of the future.

* Leonhard Applis and Yuntong Zhang contributed equally to this work

Via

Access Paper or Ask Questions

Will AI replace Software Engineers? Hold your Breath

Feb 27, 2025

Abhik Roychoudhury, Andreas Zeller

Abstract:Artificial Intelligence (AI) technology such as Large Language Models (LLMs) have become extremely popular in creating code. This has led to the conjecture that future software jobs will be exclusively conducted by LLMs, and the software industry will cease to exist. But software engineering is much more than producing code -- notably, \emph{maintaining} large software and keeping it reliable is a major part of software engineering, which LLMs are not yet capable of.

* 3 pages

Via

Access Paper or Ask Questions

AI Software Engineer: Programming with Trust

Feb 19, 2025

Abhik Roychoudhury, Corina Pasareanu, Michael Pradel, Baishakhi Ray

Abstract:Large Language Models (LLMs) have shown surprising proficiency in generating code snippets, promising to automate large parts of software engineering via artificial intelligence (AI). We argue that successfully deploying AI software engineers requires a level of trust equal to or even greater than the trust established by human-driven software engineering practices. The recent trend toward LLM agents offers a path toward integrating the power of LLMs to create new code with the power of analysis tools to increase trust in the code. This opinion piece comments on whether LLM agents could dominate software engineering workflows in the future and whether the focus of programming will shift from programming at scale to programming with trust.

* 5 pages

Via

Access Paper or Ask Questions

Assured Automatic Programming via Large Language Models

Oct 24, 2024

Martin Mirchev, Andreea Costea, Abhishek Kr Singh, Abhik Roychoudhury

Figure 1 for Assured Automatic Programming via Large Language Models

Figure 2 for Assured Automatic Programming via Large Language Models

Figure 3 for Assured Automatic Programming via Large Language Models

Figure 4 for Assured Automatic Programming via Large Language Models

Abstract:With the advent of AI-based coding engines, it is possible to convert natural language requirements to executable code in standard programming languages. However, AI-generated code can be unreliable, and the natural language requirements driving this code may be ambiguous. In other words, the intent may not be accurately captured in the code generated from AI-coding engines like Copilot. The goal of our work is to discover the programmer intent, while generating code which conforms to the intent and a proof of this conformance. Our approach to intent discovery is powered by a novel repair engine called program-proof co-evolution, where the object of repair is a tuple (code, logical specification, test) generated by an LLM from the same natural language description. The program and the specification capture the initial operational and declarative description of intent, while the test represents a concrete, albeit partial, understanding of the intent. Our objective is to achieve consistency between the program, the specification, and the test by incrementally refining our understanding of the user intent. Reaching consistency through this repair process provides us with a formal, logical description of the intent, which is then translated back into natural language for the developer's inspection. The resultant intent description is now unambiguous, though expressed in natural language. We demonstrate how the unambiguous intent discovered through our approach increases the percentage of verifiable auto-generated programs on a recently proposed dataset in the Dafny programming language.

Via

Access Paper or Ask Questions

SpecRover: Code Intent Extraction via LLMs

Aug 07, 2024

Haifeng Ruan, Yuntong Zhang, Abhik Roychoudhury

Figure 1 for SpecRover: Code Intent Extraction via LLMs

Figure 2 for SpecRover: Code Intent Extraction via LLMs

Figure 3 for SpecRover: Code Intent Extraction via LLMs

Figure 4 for SpecRover: Code Intent Extraction via LLMs

Abstract:Autonomous program improvement typically involves automatically producing bug fixes and feature additions. Such program improvement can be accomplished by a combination of large language model (LLM) and program analysis capabilities, in the form of an LLM agent. Since program repair or program improvement typically requires a specification of intended behavior - specification inference can be useful for producing high quality program patches. In this work, we examine efficient and low-cost workflows for iterative specification inference within an LLM agent. Given a GitHub issue to be resolved in a software project, our goal is to conduct iterative code search accompanied by specification inference - thereby inferring intent from both the project structure and behavior. The intent thus captured is examined by a reviewer agent with the goal of vetting the patches as well as providing a measure of confidence in the vetted patches. Our approach SpecRover (AutoCodeRover-v2) is built on the open-source LLM agent AutoCodeRover. In an evaluation on the full SWE-Bench consisting of 2294 GitHub issues, it shows more than 50% improvement in efficacy over AutoCodeRover. Compared to the open-source agents available, our work shows modest cost ($0.65 per issue) in resolving an average GitHub issue in SWE-Bench lite. The production of explanation by SpecRover allows for a better "signal" to be given to the developer, on when the suggested patches can be accepted with confidence. SpecRover also seeks to demonstrate the continued importance of specification inference in automated program repair, even as program repair technologies enter the LLM era.

* Haifeng Ruan and Yuntong Zhang contributed equally to this work

Via

Access Paper or Ask Questions

Automatic Programming: Large Language Models and Beyond

May 03, 2024

Michael R. Lyu, Baishakhi Ray, Abhik Roychoudhury, Shin Hwei Tan, Patanamon Thongtanunam

Figure 1 for Automatic Programming: Large Language Models and Beyond

Figure 2 for Automatic Programming: Large Language Models and Beyond

Figure 3 for Automatic Programming: Large Language Models and Beyond

Figure 4 for Automatic Programming: Large Language Models and Beyond

Abstract:Automatic programming has seen increasing popularity due to the emergence of tools like GitHub Copilot which rely on Large Language Models (LLMs). At the same time, automatically generated code faces challenges during deployment due to concerns around quality and trust. In this article, we study automated coding in a general sense and study the concerns around code quality, security and related issues of programmer responsibility. These are key issues for organizations while deciding on the usage of automatically generated code. We discuss how advances in software engineering such as program repair and analysis can enable automatic programming. We conclude with a forward looking view, focusing on the programming environment of the near future, where programmers may need to switch to different roles to fully utilize the power of automatic programming. Automated repair of automatically generated programs from LLMs, can help produce higher assurance code from LLMs, along with evidence of assurance

Via

Access Paper or Ask Questions

AutoCodeRover: Autonomous Program Improvement

Apr 15, 2024

Yuntong Zhang, Haifeng Ruan, Zhiyu Fan, Abhik Roychoudhury

Abstract:Researchers have made significant progress in automating the software development process in the past decades. Recent progress in Large Language Models (LLMs) has significantly impacted the development process, where developers can use LLM-based programming assistants to achieve automated coding. Nevertheless software engineering involves the process of program improvement apart from coding, specifically to enable software maintenance (e.g. bug fixing) and software evolution (e.g. feature additions). In this paper, we propose an automated approach for solving GitHub issues to autonomously achieve program improvement. In our approach called AutoCodeRover, LLMs are combined with sophisticated code search capabilities, ultimately leading to a program modification or patch. In contrast to recent LLM agent approaches from AI researchers and practitioners, our outlook is more software engineering oriented. We work on a program representation (abstract syntax tree) as opposed to viewing a software project as a mere collection of files. Our code search exploits the program structure in the form of classes/methods to enhance LLM's understanding of the issue's root cause, and effectively retrieve a context via iterative search. The use of spectrum based fault localization using tests, further sharpens the context, as long as a test-suite is available. Experiments on SWE-bench-lite which consists of 300 real-life GitHub issues show increased efficacy in solving GitHub issues (22-23% on SWE-bench-lite). On the full SWE-bench consisting of 2294 GitHub issues, AutoCodeRover solved around 16% of issues, which is higher than the efficacy of the recently reported AI software engineer Devin from Cognition Labs, while taking time comparable to Devin. We posit that our workflow enables autonomous software engineering, where, in future, auto-generated code from LLMs can be autonomously improved.

Via

Access Paper or Ask Questions

Fairness-guided SMT-based Rectification of Decision Trees and Random Forests

Nov 22, 2020

Jiang Zhang, Ivan Beschastnikh, Sergey Mechtaev, Abhik Roychoudhury

Figure 1 for Fairness-guided SMT-based Rectification of Decision Trees and Random Forests

Figure 2 for Fairness-guided SMT-based Rectification of Decision Trees and Random Forests

Figure 3 for Fairness-guided SMT-based Rectification of Decision Trees and Random Forests

Figure 4 for Fairness-guided SMT-based Rectification of Decision Trees and Random Forests

Abstract:Data-driven decision making is gaining prominence with the popularity of various machine learning models. Unfortunately, real-life data used in machine learning training may capture human biases, and as a result the learned models may lead to unfair decision making. In this paper, we provide a solution to this problem for decision trees and random forests. Our approach converts any decision tree or random forest into a fair one with respect to a specific data set, fairness criteria, and sensitive attributes. The \emph{FairRepair} tool, built based on our approach, is inspired by automated program repair techniques for traditional programs. It uses an SMT solver to decide which paths in the decision tree could have their outcomes flipped to improve the fairness of the model. Our experiments on the well-known adult dataset from UC Irvine demonstrate that FairRepair scales to realistic decision trees and random forests. Furthermore, FairRepair provides formal guarantees about soundness and completeness of finding a repair. Since our fairness-guided repair technique repairs decision trees and random forests obtained from a given (unfair) data-set, it can help to identify and rectify biases in decision-making in an organisation.

Via

Access Paper or Ask Questions

Synthesizing Tasks for Block-based Programming

Jul 01, 2020

Umair Z. Ahmed, Maria Christakis, Aleksandr Efremov, Nigel Fernandez, Ahana Ghosh, Abhik Roychoudhury, Adish Singla

Figure 1 for Synthesizing Tasks for Block-based Programming

Figure 2 for Synthesizing Tasks for Block-based Programming

Figure 3 for Synthesizing Tasks for Block-based Programming

Figure 4 for Synthesizing Tasks for Block-based Programming

Abstract:Block-based visual programming environments play a critical role in introducing computing concepts to K-12 students. One of the key pedagogical challenges in these environments is in designing new practice tasks for a student that match a desired level of difficulty and exercise specific programming concepts. In this paper, we formalize the problem of synthesizing visual programming tasks. In particular, given a reference visual task $\rm T^{in}$ and its solution code $\rm C^{in}$, we propose a novel methodology to automatically generate a set $\{(\rm T^{out}, \rm C^{out})\}$ of new tasks along with solution codes such that tasks $\rm T^{in}$ and $\rm T^{out}$ are conceptually similar but visually dissimilar. Our methodology is based on the realization that the mapping from the space of visual tasks to their solution codes is highly discontinuous; hence, directly mutating reference task $\rm T^{in}$ to generate new tasks is futile. Our task synthesis algorithm operates by first mutating code $\rm C^{in}$ to obtain a set of codes $\{\rm C^{out}\}$. Then, the algorithm performs symbolic execution over a code $\rm C^{out}$ to obtain a visual task $\rm T^{out}$; this step uses the Monte Carlo Tree Search (MCTS) procedure to guide the search in the symbolic tree. We demonstrate the effectiveness of our algorithm through an extensive empirical evaluation and user study on reference tasks taken from the \emph{Hour of the Code: Classic Maze} challenge by \emph{Code.org} and the \emph{Intro to Programming with Karel} course by \emph{CodeHS.com}.

* longer version

Via

Access Paper or Ask Questions