Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xinchen Wang

An LLM-based Agent for Reliable Docker Environment Configuration

Feb 19, 2025

Ruida Hu, Chao Peng, Xinchen Wang, Cuiyun Gao

Abstract:Environment configuration is a critical yet time-consuming step in software development, especially when dealing with unfamiliar code repositories. While Large Language Models (LLMs) demonstrate the potential to accomplish software engineering tasks, existing methods for environment configuration often rely on manual efforts or fragile scripts, leading to inefficiencies and unreliable outcomes. We introduce Repo2Run, the first LLM-based agent designed to fully automate environment configuration and generate executable Dockerfiles for arbitrary Python repositories. We address two major challenges: (1) enabling the LLM agent to configure environments within isolated Docker containers, and (2) ensuring the successful configuration process is recorded and accurately transferred to a Dockerfile without error. To achieve this, we propose atomic configuration synthesis, featuring a dual-environment architecture (internal and external environment) with a rollback mechanism to prevent environment "pollution" from failed commands, guaranteeing atomic execution (execute fully or not at all) and a Dockerfile generator to transfer successful configuration steps into runnable Dockerfiles. We evaluate Repo2Run~on our proposed benchmark of 420 recent Python repositories with unit tests, where it achieves an 86.0% success rate, outperforming the best baseline by 63.9%.

Via

Access Paper or Ask Questions

CodeRepoQA: A Large-scale Benchmark for Software Engineering Question Answering

Dec 19, 2024

Ruida Hu, Chao Peng, Jingyi Ren, Bo Jiang, Xiangxin Meng, Qinyun Wu, Pengfei Gao, Xinchen Wang, Cuiyun Gao

Abstract:In this work, we introduce CodeRepoQA, a large-scale benchmark specifically designed for evaluating repository-level question-answering capabilities in the field of software engineering. CodeRepoQA encompasses five programming languages and covers a wide range of scenarios, enabling comprehensive evaluation of language models. To construct this dataset, we crawl data from 30 well-known repositories in GitHub, the largest platform for hosting and collaborating on code, and carefully filter raw data. In total, CodeRepoQA is a multi-turn question-answering benchmark with 585,687 entries, covering a diverse array of software engineering scenarios, with an average of 6.62 dialogue turns per entry. We evaluate ten popular large language models on our dataset and provide in-depth analysis. We find that LLMs still have limitations in question-answering capabilities in the field of software engineering, and medium-length contexts are more conducive to LLMs' performance. The entire benchmark is publicly available at https://github.com/kinesiatricssxilm14/CodeRepoQA.

Via

Access Paper or Ask Questions

AEGIS: An Agent-based Framework for General Bug Reproduction from Issue Descriptions

Nov 27, 2024

Xinchen Wang, Pengfei Gao, Xiangxin Meng, Chao Peng, Ruida Hu, Yun Lin, Cuiyun Gao

Abstract:In software maintenance, bug reproduction is essential for effective fault localization and repair. Manually writing reproduction scripts is a time-consuming task with high requirements for developers. Hence, automation of bug reproduction has increasingly attracted attention from researchers and practitioners. However, the existing studies on bug reproduction are generally limited to specific bug types such as program crashes, and hard to be applied to general bug reproduction. In this paper, considering the superior performance of agent-based methods in code intelligence tasks, we focus on designing an agent-based framework for the task. Directly employing agents would lead to limited bug reproduction performance, due to entangled subtasks, lengthy retrieved context, and unregulated actions. To mitigate the challenges, we propose an Automated gEneral buG reproductIon Scripts generation framework, named AEGIS, which is the first agent-based framework for the task. AEGIS mainly contains two modules: (1) A concise context construction module, which aims to guide the code agent in extracting structured information from issue descriptions, identifying issue-related code with detailed explanations, and integrating these elements to construct the concise context; (2) A FSM-based multi-feedback optimization module to further regulate the behavior of the code agent within the finite state machine (FSM), ensuring a controlled and efficient script generation process based on multi-dimensional feedback. Extensive experiments on the public benchmark dataset show that AEGIS outperforms the state-of-the-art baseline by 23.0% in F->P metric. In addition, the bug reproduction scripts generated by AEGIS can improve the relative resolved rate of Agentless by 12.5%.

Via

Access Paper or Ask Questions

Pyramid-Driven Alignment: Pyramid Principle Guided Integration of Large Language Models and Knowledge Graphs

Oct 17, 2024

Lei Sun, Xinchen Wang, Youdi Li

Abstract:Large Language Models (LLMs) possess impressive reasoning abilities but are prone to generating incorrect information, often referred to as hallucinations. While incorporating external Knowledge Graphs (KGs) can partially mitigate this issue, existing methods primarily treat KGs as static knowledge repositories, overlooking the critical disparity between KG and LLM knowledge, and failing to fully exploit the reasoning capabilities inherent in KGs. To address these limitations, we propose Pyramid-Driven Alignment (PDA), a novel framework for seamlessly integrating LLMs with KGs. PDA utilizes Pyramid Principle analysis to construct a hierarchical pyramid structure. This structure is designed to reflect the input question and generate more validated deductive knowledge, thereby enhancing the alignment of LLMs and KGs and ensuring more cohesive integration. Furthermore, PDA employs a recursive mechanism to harness the underlying reasoning abilities of KGs, resulting in more accurate knowledge retrieval for question-answering tasks. Our experimental results reveal a substantial performance advantage of PDA over state-of-the-art baselines, with improvements reaching 26.70% and 26.78%.

Via

Access Paper or Ask Questions

MarsCode Agent: AI-native Automated Bug Fixing

Sep 04, 2024

Yizhou Liu, Pengfei Gao, Xinchen Wang, Jie Liu, Yexuan Shi, Zhao Zhang, Chao Peng

Figure 1 for MarsCode Agent: AI-native Automated Bug Fixing

Figure 2 for MarsCode Agent: AI-native Automated Bug Fixing

Figure 3 for MarsCode Agent: AI-native Automated Bug Fixing

Figure 4 for MarsCode Agent: AI-native Automated Bug Fixing

Abstract:Recent advances in large language models (LLMs) have shown significant potential to automate various software development tasks, including code completion, test generation, and bug fixing. However, the application of LLMs for automated bug fixing remains challenging due to the complexity and diversity of real-world software systems. In this paper, we introduce MarsCode Agent, a novel framework that leverages LLMs to automatically identify and repair bugs in software code. MarsCode Agent combines the power of LLMs with advanced code analysis techniques to accurately localize faults and generate patches. Our approach follows a systematic process of planning, bug reproduction, fault localization, candidate patch generation, and validation to ensure high-quality bug fixes. We evaluated MarsCode Agent on SWE-bench, a comprehensive benchmark of real-world software projects, and our results show that MarsCode Agent achieves a high success rate in bug fixing compared to most of the existing automated approaches.

* Yizhou Liu and Pengfei Gao contributed equally and the order is determined by rolling the dice. Chao Peng is the corresponding author

Via

Access Paper or Ask Questions

When Less is Enough: Positive and Unlabeled Learning Model for Vulnerability Detection

Aug 21, 2023

Xin-Cheng Wen, Xinchen Wang, Cuiyun Gao, Shaohua Wang, Yang Liu, Zhaoquan Gu

Figure 1 for When Less is Enough: Positive and Unlabeled Learning Model for Vulnerability Detection

Figure 2 for When Less is Enough: Positive and Unlabeled Learning Model for Vulnerability Detection

Figure 3 for When Less is Enough: Positive and Unlabeled Learning Model for Vulnerability Detection

Figure 4 for When Less is Enough: Positive and Unlabeled Learning Model for Vulnerability Detection

Abstract:Automated code vulnerability detection has gained increasing attention in recent years. The deep learning (DL)-based methods, which implicitly learn vulnerable code patterns, have proven effective in vulnerability detection. The performance of DL-based methods usually relies on the quantity and quality of labeled data. However, the current labeled data are generally automatically collected, such as crawled from human-generated commits, making it hard to ensure the quality of the labels. Prior studies have demonstrated that the non-vulnerable code (i.e., negative labels) tends to be unreliable in commonly-used datasets, while vulnerable code (i.e., positive labels) is more determined. Considering the large numbers of unlabeled data in practice, it is necessary and worth exploring to leverage the positive data and large numbers of unlabeled data for more accurate vulnerability detection. In this paper, we focus on the Positive and Unlabeled (PU) learning problem for vulnerability detection and propose a novel model named PILOT, i.e., PositIve and unlabeled Learning mOdel for vulnerability deTection. PILOT only learns from positive and unlabeled data for vulnerability detection. It mainly contains two modules: (1) A distance-aware label selection module, aiming at generating pseudo-labels for selected unlabeled data, which involves the inter-class distance prototype and progressive fine-tuning; (2) A mixed-supervision representation learning module to further alleviate the influence of noise and enhance the discrimination of representations.

* This paper is accepted by ASE 2023

Via

Access Paper or Ask Questions

TiEV: The Tongji Intelligent Electric Vehicle in the Intelligent Vehicle Future Challenge of China

May 07, 2018

Junqiao Zhao, Chen Ye, Yan Wu, Linting Guan, Lewen Cai, Lu Sun, Tao Yang, Xudong He, Jun Li, Yongchao Ding(+9 more)

Figure 1 for TiEV: The Tongji Intelligent Electric Vehicle in the Intelligent Vehicle Future Challenge of China

Figure 2 for TiEV: The Tongji Intelligent Electric Vehicle in the Intelligent Vehicle Future Challenge of China

Figure 3 for TiEV: The Tongji Intelligent Electric Vehicle in the Intelligent Vehicle Future Challenge of China

Figure 4 for TiEV: The Tongji Intelligent Electric Vehicle in the Intelligent Vehicle Future Challenge of China

Abstract:TiEV is an autonomous driving platform implemented by Tongji University of China. The vehicle is drive-by-wire and is fully powered by electricity. We devised the software system of TiEV from scratch, which is capable of driving the vehicle autonomously in urban paths as well as on fast express roads. We describe our whole system, especially novel modules of probabilistic perception fusion, incremental mapping, the 1st and the 2nd planning and the overall safety concern. TiEV finished 2016 and 2017 Intelligent Vehicle Future Challenge of China held at Changshu. We show our experiences on the development of autonomous vehicles and future trends.

Via

Access Paper or Ask Questions