Yalan Lin

LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues

Nov 21, 2024

Yalan Lin, Yingwei Ma, Rongyu Cao, Binhua Li, Fei Huang, Xiaodong Gu, Yongbin Li

Abstract: Reproducing buggy code is the first and a crucial step in issue resolving, as it aids in identifying the underlying problems and validating that generated patches resolve them. While numerous approaches have been proposed for this task, they primarily address common, widespread errors and struggle to adapt to unique, evolving errors specific to individual code repositories. To fill this gap, we propose EvoCoder, a multi-agent continuous learning framework for issue code reproduction. EvoCoder adopts a reflection mechanism that allows the LLM to continuously learn from previously resolved problems and dynamically refine its strategies for newly emerging challenges. To prevent experience bloating, EvoCoder introduces a novel hierarchical experience pool that enables the model to adaptively update common and repo-specific experiences. Our experimental results show a 20% improvement in issue reproduction rates over existing SOTA methods. Furthermore, integrating our reproduction mechanism significantly boosts the overall accuracy of the existing issue-resolving pipeline.
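
To make the idea of a two-level experience pool concrete, here is a minimal sketch. It is not EvoCoder's implementation: the class name `ExperiencePool`, the `max_per_level` cap, and the oldest-first eviction rule are all assumptions chosen for illustration. It only shows how common and repo-specific lessons could be stored separately, bounded in size, and retrieved together.

```python
from collections import defaultdict, deque


class ExperiencePool:
    """Hypothetical two-level pool: shared lessons plus per-repository lessons."""

    def __init__(self, max_per_level=3):
        # Bounded deques drop the oldest lesson when full (an assumed
        # stand-in for whatever pruning the paper actually uses).
        self.common = deque(maxlen=max_per_level)
        self.by_repo = defaultdict(lambda: deque(maxlen=max_per_level))

    def add(self, lesson, repo=None):
        """Store a lesson at the common level, or under a repo if one is given."""
        target = self.common if repo is None else self.by_repo[repo]
        if lesson not in target:  # skip exact duplicates
            target.append(lesson)

    def retrieve(self, repo):
        """Return common lessons followed by lessons specific to `repo`."""
        return list(self.common) + list(self.by_repo.get(repo, ()))


pool = ExperiencePool(max_per_level=2)
pool.add("check imports before running the script")
pool.add("set up the settings module first", repo="django")
pool.add("use the in-memory database", repo="django")
pool.add("call setup() before importing models", repo="django")

django_lessons = pool.retrieve("django")
flask_lessons = pool.retrieve("flask")
```

With a cap of two lessons per level, the oldest repo-specific lesson is evicted, so `django_lessons` holds the one common lesson plus the two most recent repo-specific ones, while an unseen repository such as `flask` only receives the common lesson.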

CodeCipher: Learning to Obfuscate Source Code Against LLMs

Oct 08, 2024

Yalan Lin, Chengcheng Wan, Yixiong Fang, Xiaodong Gu

Abstract: While large code language models have made significant strides in AI-assisted coding tasks, there are growing concerns about privacy. User code is transparent to the cloud LLM service provider, which creates risks of unauthorized training, reading, and execution of that code. In this paper, we propose CodeCipher, a novel method that obscures private information in code while preserving the original response from LLMs. CodeCipher transforms the LLM's embedding matrix so that each row corresponds to a different word in the original matrix, forming a token-to-token confusion mapping for obfuscating source code. The new embedding matrix is optimized by minimizing the task-specific loss function. To tackle the challenge of the discrete and sparse nature of word vector spaces, CodeCipher adopts a discrete optimization strategy that aligns the updated vector to the nearest valid token in the vocabulary before each gradient update. We demonstrate the effectiveness of our approach on three AI-assisted coding tasks: code completion, summarization, and translation. Results show that our model successfully obscures the private information in source code while preserving the original LLM's performance.
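
The alignment step described in the abstract (snapping an updated vector to the nearest valid token) can be illustrated with a small sketch. This is not CodeCipher's code: the toy vocabulary, the 2-D embeddings, the "perturbed" vectors, and the helper names `nearest_token`, `build_confusion_mapping`, and `obfuscate` are all hypothetical, and Euclidean distance is an assumed choice of metric. The sketch covers only the projection and the resulting token-to-token mapping, not the gradient-based optimization of the task loss.

```python
import math


def nearest_token(vector, embedding):
    """Return the id of the embedding row closest to `vector` (Euclidean)."""
    best_id, best_dist = None, math.inf
    for token_id, row in enumerate(embedding):
        dist = math.dist(vector, row)
        if dist < best_dist:
            best_id, best_dist = token_id, dist
    return best_id


def build_confusion_mapping(perturbed, embedding, vocab):
    """Map each original token to the valid token nearest its perturbed vector."""
    return {
        vocab[i]: vocab[nearest_token(vec, embedding)]
        for i, vec in enumerate(perturbed)
    }


def obfuscate(tokens, mapping):
    """Rewrite a token sequence through the mapping; unknown tokens pass through."""
    return [mapping.get(tok, tok) for tok in tokens]


# Toy data, invented for illustration only.
vocab = ["def", "add", "sub", "x"]
embedding = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Stand-in for the rows after a hypothetical optimization step.
perturbed = [[0.1, 0.0], [0.2, 0.9], [0.9, 0.1], [0.9, 1.2]]

mapping = build_confusion_mapping(perturbed, embedding, vocab)
obfuscated = obfuscate(["def", "add", "x"], mapping)
```

In this toy case the vectors for `add` and `sub` have drifted toward each other's rows, so the mapping swaps them and leaves `def` and `x` unchanged. A plain nearest-neighbor projection like this does not guarantee a one-to-one mapping; a real system would need an extra constraint if invertibility matters.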
