Prabhat Singh

Cisco

Code-Craft: Hierarchical Graph-Based Code Summarization for Enhanced Context Retrieval

Apr 11, 2025

David Sounthiraraj, Jared Hancock, Yassin Kortam, Ashok Javvaji, Prabhat Singh, Shaila Shankar

Abstract: Understanding and navigating large-scale codebases remains a significant challenge in software engineering. Existing methods often treat code as flat text or focus primarily on local structural relationships, limiting their ability to provide holistic, context-aware information retrieval. We present Hierarchical Code Graph Summarization (HCGS), a novel approach that constructs a multi-layered representation of a codebase by generating structured summaries in a bottom-up fashion from a code graph. HCGS leverages the Language Server Protocol for language-agnostic code analysis and employs a parallel level-based algorithm for efficient summary generation. Through extensive evaluation on five diverse codebases totaling 7,531 functions, HCGS demonstrates significant improvements in code retrieval accuracy, achieving up to an 82% relative improvement in top-1 retrieval precision for large codebases like libsignal (27.15 percentage points), and perfect Pass@3 scores for smaller repositories. The system's hierarchical approach consistently outperforms traditional code-only retrieval across all metrics, with particularly substantial gains in larger, more complex codebases where understanding function relationships is crucial.
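The abstract describes a bottom-up, level-based pass over a code graph in which nodes on the same level can be summarized in parallel. The sketch below is one way such a pass might look, not the authors' implementation. It assumes the graph is an acyclic call graph given as a dict of callees, and `compute_levels`, `summarize`, and `summarize_bottom_up` are hypothetical names; `summarize` is a stand-in for whatever model call would produce a real summary.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_levels(callees):
    """Assign level 0 to leaves and 1 + max(callee levels) to every other node.

    Assumes the graph is acyclic; a cycle would recurse without end.
    """
    levels = {}

    def level_of(node):
        if node not in levels:
            children = callees.get(node, [])
            levels[node] = 0 if not children else 1 + max(level_of(c) for c in children)
        return levels[node]

    for node in callees:
        level_of(node)
    return levels


def summarize(node, child_summaries):
    """Hypothetical placeholder for a summary generator (e.g. an LLM call)."""
    if not child_summaries:
        return f"{node}: leaf function"
    return f"{node}: builds on {', '.join(sorted(child_summaries))}"


def summarize_bottom_up(callees, max_workers=4):
    """Summarize level by level so callee summaries exist before their callers run."""
    levels = compute_levels(callees)
    by_level = {}
    for node, lvl in levels.items():
        by_level.setdefault(lvl, []).append(node)

    summaries = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for lvl in sorted(by_level):
            nodes = by_level[lvl]
            # Nodes within one level do not depend on each other, so they can run in parallel.
            results = list(pool.map(
                lambda n: summarize(n, {c: summaries[c] for c in callees.get(n, [])}),
                nodes,
            ))
            summaries.update(zip(nodes, results))
    return summaries


graph = {"main": ["parse", "run"], "run": ["parse"], "parse": []}
levels = compute_levels(graph)
summaries = summarize_bottom_up(graph)
```

Grouping by level means each caller sees finished summaries for all of its callees, which is the property a bottom-up hierarchy needs.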

LLM Agents Improve Semantic Code Search

Aug 05, 2024

Sarthak Jain, Aditya Dora, Ka Seng Sam, Prabhat Singh

Abstract: Code search is a key task that programmers often perform while developing solutions to problems. Current methodologies struggle to perform accurately on prompts that are ambiguous or that require additional context about a codebase. We introduce the approach of using Retrieval Augmented Generation (RAG) powered agents to inject information into user prompts, allowing for better inputs into embedding models. By utilizing RAG, agents enhance user queries with relevant details from GitHub repositories, making them more informative and contextually aligned. Additionally, we introduce a multi-stream ensemble approach which, when paired with an agentic workflow, can obtain improved retrieval accuracy, and which we deploy in an application called repo-rift.com. Experimental results on the CodeSearchNet dataset demonstrate that RepoRift significantly outperforms existing methods, achieving a 78.2% success rate at Success@10 and a 34.6% success rate at Success@1. This research presents a substantial advancement in semantic code search, highlighting the potential of agentic LLMs and RAG to enhance code retrieval systems.

* 12 pages, 1 Figure
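The abstract reports results as Success@10 and Success@1 without defining the metric. A minimal sketch follows, assuming the common definition: the fraction of queries whose correct item appears among the top k ranked results. The function name `success_at_k` and the toy data are illustrative assumptions, not taken from the paper.

```python
def success_at_k(ranked_results, relevant, k):
    """Fraction of queries whose relevant item appears in the top-k results.

    ranked_results: dict mapping query -> list of candidate ids, best first.
    relevant: dict mapping query -> the single correct candidate id.
    """
    if not ranked_results:
        raise ValueError("need at least one query")
    hits = sum(
        1 for query, ranking in ranked_results.items()
        if relevant[query] in ranking[:k]
    )
    return hits / len(ranked_results)


# Toy data (illustrative only): four queries with three ranked candidates each.
rankings = {
    "q1": ["a", "b", "c"],
    "q2": ["x", "y", "z"],
    "q3": ["m", "n", "o"],
    "q4": ["p", "q", "r"],
}
truth = {"q1": "a", "q2": "y", "q3": "o", "q4": "missing"}

s1 = success_at_k(rankings, truth, 1)  # only q1 is a hit
s3 = success_at_k(rankings, truth, 3)  # q1, q2, and q3 are hits
```

Under this definition Success@k can only stay the same or increase as k grows, which is consistent with the Success@10 figure being higher than the Success@1 figure.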

Transforming Software Development: Evaluating the Efficiency and Challenges of GitHub Copilot in Real-World Projects

Jun 25, 2024

Ruchika Pandey, Prabhat Singh, Raymond Wei, Shaila Shankar

Abstract: Generative AI technologies promise to transform the product development lifecycle. This study evaluates the efficiency gains, areas for improvement, and emerging challenges of using GitHub Copilot, an AI-powered coding assistant. We identified 15 software development tasks and assessed Copilot's benefits through real-world projects on large proprietary code bases. Our findings indicate significant reductions in developer toil, with up to 50% time saved in code documentation and autocompletion, and 30-40% in repetitive coding tasks, unit test generation, debugging, and pair programming. However, Copilot struggles with complex tasks, large functions, multiple files, and proprietary contexts, particularly with C/C++ code. We project a 33-36% time reduction for coding-related tasks in a cloud-first software development lifecycle. This study aims to quantify productivity improvements, identify underperforming scenarios, examine practical benefits and challenges, investigate performance variations across programming languages, and discuss emerging issues related to code quality, security, and developer experience.

* 13 pages, 8 figures
