Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sebastian Baltes

School of Computer Science, University of Adelaide, Australia

Automated Query Reformulation for Efficient Search based on Query Logs From Stack Overflow

Feb 10, 2021

Kaibo Cao, Chunyang Chen, Sebastian Baltes, Christoph Treude, Xiang Chen

Figure 1 for Automated Query Reformulation for Efficient Search based on Query Logs From Stack Overflow

Figure 2 for Automated Query Reformulation for Efficient Search based on Query Logs From Stack Overflow

Figure 3 for Automated Query Reformulation for Efficient Search based on Query Logs From Stack Overflow

Figure 4 for Automated Query Reformulation for Efficient Search based on Query Logs From Stack Overflow

Abstract:As a popular Q&A site for programming, Stack Overflow is a treasure for developers. However, the amount of questions and answers on Stack Overflow make it difficult for developers to efficiently locate the information they are looking for. There are two gaps leading to poor search results: the gap between the user's intention and the textual query, and the semantic gap between the query and the post content. Therefore, developers have to constantly reformulate their queries by correcting misspelled words, adding limitations to certain programming languages or platforms, etc. As query reformulation is tedious for developers, especially for novices, we propose an automated software-specific query reformulation approach based on deep learning. With query logs provided by Stack Overflow, we construct a large-scale query reformulation corpus, including the original queries and corresponding reformulated ones. Our approach trains a Transformer model that can automatically generate candidate reformulated queries when given the user's original query. The evaluation results show that our approach outperforms five state-of-the-art baselines, and achieves a 5.6% to 33.5% boost in terms of $\mathit{ExactMatch}$ and a 4.8% to 14.4% boost in terms of $\mathit{GLEU}$.

* 13 pages, 6 figures, accepted in ICSE'21: 43rd IEEE/ACM International Conference on Software Engineering

Via

Access Paper or Ask Questions

An Annotated Dataset of Stack Overflow Post Edits

May 06, 2020

Sebastian Baltes, Markus Wagner

Figure 1 for An Annotated Dataset of Stack Overflow Post Edits

Figure 2 for An Annotated Dataset of Stack Overflow Post Edits

Figure 3 for An Annotated Dataset of Stack Overflow Post Edits

Figure 4 for An Annotated Dataset of Stack Overflow Post Edits

Abstract:To improve software engineering, software repositories have been mined for code snippets and bug fixes. Typically, this mining takes place at the level of files or commits. To be able to dig deeper and to extract insights at a higher resolution, we hereby present an annotated dataset that contains over 7 million edits of code and text on Stack Overflow. Our preliminary study indicates that these edits might be a treasure trove for mining information about fine-grained patches, e.g., for the optimisation of non-functional properties.

Via

Access Paper or Ask Questions