Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Joseph Vargovich

Tag that issue: Applying API-domain labels in issue tracking systems

Apr 06, 2023

Fabio Santos, Joseph Vargovich, Bianca Trinkenreich, Italo Santos, Jacob Penney, Ricardo Britto, João Felipe Pimentel, Igor Wiese, Igor Steinmacher, Anita Sarma(+1 more)

Abstract:Labeling issues with the skills required to complete them can help contributors to choose tasks in Open Source Software projects. However, manually labeling issues is time-consuming and error-prone, and current automated approaches are mostly limited to classifying issues as bugs/non-bugs. We investigate the feasibility and relevance of automatically labeling issues with what we call "API-domains," which are high-level categories of APIs. Therefore, we posit that the APIs used in the source code affected by an issue can be a proxy for the type of skills (e.g., DB, security, UI) needed to work on the issue. We ran a user study (n=74) to assess API-domain labels' relevancy to potential contributors, leveraged the issues' descriptions and the project history to build prediction models, and validated the predictions with contributors (n=20) of the projects. Our results show that (i) newcomers to the project consider API-domain labels useful in choosing tasks, (ii) labels can be predicted with a precision of 84% and a recall of 78.6% on average, (iii) the results of the predictions reached up to 71.3% in precision and 52.5% in recall when training with a project and testing in another (transfer learning), and (iv) project contributors consider most of the predictions helpful in identifying needed skills. These findings suggest our approach can be applied in practice to automatically label issues, assisting developers in finding tasks that better match their skills.

* arXiv admin note: text overlap with arXiv:2103.12653

Via

Access Paper or Ask Questions

GiveMeLabeledIssues: An Open Source Issue Recommendation System

Mar 23, 2023

Joseph Vargovich, Fabio Santos, Jacob Penney, Marco A. Gerosa, Igor Steinmacher

Abstract:Developers often struggle to navigate an Open Source Software (OSS) project's issue-tracking system and find a suitable task. Proper issue labeling can aid task selection, but current tools are limited to classifying the issues according to their type (e.g., bug, question, good first issue, feature, etc.). In contrast, this paper presents a tool (GiveMeLabeledIssues) that mines project repositories and labels issues based on the skills required to solve them. We leverage the domain of the APIs involved in the solution (e.g., User Interface (UI), Test, Databases (DB), etc.) as a proxy for the required skills. GiveMeLabeledIssues facilitates matching developers' skills to tasks, reducing the burden on project maintainers. The tool obtained a precision of 83.9% when predicting the API domains involved in the issues. The replication package contains instructions on executing the tool and including new projects. A demo video is available at https://www.youtube.com/watch?v=ic2quUue7i8

* MSR Data and Tool Showcase 2023

Via

Access Paper or Ask Questions

Linear time dynamic programming for the exact path of optimal models selected from a finite set

Mar 05, 2020

Toby Hocking, Joseph Vargovich

Figure 1 for Linear time dynamic programming for the exact path of optimal models selected from a finite set

Figure 2 for Linear time dynamic programming for the exact path of optimal models selected from a finite set

Figure 3 for Linear time dynamic programming for the exact path of optimal models selected from a finite set

Figure 4 for Linear time dynamic programming for the exact path of optimal models selected from a finite set

Abstract:Many learning algorithms are formulated in terms of finding model parameters which minimize a data-fitting loss function plus a regularizer. When the regularizer involves the l0 pseudo-norm, the resulting regularization path consists of a finite set of models. The fastest existing algorithm for computing the breakpoints in the regularization path is quadratic in the number of models, so it scales poorly to high dimensional problems. We provide new formal proofs that a dynamic programming algorithm can be used to compute the breakpoints in linear time. Empirical results on changepoint detection problems demonstrate the improved accuracy and speed relative to grid search and the previous quadratic time algorithm.

* 14 pages

Via

Access Paper or Ask Questions