Abstract: Documentation debt hinders the effective utilization of open-source software. Although code summarization tools have been helpful for developers, most developers would prefer a detailed account of each parameter in a function rather than a high-level summary. However, such a detailed docstring is too intricate for a single generative model to produce reliably, owing to the lack of high-quality training data. Thus, we propose a multi-step approach that combines multiple task-specific models, each adept at producing a specific section of a docstring. Combining these models ensures that each section is included in the final docstring. We compared the results of our approach with those of existing generative models using both automatic metrics and a human-centred evaluation with 17 participating developers; the results show that our approach outperforms existing methods.
Abstract: Mining Software Repositories (MSR) is an evidence-based methodology that cross-links data to uncover actionable information about software systems. Empirical studies in software engineering often leverage MSR techniques, as they allow researchers to unveil issues and flaws in software development and to analyse the different factors contributing to them. Hence, access to fine-grained information about the repositories and sources being mined (e.g., server names and contributors' identities) is essential for the reproducibility and transparency of MSR studies. However, this can also introduce threats to participants' privacy, as their identities may be linked to flawed or sub-optimal programming practices (e.g., code smells, improper documentation), or vice versa. Moreover, this can extend to close collaborators and community members, resulting in "guilt by association". This position paper aims to start a discussion about indirect participation in MSR investigations, the dichotomy of 'privacy vs. utility' in sharing non-aggregated data, and its effects on privacy restrictions and ethical considerations for participant involvement.