Timothy J. Sheldon

An Ensemble Approach for Annotating Source Code Identifiers with Part-of-speech Tags

Sep 01, 2021

Christian D. Newman, Michael J. Decker, Reem S. AlSuhaibani, Anthony Peruma, Satyajit Mohapatra, Tejal Vishnoi, Marcos Zampieri, Mohamed W. Mkaouer, Timothy J. Sheldon, Emily Hill

Abstract: This paper presents an ensemble part-of-speech tagging approach for source code identifiers. Ensemble tagging is a technique that uses machine learning and the output of multiple part-of-speech taggers to annotate natural language text at a higher quality than the individual taggers can achieve independently. Our ensemble uses three state-of-the-art part-of-speech taggers: SWUM, POSSE, and Stanford. We study the quality of the ensemble's annotations on five different types of identifier names (function, class, attribute, parameter, and declaration statement) at the level of both individual words and full identifier names. We also study and discuss the weaknesses of our tagger to encourage further research that addresses them. Our results show that the ensemble achieves 75% accuracy at the identifier level and 84-86% accuracy at the word level. This is an increase of 17 percentage points at the identifier level over the closest independent part-of-speech tagger.

* in IEEE Transactions on Software Engineering, vol. , no. 01, pp. 1-1, 5555
* 18 pages. arXiv admin note: text overlap with arXiv:2007.08033
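
The abstract reports accuracy at two levels: per word and per full identifier. The sketch below illustrates the difference under loud assumptions. It combines three taggers' outputs by plain plurality voting, which is only a stand-in for the machine-learning ensemble the paper describes. The function names, the tag labels (`V`, `N`, `NM`), and the sample tagger outputs are all invented for illustration and are not taken from the paper, SWUM, POSSE, or Stanford.

```python
from collections import Counter

def majority_vote(tags_per_tagger, fallback_index=0):
    """Combine per-word tags from several taggers by plurality vote.

    tags_per_tagger holds one tag sequence per tagger, all the same length.
    On a tie, the tag from the tagger at fallback_index is used
    (an arbitrary, hypothetical tie-breaking rule).
    """
    combined = []
    for word_tags in zip(*tags_per_tagger):
        counts = Counter(word_tags).most_common()
        top_tag, top_count = counts[0]
        tied = [tag for tag, count in counts if count == top_count]
        combined.append(top_tag if len(tied) == 1 else word_tags[fallback_index])
    return combined

def word_accuracy(predicted, gold):
    """Fraction of individual words whose tag matches the gold tag."""
    pairs = [(p, g) for ps, gs in zip(predicted, gold) for p, g in zip(ps, gs)]
    return sum(p == g for p, g in pairs) / len(pairs)

def identifier_accuracy(predicted, gold):
    """Fraction of identifiers whose entire tag sequence matches the gold one."""
    return sum(ps == gs for ps, gs in zip(predicted, gold)) / len(gold)

# Invented gold annotations for two identifiers: get_user_name and max_size.
gold = [["V", "NM", "N"], ["NM", "N"]]

# Invented outputs of three hypothetical taggers for the same identifiers.
tagger_outputs = [
    [["V", "N", "N"], ["V", "NM", "N"], ["N", "NM", "N"]],  # get_user_name
    [["N", "N"], ["NM", "N"], ["V", "N"]],                  # max_size
]

predicted = [majority_vote(outputs) for outputs in tagger_outputs]
print(word_accuracy(predicted, gold))        # 4 of 5 words correct
print(identifier_accuracy(predicted, gold))  # 1 of 2 identifiers correct
```

In this toy data a single wrong word makes the whole `max_size` annotation count as wrong, which is why identifier-level accuracy is the stricter of the two measures.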
