Kenichi Matsumoto

Uncovering Intention through LLM-Driven Code Snippet Description Generation

Jun 18, 2025

Yusuf Sulistyo Nugroho, Farah Danisha Salam, Brittany Reid, Raula Gaikovina Kula, Kazumasa Shimari, Kenichi Matsumoto

Abstract: Documenting code snippets is essential to pinpoint key areas where both developers and users should pay attention. Examples include usage examples and other Application Programming Interfaces (APIs), which are especially important for third-party libraries. With the rise of Large Language Models (LLMs), the key goal is to investigate the kinds of description developers commonly use and evaluate how well an LLM, in this case Llama, can support description generation. We use NPM Code Snippets, consisting of 185,412 packages with 1,024,579 code snippets. From there, we use 400 code snippets (and their descriptions) as samples. First, our manual classification found that the majority of original descriptions (55.5%) highlight example-based usage. This finding emphasizes the importance of clear documentation, as some descriptions lacked sufficient detail to convey intent. Second, the LLM classified the majority of original descriptions as "Example" (79.75%), the same majority category as in our manual classification, showing a propensity for generalization. Third, compared to the originals, the generated descriptions had an average similarity score of 0.7173, suggesting relevance but room for improvement. Scores below 0.9 indicate some irrelevance. Our results show that, depending on the task of the code snippet, the intention of the document may vary among usage instructions, installation instructions, and descriptive learning examples for any user of a library.

* 6 pages, 3 figures, 4 tables, conference paper
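
The abstract reports an average similarity score between original and generated descriptions but does not say which measure was used. As a purely illustrative sketch, assuming cosine similarity over lowercase token counts (the function name `cosine_similarity` and the tokenization rule are hypothetical choices, not the paper's method), such a score could be computed like this:

```python
import math
import re
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of two texts using bag-of-words token counts.

    Assumption: tokens are lowercase alphanumeric runs. The paper may use
    a different representation (for example, sentence embeddings).
    """
    counts_a = Counter(re.findall(r"[a-z0-9]+", text_a.lower()))
    counts_b = Counter(re.findall(r"[a-z0-9]+", text_b.lower()))

    # Dot product over the tokens of the first text (missing tokens count as 0).
    dot = sum(count * counts_b[token] for token, count in counts_a.items())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))

    # An empty text has no direction, so define its similarity as 0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


original = "Install the package and call render to draw the chart"
generated = "Call render after you install the package"
score = cosine_similarity(original, generated)
```

A score near 1 means the two descriptions share most of their vocabulary; under this measure, a threshold such as the 0.9 mentioned in the abstract would flag pairs whose wording diverges noticeably.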

A Simulation Study of Bandit Algorithms to Address External Validity of Software Fault Prediction

Mar 17, 2020

Teruki Hayakawa, Masateru Tsunoda, Koji Toda, Keitaro Nakasai, Kenichi Matsumoto

Abstract: Various software fault prediction models and techniques for building algorithms have been proposed. Many studies have compared and evaluated them to identify the most effective ones. However, in most cases, such models and techniques do not have the best performance on every dataset. This is because software development datasets are diverse, and therefore there is a risk that the selected model or technique performs poorly on a certain dataset. To avoid selecting a low-accuracy model, we apply bandit algorithms to predict faults. Consider a case where a player has 100 coins to bet on several slot machines. Ordinary usage of software fault prediction is analogous to the player betting all 100 coins on one slot machine. In contrast, bandit algorithms bet one coin at a time on a machine (i.e., use a prediction model) step by step to seek the best machine. In the experiment, we developed an artificial dataset that includes 100 modules, 15 of which include faults. Then, we developed various artificial fault prediction models and selected them dynamically using bandit algorithms. The Thompson sampling algorithm showed the best or second-best prediction performance compared with using only one prediction model.

* 5 pages
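
The slot-machine analogy above can be sketched as a small simulation. This is a minimal sketch, assuming Beta-Bernoulli Thompson sampling with a reward of 1 for each correct prediction; the artificial models are hypothetical stand-ins that predict the true label with a fixed probability, and the names `thompson_select` and `run_simulation` are illustrative, not taken from the paper.

```python
import random


def thompson_select(successes, failures, rng):
    """Draw one sample from each arm's Beta posterior and pick the largest."""
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)


def run_simulation(accuracies, n_modules=100, n_faulty=15, seed=0):
    """Select one prediction model per module with Thompson sampling.

    accuracies: hypothetical probability that each artificial model
    predicts a module's label correctly (an assumption for illustration).
    Returns (number of correct predictions, pulls per model).
    """
    rng = random.Random(seed)
    # Artificial dataset: 100 modules, 15 of which contain faults.
    labels = [1] * n_faulty + [0] * (n_modules - n_faulty)
    rng.shuffle(labels)

    successes = [0] * len(accuracies)
    failures = [0] * len(accuracies)
    correct = 0
    for label in labels:
        arm = thompson_select(successes, failures, rng)
        # The chosen model is right with probability accuracies[arm].
        prediction = label if rng.random() < accuracies[arm] else 1 - label
        if prediction == label:
            successes[arm] += 1
            correct += 1
        else:
            failures[arm] += 1
    pulls = [s + f for s, f in zip(successes, failures)]
    return correct, pulls


correct, pulls = run_simulation([0.5, 0.6, 0.9])
print(f"correct predictions: {correct}, pulls per model: {pulls}")
```

Each module costs one "coin": the sampler spends early coins exploring all models and then tends to concentrate on whichever model has been right most often, rather than committing all 100 modules to a single model chosen in advance.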

Towards Generation of Visual Attention Map for Source Code

Aug 13, 2019

Takeshi D. Itoh, Takatomi Kubo, Kiyoka Ikeda, Yuki Maruno, Yoshiharu Ikutani, Hideaki Hata, Kenichi Matsumoto, Kazushi Ikeda

Abstract: Program comprehension is a dominant process in software development and maintenance. Experts are considered to comprehend source code efficiently by directing their gaze, or attention, to its important components. However, accounting for the importance of components remains an open issue in gaze behavior analysis for source code comprehension. Here we show a conceptual framework to compare the quantified importance of source code components with the gaze behavior of programmers. We use "attention" in attention models (e.g., code2vec) as the importance indices for source code components and evaluate programmers' gaze locations based on the quantified importance. In this report, we introduce the idea of our gaze behavior analysis using the attention map, and the results of a preliminary experiment.

* 4 pages, 2 figures; APSIPA 2019 ACCEPTED

Sentiment Classification using N-gram IDF and Automated Machine Learning

May 25, 2019

Rungroj Maipradit, Hideaki Hata, Kenichi Matsumoto

Abstract: We propose a sentiment classification method with a general machine learning framework. For feature representation, n-gram IDF is used to extract software-engineering-related, dataset-specific, positive, neutral, and negative n-gram expressions. For classifiers, an automated machine learning tool is used. In a comparison using publicly available datasets, our method achieved the highest F1 values for positive and negative sentences on all datasets.

* 4 pages, IEEE Software
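
To give a feel for weighting n-gram expressions by how rare they are across documents, here is a simplified sketch. It computes plain inverse document frequency over word n-grams, which is a stand-in and not the n-gram IDF weighting scheme the abstract refers to; the function names and the tiny corpus are hypothetical.

```python
import math


def word_ngrams(text, n):
    """Return the set of word n-grams in a text (lowercased, whitespace split)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_document_idf(documents, n=2):
    """Plain IDF, log(N / df), for every word n-gram seen in the corpus.

    Assumption: this is ordinary IDF applied to n-grams, used here only
    as a simplified illustration of rarity-based n-gram weighting.
    """
    doc_count = len(documents)
    doc_freq = {}
    for doc in documents:
        for gram in word_ngrams(doc, n):
            doc_freq[gram] = doc_freq.get(gram, 0) + 1
    return {gram: math.log(doc_count / df) for gram, df in doc_freq.items()}


corpus = [
    "works fine now",
    "works fine again",
    "totally broken build",
]
idf = ngram_document_idf(corpus, n=2)
```

In this toy corpus, "works fine" appears in two of three documents and so receives a lower weight than "totally broken", which appears in only one. Weights like these could then serve as features for a downstream classifier.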

Toward Imitating Visual Attention of Experts in Software Development Tasks

Mar 15, 2019

Yoshiharu Ikutani, Nishanth Koganti, Hideaki Hata, Takatomi Kubo, Kenichi Matsumoto

Abstract: Expert programmers' eye movements during source code reading are valuable sources that are considered to be associated with their domain expertise. We advocate a vision of new intelligent systems incorporating the expertise of experts for software development tasks, such as issue localization, comment generation, and code generation. We present a conceptual framework of neural autonomous agents based on imitation learning (IL), which enables agents to mimic the visual attention of an expert via his/her eye movements. In this framework, an autonomous agent is constructed as a context-based attention model that consists of an encoder/decoder network and is trained with state-action sequences generated by an expert's demonstration. Challenges in implementing an IL-based autonomous agent specialized for software development tasks are discussed in this paper.

* 4 pages, EMIP 2019
