Vladimir Kovalenko

PSIMiner: A Tool for Mining Rich Abstract Syntax Trees from Code

Mar 23, 2021

Egor Spirin, Egor Bogomolov, Vladimir Kovalenko, Timofey Bryksin

Abstract: The application of machine learning algorithms to source code has grown in recent years. Since these algorithms are quite sensitive to input data, it is not surprising that researchers experiment with input representations. Nowadays, a popular starting point for representing code is the abstract syntax tree (AST). Abstract syntax trees have long been used in various software engineering domains, and in IDEs in particular. The APIs of modern IDEs make it possible to manipulate and traverse ASTs, resolve references between code elements, and so on. Such algorithms can enrich ASTs with new data and may therefore be useful in ML-based code analysis. In this work, we present PSIMiner, a tool for processing PSI trees from the IntelliJ Platform. PSI trees contain code syntax trees as well as functions to work with them, and can therefore be used to enrich code representations using the static analysis algorithms of modern IDEs. To showcase this idea, we use our tool to infer types of identifiers in Java ASTs and extend the code2seq model for the method name prediction problem.

* 5 pages, 2 figures
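The idea of enriching syntax tree nodes with inferred types, described in the abstract above, can be illustrated with a small sketch. This is not PSIMiner and does not use the IntelliJ Platform: it assumes Python's standard `ast` module as a stand-in for PSI, and a deliberately naive rule (names assigned from literals get the literal's type). The helper names `infer_literal_type`, `enrich_with_types`, and `typed_identifiers` are hypothetical.

```python
import ast


def infer_literal_type(node):
    """Return a type name for simple literal expressions, else None."""
    if isinstance(node, ast.Constant):
        return type(node.value).__name__
    if isinstance(node, ast.List):
        return "list"
    if isinstance(node, ast.Dict):
        return "dict"
    return None


def enrich_with_types(source):
    """Parse source and attach an `inferred_type` attribute to every Name node.

    Assumption: only `name = <literal>` assignments are analysed; everything
    else is marked "<unknown>". A real IDE resolves far more than this.
    """
    tree = ast.parse(source)
    known = {}
    # First pass: record the types of names assigned directly from literals.
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and len(node.targets) == 1:
            target = node.targets[0]
            if isinstance(target, ast.Name):
                type_name = infer_literal_type(node.value)
                if type_name is not None:
                    known[target.id] = type_name
    # Second pass: enrich each identifier node with the recorded type.
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            node.inferred_type = known.get(node.id, "<unknown>")
    return tree


def typed_identifiers(tree):
    """Collect (identifier, inferred type) pairs from an enriched tree."""
    return sorted(
        {(n.id, n.inferred_type) for n in ast.walk(tree) if isinstance(n, ast.Name)}
    )


if __name__ == "__main__":
    enriched = enrich_with_types("count = 3\nname = 'psi'\ntotal = count + other\n")
    for identifier, type_name in typed_identifiers(enriched):
        print(identifier, type_name)
```

A downstream model could then consume the type label alongside each identifier token, which is the general shape of the enrichment the abstract describes.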

Using Large-Scale Anomaly Detection on Code to Improve Kotlin Compiler

Apr 03, 2020

Timofey Bryksin, Victor Petukhov, Ilya Alexin, Stanislav Prikhodko, Alexey Shpilman, Vladimir Kovalenko, Nikita Povarov

Abstract: In this work, we apply anomaly detection to source code and bytecode to facilitate the development of a programming language and its compiler. We define an anomaly as a code fragment that differs from typical code written in a particular programming language. Identifying such code fragments is beneficial to both language developers and end users, since anomalies may indicate potential issues with the compiler or with runtime performance. Moreover, anomalies could correspond to problems in language design. For this study, we choose Kotlin as the target programming language. We outline and discuss approaches to obtaining vector representations of source code and bytecode and to detecting anomalies across vectorized code snippets. The paper presents a method that aims to detect two types of anomalies: syntax tree anomalies and so-called compiler-induced anomalies that arise only in the compiled bytecode. We describe several experiments that employ different combinations of vectorization and anomaly detection techniques and discuss the types of detected anomalies and their usefulness for language developers. We demonstrate that the extracted anomalies and the underlying extraction technique provide additional value for language development.
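The two-stage pipeline in the abstract above (vectorize code snippets, then flag vectors that differ from typical ones) can be sketched as follows. This is only an illustration under stated assumptions, not the paper's method: the vectorization here is a plain bag of tokens with relative frequencies, and the anomaly score is the Euclidean distance to the centroid of all vectors. The function names and the sample snippets are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(snippet):
    """Split a snippet into word tokens and single punctuation characters."""
    return re.findall(r"\w+|[^\w\s]", snippet)


def vectorize(snippets):
    """Turn snippets into relative token-frequency vectors over a shared vocabulary."""
    token_lists = [tokenize(s) for s in snippets]
    vocabulary = sorted({tok for toks in token_lists for tok in toks})
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty snippets
        vectors.append([counts[tok] / total for tok in vocabulary])
    return vectors


def anomaly_scores(vectors):
    """Score each vector by its Euclidean distance to the centroid of all vectors."""
    size = len(vectors)
    centroid = [sum(column) / size for column in zip(*vectors)]
    return [math.dist(vec, centroid) for vec in vectors]


def most_anomalous(snippets):
    """Return the index of the snippet farthest from the centroid."""
    scores = anomaly_scores(vectorize(snippets))
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    samples = [
        "val a = 1",
        "val b = 2",
        "val c = 3",
        "when (x) { is Int -> x!!!! }",
    ]
    print(samples[most_anomalous(samples)])
```

Either stage can be swapped independently, for example a tree-based representation in place of token counts, or a different outlier detector in place of the centroid distance, which mirrors the combinations of techniques the abstract mentions.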

Classifiers for centrality determination in proton-nucleus and nucleus-nucleus collisions

Nov 30, 2016

Igor Altsybeev, Vladimir Kovalenko

Abstract: Centrality, as a geometrical property of the collision, is crucial for the physical interpretation of nucleus-nucleus and proton-nucleus experimental data. However, it cannot be directly accessed in event-by-event data analysis. Common methods for centrality estimation in A-A and p-A collisions usually rely on a single detector (either on the signal in zero-degree calorimeters or on the multiplicity in some semi-central rapidity range). In the present work, we attempt to develop an approach to centrality determination that is based on machine-learning techniques and uses information from several detector subsystems simultaneously. Different event classifiers are suggested and evaluated for their selectivity in terms of the number of participant nucleons and the impact parameter of the collision. Finer centrality resolution may make it possible to reduce the impact of so-called volume fluctuations on physical observables studied in heavy-ion experiments such as ALICE at the LHC and the fixed-target experiment NA61/SHINE at the SPS.

* EPJ Web of Conferences 137, 11001 (2017)
* To be published in proceedings of the "XIIth Quark Confinement and the Hadron Spectrum" conference (Thessaloniki, 2016)
