Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Melih Sirlanci

C2RUST-BENCH: A Minimized, Representative Dataset for C-to-Rust Transpilation Evaluation

Apr 21, 2025

Melih Sirlanci, Carter Yagemann, Zhiqiang Lin

Abstract:Despite the effort in vulnerability detection over the last two decades, memory safety vulnerabilities continue to be a critical problem. Recent reports suggest that the key solution is to migrate to memory-safe languages. To this end, C-to-Rust transpilation becomes popular to resolve memory-safety issues in C programs. Recent works propose C-to-Rust transpilation frameworks; however, a comprehensive evaluation dataset is missing. Although one solution is to put together a large enough dataset, this increases the analysis time in automated frameworks as well as in manual efforts for some cases. In this work, we build a method to select functions from a large set to construct a minimized yet representative dataset to evaluate the C-to-Rust transpilation. We propose C2RUST-BENCH that contains 2,905 functions, which are representative of C-to-Rust transpilation, selected from 15,503 functions of real-world programs.

Via

Access Paper or Ask Questions

Malicious Code Detection: Run Trace Output Analysis by LSTM

Jan 14, 2021

Cengiz Acarturk, Melih Sirlanci, Pinar Gurkan Balikcioglu, Deniz Demirci, Nazenin Sahin, Ozge Acar Kucuk

Figure 1 for Malicious Code Detection: Run Trace Output Analysis by LSTM

Figure 2 for Malicious Code Detection: Run Trace Output Analysis by LSTM

Figure 3 for Malicious Code Detection: Run Trace Output Analysis by LSTM

Figure 4 for Malicious Code Detection: Run Trace Output Analysis by LSTM

Abstract:Malicious software threats and their detection have been gaining importance as a subdomain of information security due to the expansion of ICT applications in daily settings. A major challenge in designing and developing anti-malware systems is the coverage of the detection, particularly the development of dynamic analysis methods that can detect polymorphic and metamorphic malware efficiently. In the present study, we propose a methodological framework for detecting malicious code by analyzing run trace outputs by Long Short-Term Memory (LSTM). We developed models of run traces of malicious and benign Portable Executable (PE) files. We created our dataset from run trace outputs obtained from dynamic analysis of PE files. The obtained dataset was in the instruction format as a sequence and was called Instruction as a Sequence Model (ISM). By splitting the first dataset into basic blocks, we obtained the second one called Basic Block as a Sequence Model (BSM). The experiments showed that the ISM achieved an accuracy of 87.51% and a false positive rate of 18.34%, while BSM achieved an accuracy of 99.26% and a false positive rate of 2.62%.

* 11 pages, 5 figures, 5 tables, accepted to IEEE Access

Via

Access Paper or Ask Questions