Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

David Mohaisen

Enhancing Vulnerability Reports with Automated and Augmented Description Summarization

Apr 29, 2025

Hattan Althebeiti, Mohammed Alkinoon, Manar Mohaisen, Saeed Salem, DaeHun Nyang, David Mohaisen

Abstract:Public vulnerability databases, such as the National Vulnerability Database (NVD), document vulnerabilities and facilitate threat information sharing. However, they often suffer from short descriptions and outdated or insufficient information. In this paper, we introduce Zad, a system designed to enrich NVD vulnerability descriptions by leveraging external resources. Zad consists of two pipelines: one collects and filters supplementary data using two encoders to build a detailed dataset, while the other fine-tunes a pre-trained model on this dataset to generate enriched descriptions. By addressing brevity and improving content quality, Zad produces more comprehensive and cohesive vulnerability descriptions. We evaluate Zad using standard summarization metrics and human assessments, demonstrating its effectiveness in enhancing vulnerability information.

* 12 pages, 3 tables, 12 figures. Accepted for publication in IEEE Transactions on Big Data. Extended version of arXiv:2210.01260

Via

Access Paper or Ask Questions

Fishing for Phishers: Learning-Based Phishing Detection in Ethereum Transactions

Apr 24, 2025

Ahod Alghuried, Abdulaziz Alghamdi, Ali Alkinoon, Soohyeon Choi, Manar Mohaisen, David Mohaisen

Abstract:Phishing detection on Ethereum has increasingly leveraged advanced machine learning techniques to identify fraudulent transactions. However, limited attention has been given to understanding the effectiveness of feature selection strategies and the role of graph-based models in enhancing detection accuracy. In this paper, we systematically examine these issues by analyzing and contrasting explicit transactional features and implicit graph-based features, both experimentally and analytically. We explore how different feature sets impact the performance of phishing detection models, particularly in the context of Ethereum's transactional network. Additionally, we address key challenges such as class imbalance and dataset composition and their influence on the robustness and precision of detection methods. Our findings demonstrate the advantages and limitations of each feature type, while also providing a clearer understanding of how feature affect model resilience and generalization in adversarial environments.

* 23 pages, 6 tables, 5 figures

Via

Access Paper or Ask Questions

Uncertainty-Guided Coarse-to-Fine Tumor Segmentation with Anatomy-Aware Post-Processing

Apr 16, 2025

Ilkin Sevgi Isler, David Mohaisen, Curtis Lisle, Damla Turgut, Ulas Bagci

Abstract:Reliable tumor segmentation in thoracic computed tomography (CT) remains challenging due to boundary ambiguity, class imbalance, and anatomical variability. We propose an uncertainty-guided, coarse-to-fine segmentation framework that combines full-volume tumor localization with refined region-of-interest (ROI) segmentation, enhanced by anatomically aware post-processing. The first-stage model generates a coarse prediction, followed by anatomically informed filtering based on lung overlap, proximity to lung surfaces, and component size. The resulting ROIs are segmented by a second-stage model trained with uncertainty-aware loss functions to improve accuracy and boundary calibration in ambiguous regions. Experiments on private and public datasets demonstrate improvements in Dice and Hausdorff scores, with fewer false positives and enhanced spatial interpretability. These results highlight the value of combining uncertainty modeling and anatomical priors in cascaded segmentation pipelines for robust and clinically meaningful tumor delineation. On the Orlando dataset, our framework improved Swin UNETR Dice from 0.4690 to 0.6447. Reduction in spurious components was strongly correlated with segmentation gains, underscoring the value of anatomically informed post-processing.

* 6 pages, 2 figures, to appear in IEEE ADSCA 2025

Via

Access Paper or Ask Questions

Security and Quality in LLM-Generated Code: A Multi-Language, Multi-Model Analysis

Feb 03, 2025

Mohammed Kharma, Soohyeon Choi, Mohammed AlKhanafseh, David Mohaisen

Figure 1 for Security and Quality in LLM-Generated Code: A Multi-Language, Multi-Model Analysis

Figure 2 for Security and Quality in LLM-Generated Code: A Multi-Language, Multi-Model Analysis

Figure 3 for Security and Quality in LLM-Generated Code: A Multi-Language, Multi-Model Analysis

Figure 4 for Security and Quality in LLM-Generated Code: A Multi-Language, Multi-Model Analysis

Abstract:Artificial Intelligence (AI)-driven code generation tools are increasingly used throughout the software development lifecycle to accelerate coding tasks. However, the security of AI-generated code using Large Language Models (LLMs) remains underexplored, with studies revealing various risks and weaknesses. This paper analyzes the security of code generated by LLMs across different programming languages. We introduce a dataset of 200 tasks grouped into six categories to evaluate the performance of LLMs in generating secure and maintainable code. Our research shows that while LLMs can automate code creation, their security effectiveness varies by language. Many models fail to utilize modern security features in recent compiler and toolkit updates, such as Java 17. Moreover, outdated methods are still commonly used, particularly in C++. This highlights the need for advancing LLMs to enhance security and quality while incorporating emerging best practices in programming languages.

* 12 pages, 10 tables. In submission to IEEE Transactions on Dependable and Secure Computing

Via

Access Paper or Ask Questions

Through the Looking Glass: LLM-Based Analysis of AR/VR Android Applications Privacy Policies

Jan 31, 2025

Abdulaziz Alghamdi, David Mohaisen

Abstract:\begin{abstract} This paper comprehensively analyzes privacy policies in AR/VR applications, leveraging BERT, a state-of-the-art text classification model, to evaluate the clarity and thoroughness of these policies. By comparing the privacy policies of AR/VR applications with those of free and premium websites, this study provides a broad perspective on the current state of privacy practices within the AR/VR industry. Our findings indicate that AR/VR applications generally offer a higher percentage of positive segments than free content but lower than premium websites. The analysis of highlighted segments and words revealed that AR/VR applications strategically emphasize critical privacy practices and key terms. This enhances privacy policies' clarity and effectiveness.

* 7 pages; appeared in ICMLA 2024

Via

Access Paper or Ask Questions

I Can Find You in Seconds! Leveraging Large Language Models for Code Authorship Attribution

Jan 14, 2025

Soohyeon Choi, Yong Kiam Tan, Mark Huasong Meng, Mohamed Ragab, Soumik Mondal, David Mohaisen, Khin Mi Mi Aung

Abstract:Source code authorship attribution is important in software forensics, plagiarism detection, and protecting software patch integrity. Existing techniques often rely on supervised machine learning, which struggles with generalization across different programming languages and coding styles due to the need for large labeled datasets. Inspired by recent advances in natural language authorship analysis using large language models (LLMs), which have shown exceptional performance without task-specific tuning, this paper explores the use of LLMs for source code authorship attribution. We present a comprehensive study demonstrating that state-of-the-art LLMs can successfully attribute source code authorship across different languages. LLMs can determine whether two code snippets are written by the same author with zero-shot prompting, achieving a Matthews Correlation Coefficient (MCC) of 0.78, and can attribute code authorship from a small set of reference code snippets via few-shot learning, achieving MCC of 0.77. Additionally, LLMs show some adversarial robustness against misattribution attacks. Despite these capabilities, we found that naive prompting of LLMs does not scale well with a large number of authors due to input token limitations. To address this, we propose a tournament-style approach for large-scale attribution. Evaluating this approach on datasets of C++ (500 authors, 26,355 samples) and Java (686 authors, 55,267 samples) code from GitHub, we achieve classification accuracy of up to 65% for C++ and 68.7% for Java using only one reference per author. These results open new possibilities for applying LLMs to code authorship attribution in cybersecurity and software engineering.

* 12 pages, 5 figures,

Via

Access Paper or Ask Questions

Simple Perturbations Subvert Ethereum Phishing Transactions Detection: An Empirical Analysis

Aug 06, 2024

Ahod Alghureid, David Mohaisen

Figure 1 for Simple Perturbations Subvert Ethereum Phishing Transactions Detection: An Empirical Analysis

Figure 2 for Simple Perturbations Subvert Ethereum Phishing Transactions Detection: An Empirical Analysis

Figure 3 for Simple Perturbations Subvert Ethereum Phishing Transactions Detection: An Empirical Analysis

Figure 4 for Simple Perturbations Subvert Ethereum Phishing Transactions Detection: An Empirical Analysis

Abstract:This paper explores the vulnerability of machine learning models, specifically Random Forest, Decision Tree, and K-Nearest Neighbors, to very simple single-feature adversarial attacks in the context of Ethereum fraudulent transaction detection. Through comprehensive experimentation, we investigate the impact of various adversarial attack strategies on model performance metrics, such as accuracy, precision, recall, and F1-score. Our findings, highlighting how prone those techniques are to simple attacks, are alarming, and the inconsistency in the attacks' effect on different algorithms promises ways for attack mitigation. We examine the effectiveness of different mitigation strategies, including adversarial training and enhanced feature selection, in enhancing model robustness.

* 12 pages, 1 figure, 5 tables, accepted for presentation at WISA 2024

Via

Access Paper or Ask Questions

Untargeted Code Authorship Evasion with Seq2Seq Transformation

Nov 26, 2023

Soohyeon Choi, Rhongho Jang, DaeHun Nyang, David Mohaisen

Abstract:Code authorship attribution is the problem of identifying authors of programming language codes through the stylistic features in their codes, a topic that recently witnessed significant interest with outstanding performance. In this work, we present SCAE, a code authorship obfuscation technique that leverages a Seq2Seq code transformer called StructCoder. SCAE customizes StructCoder, a system designed initially for function-level code translation from one language to another (e.g., Java to C#), using transfer learning. SCAE improved the efficiency at a slight accuracy degradation compared to existing work. We also reduced the processing time by about 68% while maintaining an 85% transformation success rate and up to 95.77% evasion success rate in the untargeted setting.

* 9 pages, 1 figure, 5 tables

Via

Access Paper or Ask Questions

Burning the Adversarial Bridges: Robust Windows Malware Detection Against Binary-level Mutations

Oct 05, 2023

Ahmed Abusnaina, Yizhen Wang, Sunpreet Arora, Ke Wang, Mihai Christodorescu, David Mohaisen

Figure 1 for Burning the Adversarial Bridges: Robust Windows Malware Detection Against Binary-level Mutations

Figure 2 for Burning the Adversarial Bridges: Robust Windows Malware Detection Against Binary-level Mutations

Figure 3 for Burning the Adversarial Bridges: Robust Windows Malware Detection Against Binary-level Mutations

Figure 4 for Burning the Adversarial Bridges: Robust Windows Malware Detection Against Binary-level Mutations

Abstract:Toward robust malware detection, we explore the attack surface of existing malware detection systems. We conduct root-cause analyses of the practical binary-level black-box adversarial malware examples. Additionally, we uncover the sensitivity of volatile features within the detection engines and exhibit their exploitability. Highlighting volatile information channels within the software, we introduce three software pre-processing steps to eliminate the attack surface, namely, padding removal, software stripping, and inter-section information resetting. Further, to counter the emerging section injection attacks, we propose a graph-based section-dependent information extraction scheme for software representation. The proposed scheme leverages aggregated information within various sections in the software to enable robust malware detection and mitigate adversarial settings. Our experimental results show that traditional malware detection models are ineffective against adversarial threats. However, the attack surface can be largely reduced by eliminating the volatile information. Therefore, we propose simple-yet-effective methods to mitigate the impacts of binary manipulation attacks. Overall, our graph-based malware detection scheme can accurately detect malware with an area under the curve score of 88.32\% and a score of 88.19% under a combination of binary manipulation attacks, exhibiting the efficiency of our proposed scheme.

* 12 pages

Via

Access Paper or Ask Questions

SHIELD: Thwarting Code Authorship Attribution

Apr 26, 2023

Mohammed Abuhamad, Changhun Jung, David Mohaisen, DaeHun Nyang

Abstract:Authorship attribution has become increasingly accurate, posing a serious privacy risk for programmers who wish to remain anonymous. In this paper, we introduce SHIELD to examine the robustness of different code authorship attribution approaches against adversarial code examples. We define four attacks on attribution techniques, which include targeted and non-targeted attacks, and realize them using adversarial code perturbation. We experiment with a dataset of 200 programmers from the Google Code Jam competition to validate our methods targeting six state-of-the-art authorship attribution methods that adopt a variety of techniques for extracting authorship traits from source-code, including RNN, CNN, and code stylometry. Our experiments demonstrate the vulnerability of current authorship attribution methods against adversarial attacks. For the non-targeted attack, our experiments demonstrate the vulnerability of current authorship attribution methods against the attack with an attack success rate exceeds 98.5\% accompanied by a degradation of the identification confidence that exceeds 13\%. For the targeted attacks, we show the possibility of impersonating a programmer using targeted-adversarial perturbations with a success rate ranging from 66\% to 88\% for different authorship attribution techniques under several adversarial scenarios.

* 12 pages, 13 figures

Via

Access Paper or Ask Questions