John D. Hastings

An Ethically Grounded LLM-Based Approach to Insider Threat Synthesis and Detection

Sep 08, 2025

Haywood Gelman, John D. Hastings, David Kenley

Abstract: Insider threats are a growing organizational problem due to the complexity of identifying their technical and behavioral elements. A large research body is dedicated to the study of insider threats from technological, psychological, and educational perspectives. However, research in this domain has generally depended on datasets that are static and limited in access, which restricts the development of adaptive detection models. This study introduces a novel, ethically grounded approach that uses the large language model (LLM) Claude Sonnet 3.7 to dynamically synthesize syslog messages, some of which contain indicators of insider threat scenarios. The messages reflect real-world data distributions by being highly imbalanced (1% insider threats). The syslogs were analyzed for insider threats by both Claude Sonnet 3.7 and GPT-4o, with their performance evaluated through statistical metrics including precision, recall, MCC, and ROC AUC. Sonnet 3.7 consistently outperformed GPT-4o across nearly all metrics, particularly in reducing false alarms and improving detection accuracy. The results show strong promise for the use of LLMs in synthetic dataset generation and insider threat detection.

* 6 pages, 5 figures, 5 tables
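
As an illustration of the metrics named in the abstract above, the following minimal Python sketch computes precision, recall, and the Matthews correlation coefficient (MCC) on a 1% imbalanced label set. The label vectors and the helper names `confusion_counts` and `precision_recall_mcc` are hypothetical and are not taken from the paper; the sketch only shows why accuracy alone can mislead on such data.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = threat)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_mcc(tp, fp, fn, tn):
    """Return (precision, recall, MCC); undefined ratios fall back to 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


# Hypothetical data: 1000 messages, 10 of them insider threats (1%).
# The imagined detector finds 8 threats, misses 2, and raises 5 false alarms.
y_true = [1] * 10 + [0] * 990
y_pred = [1] * 8 + [0] * 2 + [1] * 5 + [0] * 985

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, mcc = precision_recall_mcc(tp, fp, fn, tn)
accuracy = (tp + tn) / len(y_true)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} mcc={mcc:.3f}")
```

With these made-up counts, accuracy is very high even though more than a third of the alerts are false alarms, which is one reason precision, recall, and MCC are commonly reported for imbalanced detection tasks.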

Scalable and Ethical Insider Threat Detection through Data Synthesis and Analysis by LLMs

Feb 10, 2025

Haywood Gelman, John D. Hastings

Abstract: Insider threats wield an outsized influence on organizations, disproportionate to their small numbers. This is due to the internal access insiders have to systems, information, and infrastructure. Signals for such risks may be found in anonymous submissions to public web-based job search site reviews. This research studies the potential for large language models (LLMs) to analyze and detect insider threat sentiment within job site reviews. Addressing ethical data collection concerns, this research utilizes synthetic data generation using LLMs alongside existing job review datasets. A comparative analysis of sentiment scores generated by LLMs is benchmarked against expert human scoring. Findings reveal that LLMs demonstrate alignment with human evaluations in most cases, thus effectively identifying nuanced indicators of threat sentiment. The performance is lower on human-generated data than synthetic data, suggesting areas for improvement in evaluating real-world data. Text diversity analysis found differences between human-generated and LLM-generated datasets, with synthetic data exhibiting somewhat lower diversity. Overall, the results demonstrate the applicability of LLMs to insider threat detection, and a scalable solution for insider sentiment testing by overcoming ethical and logistical barriers tied to data acquisition.

* 6 pages, 0 figures, 8 tables

Toward an Insider Threat Education Platform: A Theoretical Literature Review

Dec 18, 2024

Haywood Gelman, John D. Hastings, David Kenley, Eleanor Loiacono

Abstract: Insider threats (InTs) within organizations are small in number but have a disproportionate ability to damage systems, information, and infrastructure. Existing InT research studies the problem from psychological, technical, and educational perspectives. Proposed theories include research on psychological indicators, machine learning, user behavioral log analysis, and educational methods to teach employees recognition and mitigation techniques. Because InTs are a human problem, training methods that address InT detection from a behavioral perspective are critical. While numerous technological and psychological theories exist on detection, prevention, and mitigation, few training methods prioritize psychological indicators. This literature review studied peer-reviewed InT research organized by subtopic and extracted critical theories from psychological, technical, and educational disciplines. In doing so, this is the first study to comprehensively organize research across all three approaches in a manner that properly informs the development of an InT education platform.

* 6 pages

Impact of Data Snooping on Deep Learning Models for Locating Vulnerabilities in Lifted Code

Dec 03, 2024

Gary A. McCully, John D. Hastings, Shengjie Xu

Abstract: This study examines the impact of data snooping on neural networks for vulnerability detection in lifted code, building on previous research which used word2vec, and unidirectional and bidirectional transformer-based embeddings. The research specifically focuses on how model performance is affected when embedding models are trained on datasets that include samples also used for neural network training and validation. The results show that introducing data snooping did not significantly alter model performance, suggesting that data snooping had a minimal impact or that samples randomly dropped as part of the methodology contained hidden features critical to achieving optimal performance. In addition, the findings reinforce the conclusions of previous research, which found that models trained with GPT-2 embeddings consistently outperformed neural networks trained with other embeddings. The fact that this holds even when data snooping is introduced into the embedding model indicates GPT-2's robustness in representing complex code features, even under less-than-ideal conditions.

* 7 pages, 2 figures

Utilizing Large Language Models to Synthesize Product Desirability Datasets

Nov 20, 2024

John D. Hastings, Sherri Weitl-Harms, Joseph Doty, Zachary L. Myers, Warren Thompson

Abstract: This research explores the application of large language models (LLMs) to generate synthetic datasets for Product Desirability Toolkit (PDT) testing, a key component in evaluating user sentiment and product experience. Utilizing gpt-4o-mini, a cost-effective alternative to larger commercial LLMs, three methods, Word+Review, Review+Word, and Supply-Word, were each used to synthesize 1000 product reviews. The generated datasets were assessed for sentiment alignment, textual diversity, and data generation cost. Results demonstrated high sentiment alignment across all methods, with Pearson correlations ranging from 0.93 to 0.97. Supply-Word exhibited the highest diversity and coverage of PDT terms, although with increased generation costs. Despite minor biases toward positive sentiments, in situations with limited test data, LLM-generated synthetic data offers significant advantages, including scalability, cost savings, and flexibility in dataset production.

* 9 pages, 2 figures, 6 tables
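
The sentiment-alignment figure quoted in the abstract above is a Pearson correlation. As a rough sketch of how such a value can be computed, the Python snippet below correlates two lists of scores. The score lists and the function name `pearson` are hypothetical, and which two quantities the paper actually correlated is not stated here; the pairing shown is an assumption for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and squares (the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example: sentiment targets requested from the generator versus
# sentiment scores later assigned to the generated reviews.
target_sentiment = [-1.0, -0.5, 0.0, 0.5, 1.0]
scored_sentiment = [-0.9, -0.4, 0.1, 0.4, 0.8]

r = pearson(target_sentiment, scored_sentiment)
print(f"Pearson r = {r:.3f}")
```

A value near 1 means the scored sentiment rises and falls with the requested sentiment; it does not by itself show that the two scales agree in absolute terms.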

Comparing Unidirectional, Bidirectional, and Word2vec Models for Discovering Vulnerabilities in Compiled Lifted Code

Sep 26, 2024

Gary A. McCully, John D. Hastings, Shengjie Xu, Adam Fortier

Abstract: Ransomware and other forms of malware cause significant financial and operational damage to organizations by exploiting long-standing and often difficult-to-detect software vulnerabilities. To detect vulnerabilities such as buffer overflows in compiled code, this research investigates the application of unidirectional transformer-based embeddings, specifically GPT-2. Using a dataset of LLVM functions, we trained a GPT-2 model to generate embeddings, which were subsequently used to build LSTM neural networks to differentiate between vulnerable and non-vulnerable code. Our study reveals that embeddings from the GPT-2 model significantly outperform those from the bidirectional models BERT and RoBERTa, achieving an accuracy of 92.5% and an F1-score of 89.7%. LSTM neural networks were developed with both frozen and unfrozen embedding model layers. The model with the highest performance was achieved when the embedding layers were unfrozen. Further, the research finds that, in exploring the impact of different optimizers within this domain, the SGD optimizer demonstrates superior performance over Adam. Overall, these findings reveal important insights into the potential of unidirectional transformer-based approaches in enhancing cybersecurity defenses.

* 6 pages, 2 figures

The Psychological Impacts of Algorithmic and AI-Driven Social Media on Teenagers: A Call to Action

Aug 19, 2024

Sunil Arora, Sahil Arora, John D. Hastings

Abstract:This study investigates the meta-issues surrounding social media, which, while theoretically designed to enhance social interactions and improve our social lives by facilitating the sharing of personal experiences and life events, often results in adverse psychological impacts. Our investigation reveals a paradoxical outcome: rather than fostering closer relationships and improving social lives, the algorithms and structures that underlie social media platforms inadvertently contribute to a profound psychological impact on individuals, influencing them in unforeseen ways. This phenomenon is particularly pronounced among teenagers, who are disproportionately affected by curated online personas, peer pressure to present a perfect digital image, and the constant bombardment of notifications and updates that characterize their social media experience. As such, we issue a call to action for policymakers, platform developers, and educators to prioritize the well-being of teenagers in the digital age and work towards creating secure and safe social media platforms that protect the young from harm, online harassment, and exploitation.

* 7 pages, 0 figures, 2 tables, 2024 IEEE Conference on Digital Platforms and Societal Harms

Analyzing LLMs' Capabilities to Establish Implicit User Sentiment of Software Desirability

Aug 02, 2024

Sherri Weitl-Harms, John D. Hastings, Jonah Lum

Abstract: This study explores the use of several LLMs for providing quantitative zero-shot sentiment analysis of implicit software desirability expressed by users. The study provides scaled numerical sentiment analysis, unlike other methods that simply classify sentiment as positive, neutral, or negative. Numerical analysis provides deeper insights into the magnitude of sentiment, to drive better decisions regarding product desirability. Data is collected through the use of the Microsoft Product Desirability Toolkit (PDT), a well-known qualitative user experience analysis tool. For initial exploration, the PDT metric was given to users of ZORQ, a gamification system used in undergraduate computer science education. The PDT data collected was fed through several LLMs (Claude Sonnet 3 and 3.5, GPT4, and GPT4o), through a leading transfer learning technique, Twitter-Roberta-Base-Sentiment (TRBS), and through Vader, a leading sentiment analysis tool, for quantitative sentiment analysis. Each system was asked to evaluate the data in two ways: first, by looking at the sentiment expressed in the PDT word/explanation pairs; and second, by looking at the sentiment expressed by the users in their grouped selection of five words and explanations, as a whole. Each LLM was also asked to provide its confidence (low, medium, high) in its sentiment score, along with an explanation of why it selected the sentiment value. All LLMs tested were able to statistically detect user sentiment from the users' grouped data, whereas TRBS and Vader were not. The confidence and explanation of confidence provided by the LLMs assisted in understanding the user sentiment. This study adds to a deeper understanding of evaluating user experiences, toward the goal of creating a universal tool that quantifies implicit sentiment expressed.

* 6 pages, 2 figures, 2 tables

Bi-Directional Transformers vs. word2vec: Discovering Vulnerabilities in Lifted Compiled Code

May 31, 2024

Gary A. McCully, John D. Hastings, Shengjie Xu, Adam Fortier

Abstract: Detecting vulnerabilities within compiled binaries is challenging due to lost high-level code structures and other factors such as architectural dependencies, compilers, and optimization options. To address these obstacles, this research explores vulnerability detection by using natural language processing (NLP) embedding techniques with word2vec, BERT, and RoBERTa to learn semantics from intermediate representation (LLVM) code. Long short-term memory (LSTM) neural networks were trained on embeddings from encoders created using approximately 118k LLVM functions from the Juliet dataset. This study is pioneering in its comparison of word2vec models with multiple bidirectional transformer (BERT, RoBERTa) embeddings built using LLVM code to train neural networks to detect vulnerabilities in compiled binaries. word2vec Continuous Bag of Words (CBOW) models achieved 92.3% validation accuracy in detecting vulnerabilities, outperforming word2vec Skip-Gram, BERT, and RoBERTa. This suggests that complex contextual NLP embeddings may not provide advantages over simpler word2vec models for this task when a limited number (e.g., 118k) of data samples are used to train the bidirectional transformer-based models. The comparative results provide novel insights into selecting optimal embeddings for learning compiler-independent semantic code representations to advance machine learning detection of vulnerabilities in compiled binaries.

* 8 pages, 0 figures, IEEE 4th Cyber Awareness and Research Symposium 2024 (CARS'24)

Confronting the Reproducibility Crisis: A Case Study in Validating Certified Robustness

May 29, 2024

Richard H. Moulton, Gary A. McCully, John D. Hastings

Abstract:Reproducibility is a cornerstone of scientific research, enabling validation, extension, and progress. However, the rapidly evolving nature of software and dependencies poses significant challenges to reproducing research results, particularly in fields like adversarial robustness for deep neural networks, where complex codebases and specialized toolkits are utilized. This paper presents a case study of attempting to validate the results on certified adversarial robustness in "SoK: Certified Robustness for Deep Neural Networks" using the VeriGauge toolkit. Despite following the documented methodology, numerous software and hardware compatibility issues were encountered, including outdated or unavailable dependencies, version conflicts, and driver incompatibilities. While a subset of the original results could be run, key findings related to the empirical robust accuracy of various verification methods proved elusive due to these technical obstacles, as well as slight discrepancies in the test results. This practical experience sheds light on the reproducibility crisis afflicting adversarial robustness research, where a lack of reproducibility threatens scientific integrity and hinders progress. The paper discusses the broader implications of this crisis, proposing potential solutions such as containerization, software preservation, and comprehensive documentation practices. Furthermore, it highlights the need for collaboration and standardization efforts within the research community to develop robust frameworks for reproducible research. By addressing the reproducibility crisis head-on, this work aims to contribute to the ongoing discourse on scientific reproducibility and advocate for best practices that ensure the reliability and validity of research findings within not only adversarial robustness, but security and technology research as a whole.

* 9 pages, 0 figures, submitted to ACSAC (Annual Computer Security Applications Conference) 2024
