Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Martin Sablotny

Improved Large Language Model Jailbreak Detection via Pretrained Embeddings

Dec 02, 2024

Erick Galinkin, Martin Sablotny

Figure 1 for Improved Large Language Model Jailbreak Detection via Pretrained Embeddings

Figure 2 for Improved Large Language Model Jailbreak Detection via Pretrained Embeddings

Figure 3 for Improved Large Language Model Jailbreak Detection via Pretrained Embeddings

Figure 4 for Improved Large Language Model Jailbreak Detection via Pretrained Embeddings

Abstract:The adoption of large language models (LLMs) in many applications, from customer service chat bots and software development assistants to more capable agentic systems necessitates research into how to secure these systems. Attacks like prompt injection and jailbreaking attempt to elicit responses and actions from these models that are not compliant with the safety, privacy, or content policies of organizations using the model in their application. In order to counter abuse of LLMs for generating potentially harmful replies or taking undesirable actions, LLM owners must apply safeguards during training and integrate additional tools to block the LLM from generating text that abuses the model. Jailbreaking prompts play a vital role in convincing an LLM to generate potentially harmful content, making it important to identify jailbreaking attempts to block any further steps. In this work, we propose a novel approach to detect jailbreak prompts based on pairing text embeddings well-suited for retrieval with traditional machine learning classification algorithms. Our approach outperforms all publicly available methods from open source LLM security applications.

* Submitted to AICS 2025: https://aics.site

Via

Access Paper or Ask Questions

Reinforcement learning guided fuzz testing for a browser's HTML rendering engine

Jul 27, 2023

Martin Sablotny, Bjørn Sand Jensen, Jeremy Singer

Abstract:Generation-based fuzz testing can uncover various bugs and security vulnerabilities. However, compared to mutation-based fuzz testing, it takes much longer to develop a well-balanced generator that produces good test cases and decides where to break the underlying structure to exercise new code paths. We propose a novel approach to combine a trained test case generator deep learning model with a double deep Q-network (DDQN) for the first time. The DDQN guides test case creation based on a code coverage signal. Our approach improves the code coverage performance of the underlying generator model by up to 18.5\% for the Firefox HTML rendering engine compared to the baseline grammar based fuzzer.

Via

Access Paper or Ask Questions

Recurrent Neural Networks for Fuzz Testing Web Browsers

Dec 12, 2018

Martin Sablotny, Bjørn Sand Jensen, Chris W. Johnson

Figure 1 for Recurrent Neural Networks for Fuzz Testing Web Browsers

Figure 2 for Recurrent Neural Networks for Fuzz Testing Web Browsers

Figure 3 for Recurrent Neural Networks for Fuzz Testing Web Browsers

Figure 4 for Recurrent Neural Networks for Fuzz Testing Web Browsers

Abstract:Generation-based fuzzing is a software testing approach which is able to discover different types of bugs and vulnerabilities in software. It is, however, known to be very time consuming to design and fine tune classical fuzzers to achieve acceptable coverage, even for small-scale software systems. To address this issue, we investigate a machine learning-based approach to fuzz testing in which we outline a family of test-case generators based on Recurrent Neural Networks (RNNs) and train those on readily available datasets with a minimum of human fine tuning. The proposed generators do, in contrast to previous work, not rely on heuristic sampling strategies but principled sampling from the predictive distributions. We provide a detailed analysis to demonstrate the characteristics and efficacy of the proposed generators in a challenging web browser testing scenario. The empirical results show that the RNN-based generators are able to provide better coverage than a mutation based method and are able to discover paths not discovered by a classical fuzzer. Our results supplement findings in other domains suggesting that generation based fuzzing with RNNs is a viable route to better software quality conditioned on the use of a suitable model selection/analysis procedure.

* Preprint of the paper presented at ICISC 2018 in Korea

Via

Access Paper or Ask Questions