Tim Watson

Know Your Neighborhood: General and Zero-Shot Capable Binary Function Search Powered by Call Graphlets

Jun 02, 2024

Joshua Collyer, Tim Watson, Iain Phillips

Abstract: Binary code similarity detection is an important problem with applications in areas like malware analysis, vulnerability research and plagiarism detection. This paper proposes a novel graph neural network architecture combined with a novel graph data representation called call graphlets. A call graphlet encodes the neighborhood around each function in a binary executable, capturing the local and global context through a series of statistical features. A specialized graph neural network model is then designed to operate on this graph representation, learning to map it to a feature vector that encodes semantic code similarities using deep metric learning. The proposed approach is evaluated across four distinct datasets covering different architectures, compiler toolchains, and optimization levels. Experimental results demonstrate that the combination of call graphlets and the novel graph neural network architecture achieves state-of-the-art performance compared to baseline techniques across cross-architecture, mono-architecture and zero-shot tasks. In addition, our proposed approach also performs well when evaluated against an out-of-domain function inlining task. Overall, the work provides a general and effective graph neural network-based solution for conducting binary code similarity detection.
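
The abstract describes a call graphlet as the neighborhood around a function in a binary's call graph. The following is a minimal sketch of that idea only, not the authors' implementation: it assumes the neighborhood is the set of callers and callees within a fixed number of hops, and the names `call_graphlet`, `build_call_graph`, and `hops` are hypothetical. The statistical node features and the graph neural network mentioned in the abstract are not modelled here.

```python
from collections import defaultdict


def build_call_graph(edges):
    """Index (caller, callee) pairs by direction for quick neighbor lookup."""
    callees = defaultdict(set)
    callers = defaultdict(set)
    for caller, callee in edges:
        callees[caller].add(callee)
        callers[callee].add(caller)
    return callees, callers


def call_graphlet(edges, function, hops=1):
    """Return (nodes, edges) of the call-graph neighborhood of `function`.

    Assumption: the neighborhood includes both callers and callees reached
    within `hops` steps; the paper may define it differently.
    """
    callees, callers = build_call_graph(edges)
    nodes = {function}
    frontier = {function}
    for _ in range(hops):
        reached = set()
        for node in frontier:
            reached |= callees.get(node, set()) | callers.get(node, set())
        frontier = reached - nodes  # only expand newly discovered functions
        nodes |= frontier
    # Keep every call edge whose endpoints both lie inside the neighborhood.
    sub_edges = sorted((a, b) for a, b in set(edges) if a in nodes and b in nodes)
    return nodes, sub_edges


# Hypothetical call graph: each pair is (caller, callee).
example_edges = [
    ("main", "parse"),
    ("main", "run"),
    ("run", "log"),
    ("parse", "log"),
    ("log", "write"),
]
graphlet_nodes, graphlet_edges = call_graphlet(example_edges, "run", hops=1)
```

With `hops=1`, the sketch keeps `run`, its caller `main`, and its callee `log`, together with the two call edges between them.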

Canaries and Whistles: Resilient Drone Communication Networks with (or without) Deep Reinforcement Learning

Dec 08, 2023

Chris Hicks, Vasilios Mavroudis, Myles Foley, Thomas Davies, Kate Highnam, Tim Watson

Abstract: Communication networks able to withstand hostile environments are critically important for disaster relief operations. In this paper, we consider a challenging scenario where drones have been compromised in the supply chain, during their manufacture, and harbour malicious software capable of wide-ranging and infectious disruption. We investigate multi-agent deep reinforcement learning as a tool for learning defensive strategies that maximise communications bandwidth despite continual adversarial interference. Using a public challenge for learning network resilience strategies, we propose a state-of-the-art expert technique and study its superiority over deep reinforcement learning agents. Correspondingly, we identify three specific methods for improving the performance of our learning-based agents: (1) ensuring each observation contains the necessary information, (2) using expert agents to provide a curriculum for learning, and (3) paying close attention to reward. We apply our methods and present a new mixed strategy enabling expert and learning-based agents to work together and improve on all prior results.

* In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security. Association for Computing Machinery, 91-101 (2023)
* Published in AISec '23. This version fixes some terminology to improve readability

FASER: Binary Code Similarity Search through the use of Intermediate Representations

Oct 06, 2023

Josh Collyer, Tim Watson, Iain Phillips

Abstract: Being able to identify functions of interest in cross-architecture software is useful whether you are analysing for malware, securing the software supply chain or conducting vulnerability research. Cross-architecture binary code similarity search has been explored in numerous studies and has used a wide range of different data sources to achieve its goals. The data sources typically used draw on common structures derived from binaries such as function control flow graphs or binary-level call graphs, the output of the disassembly process or the outputs of a dynamic analysis approach. One data source which has received less attention is binary intermediate representations. Binary intermediate representations possess two interesting properties: they are cross-architecture by their very nature and encode the semantics of a function explicitly to support downstream usage. Within this paper we propose Function as a String Encoded Representation (FASER), which combines long document transformers with the use of intermediate representations to create a model capable of cross-architecture function search without the need for manual feature engineering, pre-training or a dynamic analysis step. We compare our approach against a series of baseline approaches for two tasks: a general function search task and a targeted vulnerability search task. Our approach demonstrates strong performance across both tasks, performing better than all baseline approaches.

* 10 pages. To be presented at the Conference on Applied Machine Learning for Information Security
