Title: Finding Meaningful Distributions of ML Black-boxes under Forensic Investigation

May 10, 2023

Jiyi Zhang, Han Fang, Hwee Kuan Lee, Ee-Chien Chang



Abstract: Given a poorly documented neural network model, we take the perspective of a forensic investigator who wants to find out the model's data domain (e.g., whether it was trained on face images or traffic signs). Although existing methods such as membership inference and model inversion can uncover some information about an unknown model, they still require knowledge of the data domain to start with. In this paper, we propose solving this problem by leveraging a comprehensive corpus such as ImageNet to select a meaningful distribution that is close to the original training distribution and leads to high performance in follow-up investigations. The corpus comprises two components: a large dataset of samples, and meta information such as hierarchical structure and textual information on the samples. Our goal is to select a set of samples from the corpus for the given model. The core of our method is an objective function that considers two criteria on the selected samples: the model's functional properties (derived from the dataset) and semantics (derived from the metadata). We also give an algorithm to efficiently search the large space of all possible subsets w.r.t. the objective function. Experimental results show that the proposed method is effective. For example, cloning a given model (originally trained with CIFAR-10) using Caltech 101 achieves 45.5% accuracy, whereas using datasets selected by our method improves the accuracy to 72.0%.
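To make the idea of a two-criterion objective with a subset search concrete, here is a minimal sketch in Python. It is not the authors' objective or search algorithm, which the abstract does not specify. It assumes each corpus class already has a hypothetical "functional" score (for instance, how confidently the black-box model responds to that class's samples) and a parent node in a label hierarchy, and it uses a plain greedy search as a stand-in. All names (`objective`, `greedy_select`, `lam`) and the toy numbers are illustrative assumptions.

```python
from itertools import combinations


def objective(subset, functional, parent, lam=0.5):
    """Toy objective: mean functional score plus a semantic-coherence bonus.

    functional: hypothetical per-class score derived from model outputs.
    parent:     hypothetical per-class parent in a label hierarchy.
    lam:        assumed weight balancing the two criteria.
    """
    if not subset:
        return 0.0
    func = sum(functional[c] for c in subset) / len(subset)
    pairs = list(combinations(subset, 2))
    # Coherence: fraction of class pairs that share the same parent node.
    coherence = (
        sum(parent[a] == parent[b] for a, b in pairs) / len(pairs)
        if pairs
        else 0.0
    )
    return func + lam * coherence


def greedy_select(classes, functional, parent, k, lam=0.5):
    """Greedily add the class that most increases the toy objective."""
    selected = []
    remaining = list(classes)
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda c: objective(selected + [c], functional, parent, lam),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Made-up scores and hierarchy, purely for illustration.
functional = {"cat": 0.9, "dog": 0.85, "truck": 0.8, "daisy": 0.88, "rose": 0.3}
parent = {
    "cat": "animal",
    "dog": "animal",
    "truck": "vehicle",
    "daisy": "plant",
    "rose": "plant",
}

with_semantics = greedy_select(list(functional), functional, parent, k=2, lam=0.5)
without_semantics = greedy_select(list(functional), functional, parent, k=2, lam=0.0)
print(with_semantics, without_semantics)
```

In this toy setup, the semantic term steers the second pick toward a class that shares a parent with the first, while setting `lam=0.0` falls back to ranking by the functional score alone.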
