KTH
Abstract:Big data analysis poses the dual problem of privacy preservation and utility, i.e., how accurate data analyses remain after transforming original data in order to protect the privacy of the individuals that the data is about - and whether they are accurate enough to be meaningful. In this paper, we thus investigate across several datasets whether different methods of generating fully synthetic data vary in their utility a priori (when the specific analyses to be performed on the data are not known yet), how closely their results conform to analyses on original data a posteriori, and whether these two effects are correlated. We find some methods (decision-tree based) to perform better than others across the board, sizeable effects of some choices of imputation parameters (notably the number of released datasets), no correlation between broad utility metrics and analysis accuracy, and varying correlations for narrow metrics. We did get promising findings for classification tasks when using synthetic data for training machine learning models, which we consider worth exploring further also in terms of mitigating privacy attacks against ML models such as membership inference and model inversion.
Abstract:Mass surveillance of the population by state agencies and corporate parties is now a well-known fact. Journalists and whistle-blowers still lack means to circumvent global spying for the sake of their investigations. With Spores, we propose a way for journalists and their sources to plan a posteriori file exchanges when they physically meet. We leverage on the multiplication of personal devices per capita to provide a lightweight, robust and fully anonymous decentralised file transfer protocol between users. Spores hinges on our novel concept of e-squads: one's personal devices, rendered intelligent by gossip communication protocols, can provide private and dependable services to their user. People's e-squads are federated into a novel onion routing network, able to withstand the inherent unreliability of personal appliances while providing reliable routing. Spores' performances are competitive, and its privacy properties of the communication outperform state of the art onion routing strategies.