Abstract:In the post-genomic era, large-scale personal DNA sequences are produced and collected for genetic medical diagnoses and new drug discovery, which, however, simultaneously poses serious challenges to the protection of personal genomic privacy. Existing genomic privacy-protection methods are either time-consuming or with low accuracy. To tackle these problems, this paper proposes a sequence similarity-based obfuscation method, namely IterMegaBLAST, for fast and reliable protection of personal genomic privacy. Specifically, given a randomly selected sequence from a dataset of DNA sequences, we first use MegaBLAST to find its most similar sequence from the dataset. These two aligned sequences form a cluster, for which an obfuscated sequence was generated via a DNA generalization lattice scheme. These procedures are iteratively performed until all of the sequences in the dataset are clustered and their obfuscated sequences are generated. Experimental results on two benchmark datasets demonstrate that under the same degree of anonymity, IterMegaBLAST significantly outperforms existing state-of-the-art approaches in terms of both utility accuracy and time complexity.
Abstract:With a rapidly increasing number of devices connected to the internet, big data has been applied to various domains of human life. Nevertheless, it has also opened new venues for breaching users' privacy. Hence it is highly required to develop techniques that enable data owners to privatize their data while keeping it useful for intended applications. Existing methods, however, do not offer enough flexibility for controlling the utility-privacy trade-off and may incur unfavorable results when privacy requirements are high. To tackle these drawbacks, we propose a compressive-privacy based method, namely RUCA (Ratio Utility and Cost Analysis), which can not only maximize performance for a privacy-insensitive classification task but also minimize the ability of any classifier to infer private information from the data. Experimental results on Census and Human Activity Recognition data sets demonstrate that RUCA significantly outperforms existing privacy preserving data projection techniques for a wide range of privacy pricings.