Abstract:Training large neural networks with meaningful/usable differential privacy security guarantees is a demanding challenge. In this paper, we tackle this problem by revisiting the two key operations in Differentially Private Stochastic Gradient Descent (DP-SGD): 1) iterative perturbation and 2) gradient clipping. We propose a generic optimization framework, called {\em ModelMix}, which performs random aggregation of intermediate model states. It strengthens the composite privacy analysis utilizing the entropy of the training trajectory and improves the $(\epsilon, \delta)$ DP security parameters by an order of magnitude. We provide rigorous analyses for both the utility guarantees and privacy amplification of ModelMix. In particular, we present a formal study on the effect of gradient clipping in DP-SGD, which provides theoretical instruction on how hyper-parameters should be selected. We also introduce a refined gradient clipping method, which can further sharpen the privacy loss in private learning when combined with ModelMix. Thorough experiments with significant privacy/utility improvement are presented to support our theory. We train a Resnet-20 network on CIFAR10 with $70.4\%$ accuracy via ModelMix given $(\epsilon=8, \delta=10^{-5})$ DP-budget, compared to the same performance but with $(\epsilon=145.8,\delta=10^{-5})$ using regular DP-SGD; assisted with additional public low-dimensional gradient embedding, one can further improve the accuracy to $79.1\%$ with $(\epsilon=6.1, \delta=10^{-5})$ DP-budget, compared to the same performance but with $(\epsilon=111.2, \delta=10^{-5})$ without ModelMix.
Abstract:In recent years, there have been many works that use website fingerprinting techniques to enable a local adversary to determine which website a Tor user is visiting. However, most of these works rely on manually extracted features, and thus are fragile: a small change in the protocol or a simple defense often renders these attacks useless. In this work, we leverage deep learning techniques to create a more robust attack that does not require any manually extracted features. Specifically, we propose Var-CNN, an attack that uses model variations on convolutional neural networks with both the packet sequence and packet timing data. In open-world settings, Var-CNN attains higher true positive rate and lower false positive rate than any prior work at 90.9% and 0.3%, respectively. Moreover, these improvements are observed even with low amounts of training data, where deep learning techniques often suffer. Given the severity of our attacks, we also introduce a new countermeasure, DynaFlow, based on dynamically adjusting flows to protect against website fingerprinting attacks. DynaFlow provides a similar level of security as current state-of-the-art and defeats all attacks, including our own, while being over 40% more efficient than existing defenses. Moreover, unlike many prior defenses, DynaFlow can protect dynamically generated websites as well.