Abstract:Website Fingerprinting (WF) is considered a major threat to the anonymity of Tor users (and other anonymity systems). While state-of-the-art WF techniques have claimed high attack accuracies, e.g., by leveraging Deep Neural Networks (DNN), several recent works have questioned the practicality of such WF attacks in the real world due to the assumptions made in the design and evaluation of these attacks. In this work, we argue that such impracticality issues are mainly due to the attacker's inability in collecting training data in comprehensive network conditions, e.g., a WF classifier may be trained only on samples collected on specific high-bandwidth network links but deployed on connections with different network conditions. We show that augmenting network traces can enhance the performance of WF classifiers in unobserved network conditions. Specifically, we introduce NetAugment, an augmentation technique tailored to the specifications of Tor traces. We instantiate NetAugment through semi-supervised and self-supervised learning techniques. Our extensive open-world and close-world experiments demonstrate that under practical evaluation settings, our WF attacks provide superior performances compared to the state-of-the-art; this is due to their use of augmented network traces for training, which allows them to learn the features of target traffic in unobserved settings. For instance, with a 5-shot learning in a closed-world scenario, our self-supervised WF attack (named NetCLR) reaches up to 80% accuracy when the traces for evaluation are collected in a setting unobserved by the WF adversary. This is compared to an accuracy of 64.4% achieved by the state-of-the-art Triplet Fingerprinting [35]. We believe that the promising results of our work can encourage the use of network trace augmentation in other types of network traffic analysis.
Abstract:In this paper, a new perspective is suggested for unsupervised Ontology Matching (OM) or Ontology Alignment (OA) by treating it as a translation task. Ontologies are represented as graphs, and the translation is performed from a node in the source ontology graph to a path in the target ontology graph. The proposed framework, Truveta Mapper (TM), leverages a multi-task sequence-to-sequence transformer model to perform alignment across multiple ontologies in a zero-shot, unified and end-to-end manner. Multi-tasking enables the model to implicitly learn the relationship between different ontologies via transfer-learning without requiring any explicit cross-ontology manually labeled data. This also enables the formulated framework to outperform existing solutions for both runtime latency and alignment quality. The model is pre-trained and fine-tuned only on publicly available text corpus and inner-ontologies data. The proposed solution outperforms state-of-the-art approaches, Edit-Similarity, LogMap, AML, BERTMap, and the recently presented new OM frameworks in Ontology Alignment Evaluation Initiative (OAEI22), offers log-linear complexity in contrast to quadratic in the existing end-to-end methods, and overall makes the OM task efficient and more straightforward without much post-processing involving mapping extension or mapping repair.
Abstract:Deep Neural Networks (DNNs) are commonly used for various traffic analysis problems, such as website fingerprinting and flow correlation, as they outperform traditional (e.g., statistical) techniques by large margins. However, deep neural networks are known to be vulnerable to adversarial examples: adversarial inputs to the model that get labeled incorrectly by the model due to small adversarial perturbations. In this paper, for the first time, we show that an adversary can defeat DNN-based traffic analysis techniques by applying \emph{adversarial perturbations} on the patterns of \emph{live} network traffic.
Abstract:Flow correlation is the core technique used in a multitude of deanonymization attacks on Tor. Despite the importance of flow correlation attacks on Tor, existing flow correlation techniques are considered to be ineffective and unreliable in linking Tor flows when applied at a large scale, i.e., they impose high rates of false positive error rates or require impractically long flow observations to be able to make reliable correlations. In this paper, we show that, unfortunately, flow correlation attacks can be conducted on Tor traffic with drastically higher accuracies than before by leveraging emerging learning mechanisms. We particularly design a system, called DeepCorr, that outperforms the state-of-the-art by significant margins in correlating Tor connections. DeepCorr leverages an advanced deep learning architecture to learn a flow correlation function tailored to Tor's complex network this is in contrast to previous works' use of generic statistical correlation metrics to correlated Tor flows. We show that with moderate learning, DeepCorr can correlate Tor connections (and therefore break its anonymity) with accuracies significantly higher than existing algorithms, and using substantially shorter lengths of flow observations. For instance, by collecting only about 900 packets of each target Tor flow (roughly 900KB of Tor data), DeepCorr provides a flow correlation accuracy of 96% compared to 4% by the state-of-the-art system of RAPTOR using the same exact setting. We hope that our work demonstrates the escalating threat of flow correlation attacks on Tor given recent advances in learning algorithms, calling for the timely deployment of effective countermeasures by the Tor community.