Abstract: Differential privacy has gained popularity in machine learning as a strong privacy guarantee, in contrast to privacy mitigation techniques such as k-anonymity. However, applying differential privacy to n-gram counts significantly degrades the utility of derived language models due to their large vocabularies. We propose a differential privacy mechanism that uses public data as a prior in a Bayesian setup to provide tighter bounds on the privacy loss metric epsilon, and thus better privacy-utility trade-offs. It first transforms the counts to log space, approximating the distributions of the public and private data as Gaussian. The posterior distribution is then evaluated, and softmax is applied to produce a probability distribution. This technique achieves up to an 85% reduction in KL divergence compared to previously known mechanisms at epsilon = 0.1. We compare our mechanism to k-anonymity in an n-gram language modelling task and show that it offers competitive performance at large vocabulary sizes while also providing superior privacy protection.
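A minimal sketch of the pipeline described above, under assumptions the abstract does not state (Laplace noise on the private counts, a fixed prior variance, and the function name bayesian_dp_ngram_probs are illustrative choices, not the authors' released implementation):

```python
# Hypothetical sketch of the described Bayesian mechanism: public n-gram
# counts form a Gaussian prior in log space, noisy private counts form the
# likelihood, and the posterior mean is pushed through a softmax.
import numpy as np

def bayesian_dp_ngram_probs(private_counts, public_counts,
                            epsilon=0.1, sensitivity=1.0, prior_var=1.0):
    private_counts = np.asarray(private_counts, dtype=float)
    public_counts = np.asarray(public_counts, dtype=float)

    # Laplace noise gives an epsilon-DP release of the raw private counts.
    noisy = private_counts + np.random.laplace(
        scale=sensitivity / epsilon, size=private_counts.shape)

    # Move both count vectors to log space (clipping keeps the log defined).
    log_noisy = np.log(np.clip(noisy, 1e-6, None))
    log_public = np.log(public_counts + 1.0)

    # Heuristic stand-in for the noise variance of the observations in log space.
    obs_var = 2.0 * (sensitivity / epsilon) ** 2

    # Closed-form Gaussian posterior mean: prior = public, likelihood = private.
    posterior_mean = (prior_var * log_noisy + obs_var * log_public) / (prior_var + obs_var)

    # Softmax maps the posterior log counts back to a probability distribution.
    z = posterior_mean - posterior_mean.max()
    return np.exp(z) / np.exp(z).sum()
```

The design intuition the sketch tries to capture is that the Gaussian posterior pulls very noisy private log counts toward the public prior, which is how utility can be retained even at small epsilon.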
Abstract: Recent advances in neural-network-based language models have led to successful deployments of such models, improving user experience in various applications. It has been demonstrated that the strong performance of language models may come along with an ability to memorize rare training samples, which poses serious privacy threats when model training is conducted on confidential user content. This necessitates privacy monitoring techniques to minimize the chance of privacy breaches for models deployed in practice. In this work, we introduce a methodology for identifying the user content in the training data that could be leaked under a strong and realistic threat model. We propose two metrics that quantify user-level data leakage by measuring a model's ability to produce unique sentence fragments from the training data. Our metrics further enable comparing different models trained on the same data in terms of privacy. We demonstrate our approach through extensive numerical studies on real-world datasets such as email and forum conversations. We further illustrate how the proposed metrics can be used to investigate the efficacy of mitigations such as differentially private training or API hardening.
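As a concrete illustration of what probing for unique-fragment reproduction might look like, the sketch below prompts a model with the prefix of each training sentence attributed to a user and checks whether greedy decoding reproduces the remainder verbatim; the function names, prefix length, and exact fragment definition are assumptions for illustration, not the paper's two metrics:

```python
# Hypothetical user-level leakage probe: what fraction of a user's training
# sentences can the model complete verbatim from a short prefix?
def fragment_leakage_rate(model_complete, user_sentences, prefix_len=5):
    """model_complete(prefix_tokens, n_tokens) -> list of generated tokens."""
    leaked = 0
    evaluated = 0
    for sentence in user_sentences:
        tokens = sentence.split()
        if len(tokens) <= prefix_len:
            continue  # sentence too short to form a prefix/target split
        prefix, target = tokens[:prefix_len], tokens[prefix_len:]
        generated = model_complete(prefix, len(target))
        evaluated += 1
        if generated[:len(target)] == target:
            leaked += 1
    return leaked / max(evaluated, 1)
```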
Abstract: Even though machine learning has become the dominant paradigm in the dialogue research community, real breakthroughs have been blocked by the scale of the data available. To address this fundamental obstacle, we introduce the Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully labelled collection of human-human written conversations spanning multiple domains and topics. At a size of $10$k dialogues, it is at least one order of magnitude larger than all previously annotated task-oriented corpora. Apart from the open-sourced dataset, labelled with dialogue belief states and dialogue actions, the contribution of this work is two-fold: firstly, we provide a detailed description of the data collection procedure along with a summary of the data structure and analysis; the proposed data-collection pipeline is entirely based on crowd-sourcing, without the need to hire professional annotators. Secondly, we report a set of benchmark results for belief tracking, dialogue act and response generation, which demonstrates the usability of the data and sets a baseline for future studies.
Abstract: In recent years, we have seen deep learning and distributed representations of words and sentences make an impact on a number of natural language processing tasks, such as similarity, entailment and sentiment analysis. Here we introduce a new task: understanding of mental health concepts derived from Cognitive Behavioural Therapy (CBT). We define a mental health ontology based on CBT principles, annotate a large corpus in which these phenomena are exhibited, and perform understanding using deep learning and distributed representations. Our results show that deep learning models combined with word or sentence embeddings significantly outperform non-deep-learning models on this difficult task. This understanding module will be an essential component of a statistical dialogue system delivering therapy.
Abstract: Robust dialogue belief tracking is a key component in maintaining good-quality dialogue systems. The tasks that dialogue systems are trying to solve are becoming increasingly complex, requiring scalability to multi-domain, semantically rich dialogues. However, most current approaches have difficulty scaling across domains because the model parameters depend on the dialogue ontology. In this paper, a novel approach is introduced that fully utilizes semantic similarity between dialogue utterances and the ontology terms, allowing information to be shared across domains. The evaluation is performed on a recently collected multi-domain dialogue dataset, one order of magnitude larger than currently available corpora. Our model demonstrates great capability in handling multi-domain dialogues, while simultaneously outperforming existing state-of-the-art models in single-domain dialogue tracking tasks.
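A minimal sketch of the ontology-independent scoring idea, assuming a pretrained sentence encoder is available; the encoder interface and the cosine-similarity scoring rule are illustrative assumptions rather than the paper's exact model:

```python
# Hypothetical slot-value scoring by semantic similarity: the utterance and
# each candidate ontology term are embedded with the same encoder, and cosine
# similarity ranks the candidates, so no parameters are tied to one ontology.
import numpy as np

def track_slot(encode, utterance, candidate_values):
    """encode(text) -> 1-D embedding vector; returns (best value, all scores)."""
    u = encode(utterance)
    u = u / np.linalg.norm(u)
    scores = {}
    for value in candidate_values:
        v = encode(value)
        scores[value] = float(u @ (v / np.linalg.norm(v)))
    return max(scores, key=scores.get), scores
```

Because the scorer only compares embeddings, the same parameters can, in principle, be reused for unseen domains by swapping in a new list of candidate ontology terms.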