Abstract:This study investigates the privacy risks associated with text embeddings, focusing on the scenario where attackers cannot access the original embedding model. Contrary to previous research requiring direct model access, we explore a more realistic threat model by developing a transfer attack method. This approach uses a surrogate model to mimic the victim model's behavior, allowing the attacker to infer sensitive information from text embeddings without direct access. Our experiments across various embedding models and a clinical dataset demonstrate that our transfer attack significantly outperforms traditional methods, revealing the potential privacy vulnerabilities in embedding technologies and emphasizing the need for enhanced security measures.
Abstract:Complex or co-existing diseases are commonly treated using drug combinations, which can lead to higher risk of adverse side effects. The detection of polypharmacy side effects is usually done in Phase IV clinical trials, but there are still plenty which remain undiscovered when the drugs are put on the market. Such accidents have been affecting an increasing proportion of the population (15% in the US now) and it is thus of high interest to be able to predict the potential side effects as early as possible. Systematic combinatorial screening of possible drug-drug interactions (DDI) is challenging and expensive. However, the recent significant increases in data availability from pharmaceutical research and development efforts offer a novel paradigm for recovering relevant insights for DDI prediction. Accordingly, several recent approaches focus on curating massive DDI datasets (with millions of examples) and training machine learning models on them. Here we propose a neural network architecture able to set state-of-the-art results on this task---using the type of the side-effect and the molecular structure of the drugs alone---by leveraging a co-attentional mechanism. In particular, we show the importance of integrating joint information from the drug pairs early on when learning each drug's representation.