Alexander Cai

Learning the Wrong Lessons: Inserting Trojans During Knowledge Distillation

Mar 09, 2023

Leonard Tang, Tom Shlomi, Alexander Cai

Abstract: In recent years, knowledge distillation has become a cornerstone of efficiently deployed machine learning, with labs and industries using knowledge distillation to train models that are inexpensive and resource-optimized. Trojan attacks have contemporaneously gained significant prominence, revealing fundamental vulnerabilities in deep learning models. Given the widespread use of knowledge distillation, in this work we seek to exploit the unlabelled data knowledge distillation process to embed Trojans in a student model without introducing conspicuous behavior in the teacher. We ultimately devise a Trojan attack that effectively reduces student accuracy, does not alter teacher performance, and is efficiently constructible in practice.

* ICLR 2023 Workshop on Backdoor Attacks and Defenses in Machine Learning

The Naughtyformer: A Transformer Understands Offensive Humor

Nov 25, 2022

Leonard Tang, Alexander Cai, Steve Li, Jason Wang

Abstract: Jokes are intentionally written to be funny, but not all jokes are created the same. Some jokes may be fit for a classroom of kindergarteners, but others are best reserved for a more mature audience. While recent work has shown impressive results on humor detection in text, here we instead investigate the more nuanced task of detecting humor subtypes, especially of the less innocent variety. To that end, we introduce a novel jokes dataset filtered from Reddit and solve the subtype classification task using a finetuned Transformer dubbed the Naughtyformer. Moreover, we show that our model is significantly better at detecting offensiveness in jokes compared to state-of-the-art methods.

* AAAI-23 Student Abstract
