Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Anugya Srivastava

No Offense Taken: Eliciting Offensiveness from Language Models

Oct 02, 2023

Anugya Srivastava, Rahul Ahuja, Rohith Mukku

Abstract:This work was completed in May 2022. For safe and reliable deployment of language models in the real world, testing needs to be robust. This robustness can be characterized by the difficulty and diversity of the test cases we evaluate these models on. Limitations in human-in-the-loop test case generation has prompted an advent of automated test case generation approaches. In particular, we focus on Red Teaming Language Models with Language Models by Perez et al.(2022). Our contributions include developing a pipeline for automated test case generation via red teaming that leverages publicly available smaller language models (LMs), experimenting with different target LMs and red classifiers, and generating a corpus of test cases that can help in eliciting offensive responses from widely deployed LMs and identifying their failure modes.

Via

Access Paper or Ask Questions

Out of Distribution Detection on ImageNet-O

Jan 23, 2022

Anugya Srivastava, Shriya Jain, Mugdha Thigle

Abstract:Out of distribution (OOD) detection is a crucial part of making machine learning systems robust. The ImageNet-O dataset is an important tool in testing the robustness of ImageNet trained deep neural networks that are widely used across a variety of systems and applications. We aim to perform a comparative analysis of OOD detection methods on ImageNet-O, a first of its kind dataset with a label distribution different than that of ImageNet, that has been created to aid research in OOD detection for ImageNet models. As this dataset is fairly new, we aim to provide a comprehensive benchmarking of some of the current state of the art OOD detection methods on this novel dataset. This benchmarking covers a variety of model architectures, settings where we haves prior access to the OOD data versus when we don't, predictive score based approaches, deep generative approaches to OOD detection, and more.

Via

Access Paper or Ask Questions