Leonard Tang

Endless Jailbreaks with Bijection Learning

Oct 02, 2024
Via arXiv

Introducing v0.5 of the AI Safety Benchmark from MLCommons

Apr 18, 2024
Via arXiv

Consistent Explanations in the Face of Model Indeterminacy via Ensembling

Jun 13, 2023
Via arXiv

Degraded Polygons Raise Fundamental Questions of Neural Network Perception

Jun 08, 2023
Via arXiv

Baselines for Identifying Watermarked Large Language Models

May 29, 2023
Via arXiv

Learning the Wrong Lessons: Inserting Trojans During Knowledge Distillation

Mar 09, 2023
Via arXiv

MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding

Jan 06, 2023
Via arXiv

The Naughtyformer: A Transformer Understands Offensive Humor

Nov 25, 2022
Via arXiv

Lila: A Unified Benchmark for Mathematical Reasoning

Oct 31, 2022
Via arXiv

A Dataset and Benchmark for Automatically Answering and Generating Machine Learning Final Exams

Jun 11, 2022
Via arXiv