Steven Basart

Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?

Jul 31, 2024

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

Mar 06, 2024

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Feb 06, 2024

Representation Engineering: A Top-Down Approach to AI Transparency

Oct 10, 2023

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark

Apr 06, 2023

How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios

Oct 18, 2022

Towards Robustness of Neural Networks

Dec 30, 2021

Measuring Coding Challenge Competence With APPS

May 27, 2021

Measuring Mathematical Problem Solving With the MATH Dataset

Mar 05, 2021

Measuring Massive Multitask Language Understanding

Sep 21, 2020