Picture for Alexandra Souly

Alexandra Souly

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

Add code
Oct 11, 2024
Viaarxiv icon

A StrongREJECT for Empty Jailbreaks

Add code
Feb 15, 2024
Viaarxiv icon

Leading the Pack: N-player Opponent Shaping

Add code
Dec 26, 2023
Viaarxiv icon

JaxMARL: Multi-Agent RL Environments in JAX

Add code
Nov 20, 2023
Viaarxiv icon

Retrospective on the 2021 BASALT Competition on Learning from Human Feedback

Add code
Apr 14, 2022
Figure 1 for Retrospective on the 2021 BASALT Competition on Learning from Human Feedback
Figure 2 for Retrospective on the 2021 BASALT Competition on Learning from Human Feedback
Figure 3 for Retrospective on the 2021 BASALT Competition on Learning from Human Feedback
Figure 4 for Retrospective on the 2021 BASALT Competition on Learning from Human Feedback
Viaarxiv icon