Legend: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets

Jun 12, 2024
Figures 1-4 for Legend: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets


View paper on arXiv
