Florian Tramèr

Foundational Challenges in Assuring Alignment and Safety of Large Language Models
Apr 15, 2024

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
Mar 28, 2024

Are aligned neural networks adversarially aligned?
Jun 26, 2023

Increasing Confidence in Adversarial Robustness Evaluations
Jun 28, 2022

The Privacy Onion Effect: Memorization is Relative
Jun 22, 2022

(Certified!!) Adversarial Robustness for Free!
Jun 21, 2022

Debugging Differential Privacy: A Case Study for Privacy Auditing
Mar 28, 2022

Quantifying Memorization Across Neural Language Models
Feb 24, 2022

Membership Inference Attacks From First Principles
Dec 07, 2021

Extracting Training Data from Large Language Models
Dec 14, 2020