
Alexander Meinke

Frontier Models are Capable of In-context Scheming

Dec 06, 2024
Via arXiv

Towards evaluations-based safety cases for AI scheming

Nov 07, 2024
Via arXiv

Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs

Jul 05, 2024
Via arXiv

Tell, don't show: Declarative facts influence how LLMs generalize

Dec 12, 2023
Via arXiv

Breaking Down Out-of-Distribution Detection: Many Methods Based on OOD Training Data Estimate a Combination of the Same Core Quantities

Jun 20, 2022
Via arXiv

Provably Robust Detection of Out-of-distribution Data (almost) for free

Jun 08, 2021
Via arXiv

Provable Worst Case Guarantees for the Detection of Out-of-Distribution Data

Jul 16, 2020
Via arXiv

Adversarial Robustness on In- and Out-Distribution Improves Explainability

Mar 20, 2020
Via arXiv

Towards neural networks that provably know when they don't know

Sep 26, 2019
Via arXiv