Marius Hobbhahn

Frontier Models are Capable of In-context Scheming

Dec 06, 2024
Via arXiv

Towards evaluations-based safety cases for AI scheming

Nov 07, 2024
Via arXiv

Analyzing Probabilistic Methods for Evaluating Agent Capabilities

Sep 24, 2024
Via arXiv

Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs

Jul 05, 2024
Via arXiv

Flexible inference in heterogeneous and attributed multilayer networks

May 31, 2024
Via arXiv

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

May 17, 2024
Via arXiv

Using Degeneracy in the Loss Landscape for Mechanistic Interpretability

May 17, 2024
Via arXiv

Black-Box Access is Insufficient for Rigorous AI Audits

Jan 25, 2024
Via arXiv

Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure

Nov 27, 2023
Via arXiv

Machine Learning Model Sizes and the Parameter Gap

Jul 05, 2022
Via arXiv