Jannik Brinkmann

Jailbreak Strength and Model Similarity Predict Transferability

Jun 15, 2025
Via arXiv

Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages

Jan 10, 2025
Via arXiv

NSA: Neuro-symbolic ARC Challenge

Jan 08, 2025
Via arXiv

The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability

Aug 02, 2024
Via arXiv

Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models

Jul 31, 2024
Via arXiv

NNsight and NDIF: Democratizing Access to Foundation Model Internals

Jul 18, 2024
Via arXiv

GOV-REK: Governed Reward Engineering Kernels for Designing Robust Multi-Agent Reinforcement Learning Systems

Apr 14, 2024
Via arXiv

A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task

Feb 28, 2024
Via arXiv

A Multidimensional Analysis of Social Biases in Vision Transformers

Aug 03, 2023
Via arXiv