Martin Wattenberg

Think Before You Lie: How Reasoning Leads to Honesty

Mar 16, 2026
Via arXiv

Think Before You Lie: How Reasoning Improves Honesty

Mar 10, 2026
Via arXiv

Decomposing Query-Key Feature Interactions Using Contrastive Covariances

Feb 04, 2026
Via arXiv

Does visualization help AI understand data?

Jul 24, 2025
Via arXiv

Can Interpretation Predict Behavior on Unseen Data?

Jul 08, 2025
Via arXiv

When Bad Data Leads to Good Models

May 07, 2025
Via arXiv

The Geometry of Self-Verification in a Task-Specific Reasoning Model

Apr 19, 2025
Via arXiv

Shared Global and Local Geometry of Language Model Embeddings

Mar 27, 2025
Via arXiv

Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models

Feb 18, 2025
Via arXiv

Open Problems in Mechanistic Interpretability

Jan 27, 2025
Via arXiv