David D. Baek

A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring

Feb 26, 2026
Via arXiv

Scaling Laws For Scalable Oversight

Apr 25, 2025
Via arXiv

Towards Understanding Distilled Reasoning Models: A Representational Approach

Mar 05, 2025
Via arXiv

Harmonic Loss Trains Interpretable AI Models

Feb 03, 2025
Via arXiv

Generalization from Starvation: Hints of Universality in LLM Knowledge Graph Learning

Oct 10, 2024
Via arXiv

GenEFT: Understanding Statics and Dynamics of Model Generalization via Effective Theory

Feb 08, 2024
Via arXiv