Jacob Steinhardt

What Do Learning Dynamics Reveal About Generalization in LLM Reasoning?

Nov 12, 2024

VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models

Oct 10, 2024

Explaining Datasets in Words: Statistical Models with Natural Language Parameters

Sep 13, 2024

Safety vs. Performance: How Multi-Objective Learning Reduces Barriers to Market Entry

Sep 05, 2024

Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation

Jun 28, 2024

Monitoring Latent World States in Language Models with Propositional Probes

Jun 27, 2024

Adversaries Can Misuse Combinations of Safe Models

Jun 20, 2024

Interpreting the Second-Order Effects of Neurons in CLIP

Jun 06, 2024

Approaching Human-Level Forecasting with Language Models

Feb 28, 2024

Feedback Loops With Language Models Drive In-Context Reward Hacking

Feb 09, 2024