Picture for Alex Tamkin

Alex Tamkin

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

Add code
Jun 17, 2024
Viaarxiv icon

Collective Constitutional AI: Aligning a Language Model with Public Input

Add code
Jun 12, 2024
Viaarxiv icon

Bayesian Preference Elicitation with Language Models

Add code
Mar 08, 2024
Viaarxiv icon

Evaluating and Mitigating Discrimination in Language Model Decisions

Add code
Dec 06, 2023
Viaarxiv icon

Codebook Features: Sparse and Discrete Interpretability for Neural Networks

Add code
Oct 26, 2023
Viaarxiv icon

Social Contract AI: Aligning AI Assistants with Implicit Group Norms

Add code
Oct 26, 2023
Viaarxiv icon

Eliciting Human Preferences with Language Models

Add code
Oct 17, 2023
Viaarxiv icon

Turbulence in Focus: Benchmarking Scaling Behavior of 3D Volumetric Super-Resolution with BLASTNet 2.0 Data

Add code
Sep 26, 2023
Viaarxiv icon

Studying Large Language Model Generalization with Influence Functions

Add code
Aug 07, 2023
Figure 1 for Studying Large Language Model Generalization with Influence Functions
Figure 2 for Studying Large Language Model Generalization with Influence Functions
Figure 3 for Studying Large Language Model Generalization with Influence Functions
Figure 4 for Studying Large Language Model Generalization with Influence Functions
Viaarxiv icon

Towards Measuring the Representation of Subjective Global Opinions in Language Models

Add code
Jun 28, 2023
Figure 1 for Towards Measuring the Representation of Subjective Global Opinions in Language Models
Figure 2 for Towards Measuring the Representation of Subjective Global Opinions in Language Models
Figure 3 for Towards Measuring the Representation of Subjective Global Opinions in Language Models
Figure 4 for Towards Measuring the Representation of Subjective Global Opinions in Language Models
Viaarxiv icon