Esin Durmus

Sabotage Evaluations for Frontier Models

Oct 28, 2024

Collective Constitutional AI: Aligning a Language Model with Public Input

Jun 12, 2024

NLP Systems That Can't Tell Use from Mention Censor Counterspeech, but Teaching the Distinction Helps

Apr 02, 2024

Evaluating and Mitigating Discrimination in Language Model Decisions

Dec 06, 2023

Towards Understanding Sycophancy in Language Models

Oct 27, 2023

Specific versus General Principles for Constitutional AI

Oct 20, 2023

Studying Large Language Model Generalization with Influence Functions

Aug 07, 2023

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning

Jul 25, 2023

Measuring Faithfulness in Chain-of-Thought Reasoning

Jul 17, 2023

Towards Measuring the Representation of Subjective Global Opinions in Language Models

Jun 28, 2023