Deep Ganguli

Shammie

Values in the Wild: Discovering and Analyzing Values in Real-World Language Model Interactions

Apr 21, 2025

Toward an Evaluation Science for Generative AI Systems

Mar 07, 2025

Clio: Privacy-Preserving Insights into Real-World AI Use

Dec 18, 2024

Sabotage Evaluations for Frontier Models

Oct 28, 2024

Collective Constitutional AI: Aligning a Language Model with Public Input

Jun 12, 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Jan 17, 2024

Evaluating and Mitigating Discrimination in Language Model Decisions

Dec 06, 2023

Towards Measuring the Representation of Subjective Global Opinions in Language Models

Jun 28, 2023

Opportunities and Risks of LLMs for Scalable Deliberation with Polis

Jun 20, 2023

The Capacity for Moral Self-Correction in Large Language Models

Feb 18, 2023