Tobias Gerstenberg

Imagining and building wise machines: The centrality of AI metacognition

Nov 04, 2024 (via arXiv)

MARPLE: A Benchmark for Long-Horizon Inference

Oct 02, 2024 (via arXiv)

Human-like Affective Cognition in Foundation Models

Sep 19, 2024 (via arXiv)

To Err is Robotic: Rapid Value-Based Trial-and-Error during Deployment

Jun 22, 2024 (via arXiv)

Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels

Apr 22, 2024 (via arXiv)

Procedural Dilemma Generation for Evaluating Moral Reasoning in Humans and Language Models

Apr 17, 2024 (via arXiv)

STaR-GATE: Teaching Language Models to Ask Clarifying Questions

Mar 29, 2024 (via arXiv)

MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks

Oct 31, 2023 (via arXiv)

Social Contract AI: Aligning AI Assistants with Implicit Group Norms

Oct 26, 2023 (via arXiv)

Understanding Social Reasoning in Language Models with Language Models

Jun 21, 2023 (via arXiv)