Martin Wattenberg

Relational Composition in Neural Networks: A Survey and Call to Action
Jul 19, 2024

Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner
Jun 17, 2024

Designing a Dashboard for Transparency and Control of Conversational AI
Jun 12, 2024

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
Feb 22, 2024

Measuring and Controlling Persona Drift in Language Model Dialogs
Feb 13, 2024

A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Jan 03, 2024

AI Alignment in the Design of Interactive AI: Specification Alignment, Process Alignment, and Evaluation Support
Oct 23, 2023

ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing
Sep 17, 2023

Emergent Linear Representations in World Models of Self-Supervised Sequence Models
Sep 07, 2023

Linearity of Relation Decoding in Transformer Language Models
Aug 17, 2023