Jacob Pfau

Taking AI Welfare Seriously

Nov 04, 2024
Via arXiv

Steering Without Side Effects: Improving Post-Deployment Control of Language Models

Jun 21, 2024
Via arXiv

Let's Think Dot by Dot: Hidden Computation in Transformer Language Models

Apr 24, 2024
Via arXiv

Self-Consistency of Large Language Models under Ambiguity

Oct 20, 2023
Via arXiv

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Jul 27, 2023
Via arXiv

Objective Robustness in Deep Reinforcement Learning

Jun 08, 2021
Via arXiv

Robust Semantic Interpretability: Revisiting Concept Activation Vectors

Apr 06, 2021
Via arXiv

Global Saliency: Aggregating Saliency Maps to Assess Dataset Artefact Bias

Oct 16, 2019
Via arXiv