Katherine Metcalf

Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models

Feb 21, 2025

Analyze the Neurons, not the Embeddings: Understanding When and Where LLM Representations Align with Humans

Feb 20, 2025

On the Way to LLM Personalization: Learning to Remember User Conversations

Nov 20, 2024

PREDICT: Preference Reasoning by Evaluating Decomposed preferences Inferred from Candidate Trajectories

Oct 08, 2024

On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization

Sep 05, 2024

Hindsight PRIORs for Reward Learning from Human Preferences

Apr 12, 2024

Sample-Efficient Preference-based Reinforcement Learning with Dynamics Aware Rewards

Feb 28, 2024

Large Language Models as Generalizable Policies for Embodied Tasks

Oct 26, 2023

Rewards Encoding Environment Dynamics Improves Preference-based Reinforcement Learning

Nov 12, 2022

Symbol Guided Hindsight Priors for Reward Learning from Human Preferences

Oct 19, 2022