Jacob Eisenstein

ALTA: Compiler-Based Analysis of Transformers

Oct 23, 2024
Via arXiv

Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning

Oct 10, 2024
Via arXiv

Predicting the Target Word of Game-playing Conversations using a Low-Rank Dialect Adapter for Decoder Models

Aug 31, 2024
Via arXiv

Robust Preference Optimization through Reward Model Distillation

May 29, 2024
Via arXiv

Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment

Apr 18, 2024
Via arXiv

Transforming and Combining Rewards for Aligning Large Language Models

Feb 01, 2024
Via arXiv

Theoretical guarantees on the best-of-n alignment policy

Jan 03, 2024
Via arXiv

Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking

Dec 21, 2023
Via arXiv

Selectively Answering Ambiguous Questions

May 24, 2023
Via arXiv

MD3: The Multi-Dialect Dataset of Dialogues

May 19, 2023
Via arXiv