Richard Yuanzhe Pang

Transformers Struggle to Learn to Search

Dec 06, 2024
Via arXiv

Self-Generated Critiques Boost Reward Modeling for Language Models

Nov 25, 2024
Via arXiv

Self-Consistency Preference Optimization

Nov 06, 2024
Via arXiv

Self-Taught Evaluators

Aug 05, 2024
Via arXiv

An Introduction to Vision-Language Modeling

May 27, 2024
Via arXiv

Iterative Reasoning Preference Optimization

Apr 30, 2024
Via arXiv

Self-Rewarding Language Models

Jan 18, 2024
Via arXiv

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

Nov 20, 2023
Via arXiv

Leveraging Implicit Feedback from Deployment Data in Dialogue

Jul 26, 2023
Via arXiv

Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples

May 24, 2023
Via arXiv