Jason Weston

Google

SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks

Mar 19, 2025 (via arXiv)

LLM Pretraining with Continuous Concepts

Feb 12, 2025 (via arXiv)

Diverse Preference Optimization

Jan 31, 2025 (via arXiv)

Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge

Jan 30, 2025 (via arXiv)

R.I.P.: Better Models by Survival of the Fittest Prompts

Jan 30, 2025 (via arXiv)

Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback

Jan 18, 2025 (via arXiv)

Byte Latent Transformer: Patches Scale Better Than Tokens

Dec 13, 2024 (via arXiv)

Training Large Language Models to Reason in a Continuous Latent Space

Dec 9, 2024 (via arXiv)

ALMA: Alignment with Minimal Annotation

Dec 5, 2024 (via arXiv)

Adaptive Decoding via Latent Preference Optimization

Nov 14, 2024 (via arXiv)