Kianté Brantley

Diffusing States and Matching Scores: A New Framework for Imitation Learning

Oct 17, 2024

LLMs Are In-Context Reinforcement Learners

Oct 07, 2024

Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF

Oct 06, 2024

REBEL: Reinforcement Learning via Regressing Relative Rewards

Apr 25, 2024

Dataset Reset Policy Optimization for RLHF

Apr 15, 2024

Adversarial Imitation Learning via Boosting

Apr 12, 2024

RL for Consistency Models: Faster Reward Guided Text-to-Image Generation

Mar 25, 2024

A Surprising Failure? Multimodal LLMs and the NLVR Challenge

Feb 26, 2024

Reviewer2: Optimizing Review Generation Through Prompt Generation

Feb 16, 2024

Policy-Gradient Training of Language Models for Ranking

Oct 06, 2023