Picture for Kianté Brantley

Kianté Brantley

Diffusing States and Matching Scores: A New Framework for Imitation Learning

Add code
Oct 17, 2024
Viaarxiv icon

LLMs Are In-Context Reinforcement Learners

Add code
Oct 07, 2024
Viaarxiv icon

Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF

Add code
Oct 06, 2024
Figure 1 for Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Figure 2 for Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Figure 3 for Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Figure 4 for Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Viaarxiv icon

REBEL: Reinforcement Learning via Regressing Relative Rewards

Add code
Apr 25, 2024
Viaarxiv icon

Dataset Reset Policy Optimization for RLHF

Add code
Apr 15, 2024
Viaarxiv icon

Adversarial Imitation Learning via Boosting

Add code
Apr 12, 2024
Figure 1 for Adversarial Imitation Learning via Boosting
Figure 2 for Adversarial Imitation Learning via Boosting
Figure 3 for Adversarial Imitation Learning via Boosting
Figure 4 for Adversarial Imitation Learning via Boosting
Viaarxiv icon

RL for Consistency Models: Faster Reward Guided Text-to-Image Generation

Add code
Mar 25, 2024
Figure 1 for RL for Consistency Models: Faster Reward Guided Text-to-Image Generation
Figure 2 for RL for Consistency Models: Faster Reward Guided Text-to-Image Generation
Figure 3 for RL for Consistency Models: Faster Reward Guided Text-to-Image Generation
Figure 4 for RL for Consistency Models: Faster Reward Guided Text-to-Image Generation
Viaarxiv icon

A Surprising Failure? Multimodal LLMs and the NLVR Challenge

Add code
Feb 26, 2024
Viaarxiv icon

Reviewer2: Optimizing Review Generation Through Prompt Generation

Add code
Feb 16, 2024
Viaarxiv icon

Policy-Gradient Training of Language Models for Ranking

Add code
Oct 06, 2023
Viaarxiv icon