Shentao Yang

SWaT: Statistical Modeling of Video Watch Time through User Behavior Analysis

Aug 14, 2024
Via arXiv

Sequential Decision-Making for Inline Text Autocomplete

Mar 21, 2024
Via arXiv

A Dense Reward View on Aligning Text-to-Image Diffusion with Preference

Feb 13, 2024
Via arXiv

Preference-grounded Token-level Guidance for Language Model Fine-tuning

Jun 01, 2023
Via arXiv

Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-oriented Dialogue Systems

Feb 20, 2023
Via arXiv

A Unified Framework for Alternating Offline Model Training and Policy Learning

Oct 12, 2022
Via arXiv

Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning

Jun 14, 2022
Via arXiv

A Regularized Implicit Policy for Offline Reinforcement Learning

Feb 19, 2022
Via arXiv