Wenpin Tang

Regret of exploratory policy improvement and $q$-learning

Nov 02, 2024
Via arXiv

RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization

Oct 05, 2024
Via arXiv

Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey

Sep 17, 2024
Via arXiv

Scores as Actions: a framework of fine-tuning diffusion models by continuous-time reinforcement learning

Sep 12, 2024
Via arXiv

Mallows-DPO: Fine-Tune Your LLM with Preference Dispersions

May 23, 2024
Via arXiv

Fine-tuning of diffusion models via stochastic control: entropy regularization and beyond

Mar 12, 2024
Via arXiv

Score-based Diffusion Models via Stochastic Differential Equations -- a Technical Tutorial

Feb 12, 2024
Via arXiv

Contractive Diffusion Probabilistic Models

Jan 23, 2024
Via arXiv

Policy Optimization for Continuous Reinforcement Learning

Jun 02, 2023
Via arXiv

Simulated annealing from continuum to discretization: a convergence analysis via the Eyring--Kramers law

Feb 09, 2021
Via arXiv