Tengyu Xu

Multi-IF: Benchmarking LLMs on Multi-Turn and Multilingual Instructions Following

Oct 21, 2024

The Perfect Blend: Redefining RLHF with Mixture of Judges

Sep 30, 2024

Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward

Jun 13, 2022

Model-Based Offline Meta-Reinforcement Learning with Regularization

Feb 07, 2022

Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process

Oct 20, 2021

PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning Method

Oct 13, 2021

A Unified Off-Policy Evaluation Approach for General Value Function

Jul 06, 2021

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Feb 27, 2021

Proximal Gradient Descent-Ascent: Variable Convergence under KŁ Geometry

Feb 17, 2021

A Primal Approach to Constrained Policy Optimization: Global Optimality and Finite-Time Analysis

Nov 17, 2020