Siliang Zeng

Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens

Oct 18, 2024

Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback

Jun 11, 2024

Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment

May 29, 2024

A Bayesian Approach to Robust Inverse Reinforcement Learning

Sep 15, 2023

Understanding Expertise through Demonstrations: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning

Feb 15, 2023

Structural Estimation of Markov Decision Processes in High-Dimensional State Space with Finite-Time Guarantees

Oct 04, 2022

Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees

Oct 04, 2022

Learning to Coordinate in Multi-Agent Systems: A Coordinated Actor-Critic Algorithm and Finite-Time Guarantees

Oct 11, 2021

A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm for Bilevel Optimization

Feb 15, 2021

On the Divergence of Decentralized Non-Convex Optimization

Jun 20, 2020