Siliang Zeng

Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens
Oct 18, 2024

Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback
Jun 11, 2024

Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment
May 29, 2024

A Bayesian Approach to Robust Inverse Reinforcement Learning
Sep 15, 2023

Understanding Expertise through Demonstrations: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning
Feb 15, 2023

Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees
Oct 04, 2022

Structural Estimation of Markov Decision Processes in High-Dimensional State Space with Finite-Time Guarantees
Oct 04, 2022

Learning to Coordinate in Multi-Agent Systems: A Coordinated Actor-Critic Algorithm and Finite-Time Guarantees
Oct 11, 2021

A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm for Bilevel Optimization
Feb 15, 2021

On the Divergence of Decentralized Non-Convex Optimization
Jun 20, 2020