Han Shen

Mitigating Forgetting in LLM Supervised Fine-Tuning and Preference Learning

Oct 20, 2024
Via arXiv

SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection

Oct 09, 2024
Via arXiv

Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF

Feb 10, 2024
Via arXiv

Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization

Jan 13, 2024
Via arXiv

On Penalty-based Bilevel Gradient Descent Method

Feb 10, 2023
Via arXiv

Alternating Implicit Projected SGD and Its Efficient Variants for Equality-constrained Bilevel Optimization

Nov 14, 2022
Via arXiv

Mitigating Gradient Bias in Multi-objective Learning: A Provably Convergent Stochastic Approach

Oct 23, 2022
Via arXiv

A Single-Timescale Analysis For Stochastic Approximation With Multiple Coupled Sequences

Jun 21, 2022
Via arXiv

Asynchronous Advantage Actor Critic: Non-asymptotic Analysis and Linear Speedup

Dec 31, 2020
Via arXiv

Multi-object Tracking via End-to-end Tracklet Searching and Ranking

Mar 04, 2020
Via arXiv