Yuxin Chen

Residual Policy Gradient: A Reward View of KL-regularized Objective

Mar 14, 2025

External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation

Feb 26, 2025

Physics-Aware Robotic Palletization with Online Masking Inference

Feb 19, 2025

MixDec Sampling: A Soft Link-based Sampling Method of Graph Neural Network for Recommendation

Feb 12, 2025

DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization

Feb 11, 2025

Active Advantage-Aligned Online Reinforcement Learning with Offline Data

Feb 11, 2025

Online Reward-Weighted Fine-Tuning of Flow Matching with Wasserstein Regularization

Feb 09, 2025

Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment

Jan 16, 2025

FDPP: Fine-tune Diffusion Policy with Human Preference

Jan 14, 2025

Testing Human-Hand Segmentation on In-Distribution and Out-of-Distribution Data in Human-Robot Interactions Using a Deep Ensemble Model

Jan 13, 2025