Yinglun Xu

Latent Adversarial Regularization for Offline Preference Optimization

Jan 29, 2026

Learning a Pessimistic Reward Model in RLHF

May 26, 2025

Improving Assembly Code Performance with Large Language Models via Reinforcement Learning

May 16, 2025

Robust Thompson Sampling Algorithms Against Reward Poisoning Attacks

Oct 25, 2024

Optimal Reward Labeling: Bridging Offline Preference and Reward-Based Reinforcement Learning

Jun 14, 2024

Reward Poisoning Attack Against Offline Reinforcement Learning

Feb 15, 2024

Efficient Two-Phase Offline Deep Reinforcement Learning from Preference Feedback

Dec 30, 2023

On the Robustness of Epoch-Greedy in Multi-Agent Contextual Bandit Mechanisms

Jul 15, 2023

Black-Box Targeted Reward Poisoning Attack Against Online Deep Reinforcement Learning

May 18, 2023

Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning

May 30, 2022