Picture for Weixun Wang

Weixun Wang

Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models

Add code
Nov 13, 2024
Figure 1 for Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
Figure 2 for Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
Figure 3 for Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
Figure 4 for Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
Viaarxiv icon

2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision

Add code
Oct 25, 2024
Figure 1 for 2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
Figure 2 for 2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
Figure 3 for 2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
Figure 4 for 2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
Viaarxiv icon

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Add code
May 20, 2024
Viaarxiv icon

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

Add code
Mar 24, 2024
Figure 1 for The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization
Figure 2 for The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization
Figure 3 for The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization
Figure 4 for The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization
Viaarxiv icon

Off-Beat Multi-Agent Reinforcement Learning

Add code
May 27, 2022
Figure 1 for Off-Beat Multi-Agent Reinforcement Learning
Figure 2 for Off-Beat Multi-Agent Reinforcement Learning
Figure 3 for Off-Beat Multi-Agent Reinforcement Learning
Figure 4 for Off-Beat Multi-Agent Reinforcement Learning
Viaarxiv icon

A2C is a special case of PPO

Add code
May 18, 2022
Figure 1 for A2C is a special case of PPO
Viaarxiv icon

Coach-assisted Multi-Agent Reinforcement Learning Framework for Unexpected Crashed Agents

Add code
Mar 16, 2022
Figure 1 for Coach-assisted Multi-Agent Reinforcement Learning Framework for Unexpected Crashed Agents
Figure 2 for Coach-assisted Multi-Agent Reinforcement Learning Framework for Unexpected Crashed Agents
Figure 3 for Coach-assisted Multi-Agent Reinforcement Learning Framework for Unexpected Crashed Agents
Figure 4 for Coach-assisted Multi-Agent Reinforcement Learning Framework for Unexpected Crashed Agents
Viaarxiv icon

API: Boosting Multi-Agent Reinforcement Learning via Agent-Permutation-Invariant Networks

Add code
Mar 10, 2022
Figure 1 for API: Boosting Multi-Agent Reinforcement Learning via Agent-Permutation-Invariant Networks
Figure 2 for API: Boosting Multi-Agent Reinforcement Learning via Agent-Permutation-Invariant Networks
Figure 3 for API: Boosting Multi-Agent Reinforcement Learning via Agent-Permutation-Invariant Networks
Figure 4 for API: Boosting Multi-Agent Reinforcement Learning via Agent-Permutation-Invariant Networks
Viaarxiv icon

Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy Regularization

Add code
Feb 16, 2022
Figure 1 for Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy Regularization
Figure 2 for Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy Regularization
Figure 3 for Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy Regularization
Figure 4 for Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy Regularization
Viaarxiv icon

Cooperative Multi-Agent Transfer Learning with Level-Adaptive Credit Assignment

Add code
Jun 03, 2021
Figure 1 for Cooperative Multi-Agent Transfer Learning with Level-Adaptive Credit Assignment
Figure 2 for Cooperative Multi-Agent Transfer Learning with Level-Adaptive Credit Assignment
Figure 3 for Cooperative Multi-Agent Transfer Learning with Level-Adaptive Credit Assignment
Figure 4 for Cooperative Multi-Agent Transfer Learning with Level-Adaptive Credit Assignment
Viaarxiv icon