Weixun Wang

Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation

Mar 20, 2025

Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?

Feb 26, 2025

CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models

Feb 23, 2025

ProgCo: Program Helps Self-Correction of Large Language Models

Jan 02, 2025

Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models

Nov 13, 2024

2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision

Oct 25, 2024

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

May 20, 2024

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

Mar 24, 2024

Off-Beat Multi-Agent Reinforcement Learning

May 27, 2022

A2C is a special case of PPO

May 18, 2022