Xiaochen Zuo

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

Apr 08, 2025
Via arXiv

A Unified Pairwise Framework for RLHF: Bridging Generative Reward Modeling and Policy Optimization

Apr 07, 2025
Via arXiv

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Mar 18, 2025
Via arXiv

Contrastive Learning of User Behavior Sequence for Context-Aware Document Ranking

Aug 24, 2021
Via arXiv