Changyu Chen

Understanding R1-Zero-Like Training: A Critical Perspective

Mar 26, 2025
Via arXiv

Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs

Feb 18, 2025
Via arXiv

On Learning Informative Trajectory Embeddings for Imitation, Classification and Regression

Jan 16, 2025
Via arXiv

Sample-Efficient Alignment for LLMs

Nov 03, 2024
Via arXiv

Towards Neural Network based Cognitive Models of Dynamic Decision-Making by Humans

Jul 24, 2024
Via arXiv

Unlocking Large Language Model's Planning Capabilities with Maximum Diversity Fine-tuning

Jun 15, 2024
Via arXiv

Bootstrapping Language Models with DPO Implicit Rewards

Jun 14, 2024
Via arXiv

Prototypical Reward Network for Data-Efficient RLHF

Jun 06, 2024
Via arXiv

Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models

Mar 04, 2024
Via arXiv

Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use

Dec 07, 2023
Via arXiv