Changyu Chen

Sample-Efficient Alignment for LLMs

Nov 03, 2024

Towards Neural Network based Cognitive Models of Dynamic Decision-Making by Humans

Jul 24, 2024

Unlocking Large Language Model's Planning Capabilities with Maximum Diversity Fine-tuning

Jun 15, 2024

Bootstrapping Language Models with DPO Implicit Rewards

Jun 14, 2024

Prototypical Reward Network for Data-Efficient RLHF

Jun 06, 2024

Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models

Mar 04, 2024

Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use

Dec 07, 2023

Generative Modelling of Stochastic Actions with Arbitrary Constraints in Reinforcement Learning

Nov 26, 2023

CycleAlign: Iterative Distillation from Black-box LLM to White-box Models for Better Human Alignment

Oct 25, 2023

Semi-Offline Reinforcement Learning for Optimized Text Generation

Jun 16, 2023