Yuchuan Wu

Reward Modeling from Natural Language Human Feedback

Jan 12, 2026
Via arXiv

MOA: Multi-Objective Alignment for Role-Playing Agents

Dec 10, 2025
Via arXiv

CPO: Addressing Reward Ambiguity in Role-playing Dialogue via Comparative Policy Optimization

Aug 12, 2025
Via arXiv

TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning for Enhancing LLMs' Social Intelligence

May 30, 2025
Via arXiv

ChARM: Character-based Act-adaptive Reward Modeling for Advanced Role-Playing Language Agents

May 29, 2025
Via arXiv

Reverse Preference Optimization for Complex Instruction Following

May 28, 2025
Via arXiv

OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction

May 26, 2025
Via arXiv

EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning

Feb 18, 2025
Via arXiv

OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis

Jan 08, 2025
Via arXiv

SDPO: Segment-Level Direct Preference Optimization for Social Agents

Jan 03, 2025
Via arXiv