Zhaopeng Tu

BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs

Sep 30, 2025
Via arXiv

CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models

Sep 11, 2025
Via arXiv

RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents

Jul 30, 2025
Via arXiv

CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards

Jul 23, 2025
Via arXiv

DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning

May 29, 2025
Via arXiv

Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training

May 20, 2025
Via arXiv

Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards

May 19, 2025
Via arXiv

VISTA: Enhancing Vision-Text Alignment in MLLMs via Cross-Modal Mutual Information Maximization

May 19, 2025
Via arXiv

Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models

May 01, 2025
Via arXiv

SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning

Apr 27, 2025
Via arXiv