Picture for Zhaopeng Tu

Zhaopeng Tu

Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains

Add code
Apr 01, 2025
Viaarxiv icon

Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique

Add code
Mar 21, 2025
Viaarxiv icon

The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement

Add code
Mar 20, 2025
Viaarxiv icon

RaSA: Rank-Sharing Low-Rank Adaptation

Add code
Mar 16, 2025
Viaarxiv icon

The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models

Add code
Mar 04, 2025
Viaarxiv icon

VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models

Add code
Feb 23, 2025
Viaarxiv icon

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Add code
Jan 30, 2025
Figure 1 for Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Figure 2 for Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Figure 3 for Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Figure 4 for Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Viaarxiv icon

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

Add code
Dec 30, 2024
Viaarxiv icon

Teaching LLMs to Refine with Tools

Add code
Dec 22, 2024
Figure 1 for Teaching LLMs to Refine with Tools
Figure 2 for Teaching LLMs to Refine with Tools
Figure 3 for Teaching LLMs to Refine with Tools
Figure 4 for Teaching LLMs to Refine with Tools
Viaarxiv icon

Findings of the WMT 2024 Shared Task on Discourse-Level Literary Translation

Add code
Dec 16, 2024
Viaarxiv icon