Shibo Hao

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Jun 17, 2025

Decentralized Arena: Towards Democratic and Scalable Automatic Evaluation of Language Models

May 19, 2025

Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought

May 18, 2025

LLM Pretraining with Continuous Concepts

Feb 12, 2025

Linear Correlation in LM's Compositional Generalization and Hallucination

Feb 06, 2025

Offline Reinforcement Learning for LLM Multi-Step Reasoning

Dec 20, 2024

Training Large Language Models to Reason in a Continuous Latent Space

Dec 09, 2024

Pandora: Towards General World Model with Natural Language Actions and Video States

Jun 12, 2024

Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking

Jun 09, 2024

LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models

Apr 08, 2024