Mingxuan Wang

ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models

Aug 26, 2025

ShortListing Model: A Streamlined SimplexDiffusion for Discrete Variable Generation

Aug 24, 2025

Scaling Linear Attention with Sparse State Expansion

Jul 22, 2025

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent

Jul 03, 2025

Truncated Proximal Policy Optimization

Jun 18, 2025

Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles

May 26, 2025

Seed1.5-VL Technical Report

May 11, 2025

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

Apr 08, 2025

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Mar 18, 2025

OpenAI o1 System Card

Dec 21, 2024