Yinmin Zhang

PRIME: A Process-Outcome Alignment Benchmark for Verifiable Reasoning in Mathematics and Engineering

Feb 12, 2026
Via arXiv

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

Feb 11, 2026
Via arXiv

R-Align: Enhancing Generative Reward Models through Rationale-Centric Meta-Judging

Feb 06, 2026
Via arXiv

STEP3-VL-10B Technical Report

Jan 15, 2026
Via arXiv

PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning

Jan 09, 2026
Via arXiv

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

Mar 31, 2025
Via arXiv

Multi-matrix Factorization Attention

Dec 26, 2024
Via arXiv

Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations

Dec 18, 2023
Via arXiv

A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning

Dec 12, 2023
Via arXiv

Masked Pretraining for Multi-Agent Decision Making

Oct 18, 2023
Via arXiv