Chi Jin

MUSIC: MUlti-Step Instruction Contrast for Multi-Turn Reward Models

Dec 31, 2025
Via arXiv

Recurrent Autoregressive Diffusion: Global Memory Meets Local Attention

Nov 17, 2025
Via arXiv

Frontier LLMs Still Struggle with Simple Reasoning Tasks

Jul 09, 2025
Via arXiv

Principled Out-of-Distribution Generalization via Simplicity

May 28, 2025
Via arXiv

Learning World Models for Interactive Video Generation

May 28, 2025
Via arXiv

Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving on Inequalities

May 19, 2025
Via arXiv

PokéChamp: an Expert-level Minimax Language Agent

Mar 06, 2025
Via arXiv

Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving

Feb 11, 2025
Via arXiv

MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations

Feb 10, 2025
Via arXiv

Generative Diffusion Modeling: A Practical Handbook

Dec 22, 2024
Via arXiv