Jiaheng Liu

Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation

Mar 20, 2025
Via arXiv

YuE: Scaling Open Foundation Models for Long-Form Music Generation

Mar 11, 2025
Via arXiv

Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?

Feb 26, 2025
Via arXiv

AIR: Complex Instruction Generation via Automatic Iterative Refinement

Feb 25, 2025
Via arXiv

CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models

Feb 23, 2025
Via arXiv

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Feb 20, 2025
Via arXiv

ChineseSimpleVQA -- "See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models

Feb 19, 2025
Via arXiv

Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs

Feb 18, 2025
Via arXiv

VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation

Feb 18, 2025
Via arXiv

SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models

Feb 18, 2025
Via arXiv