Conghui He

Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs

Apr 12, 2026
Via arXiv

MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale

Apr 06, 2026
Via arXiv

Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development

Mar 29, 2026
Via arXiv

DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models

Mar 27, 2026
Via arXiv

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Mar 26, 2026
Via arXiv

MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding

Mar 23, 2026
Via arXiv

Molecular Identifier Visual Prompt and Verifiable Reinforcement Learning for Chemical Reaction Diagram Parsing

Mar 17, 2026
Via arXiv

PointCoT: A Multi-modal Benchmark for Explicit 3D Geometric Reasoning

Feb 27, 2026
Via arXiv

AgenticOCR: Parsing Only What You Need for Efficient Retrieval-Augmented Generation

Feb 27, 2026
Via arXiv

The Trinity of Consistency as a Defining Principle for General World Models

Feb 26, 2026
Via arXiv