Conghui He

PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents

May 11, 2026

NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation

May 11, 2026

MolRecBench-Wild: A Real-World Benchmark for Optical Chemical Structure Recognition

May 07, 2026

Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora

Apr 27, 2026

Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs

Apr 12, 2026

MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale

Apr 06, 2026

Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development

Mar 29, 2026

DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models

Mar 27, 2026

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Mar 26, 2026

MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding

Mar 23, 2026