Zheng Ge

Reconstructive Visual Instruction Tuning

Oct 12, 2024
Via arXiv

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Sep 03, 2024
Via arXiv

DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation

Jun 24, 2024
Via arXiv

Focus Anywhere for Fine-grained Multi-page Document Understanding

May 23, 2024
Via arXiv

Self-Supervised Visual Preference Alignment

Apr 16, 2024
Via arXiv

OneChart: Purify the Chart Structural Extraction via One Auxiliary Token

Apr 15, 2024
Via arXiv

ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

Mar 06, 2024
Via arXiv

Small Language Model Meets with Reinforced Vision Vocabulary

Jan 23, 2024
Via arXiv

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models

Dec 11, 2023
Via arXiv

Merlin: Empowering Multimodal LLMs with Foresight Minds

Nov 30, 2023
Via arXiv