Zhen Li

LMO, CELESTE, HEC Paris

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

Apr 10, 2025
Via arXiv

OmniCaptioner: One Captioner to Rule Them All

Apr 09, 2025
Via arXiv

MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models

Apr 08, 2025
Via arXiv

Empowering Large Language Models with 3D Situation Awareness

Mar 29, 2025
Via arXiv

LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis

Mar 27, 2025
Via arXiv

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

Mar 27, 2025
Via arXiv

AR-1-to-3: Single Image to Consistent 3D Object Generation via Next-View Prediction

Mar 17, 2025
Via arXiv

DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation

Mar 14, 2025
Via arXiv

PiSA: A Self-Augmented Data Engine and Training Strategy for 3D Understanding with Large Models

Mar 13, 2025
Via arXiv

DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering

Mar 06, 2025
Via arXiv