Picture for Yihan Zeng

Yihan Zeng

AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning

Add code
Nov 18, 2024
Viaarxiv icon

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Add code
Sep 26, 2024
Figure 1 for EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Figure 2 for EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Figure 3 for EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Figure 4 for EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Viaarxiv icon

JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation

Add code
Jul 17, 2024
Viaarxiv icon

DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors

Add code
Jun 03, 2024
Viaarxiv icon

Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection

Add code
Jun 02, 2024
Viaarxiv icon

OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation

Add code
Mar 18, 2024
Viaarxiv icon

GS-CLIP: Gaussian Splatting for Contrastive Language-Image-3D Pretraining from Real-World Data

Add code
Feb 13, 2024
Viaarxiv icon

PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion

Add code
Dec 29, 2023
Viaarxiv icon

DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior

Add code
Dec 11, 2023
Viaarxiv icon

CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection

Add code
Oct 04, 2023
Viaarxiv icon