Yihan Zeng

Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?

Mar 08, 2025
Via arXiv

UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting

Feb 25, 2025
Via arXiv

Corrupted but Not Broken: Rethinking the Impact of Corrupted Data in Visual Instruction Tuning

Feb 18, 2025
Via arXiv

FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors

Jan 14, 2025
Via arXiv

AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning

Nov 18, 2024
Via arXiv

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Sep 26, 2024
Via arXiv

JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation

Jul 17, 2024
Via arXiv

DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors

Jun 03, 2024
Via arXiv

Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection

Jun 02, 2024
Via arXiv

OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation

Mar 18, 2024
Via arXiv