Yongdong Zhang

Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models

May 26, 2025

Training LLM-Based Agents with Synthetic Self-Reflected Trajectories and Partial Masking

May 26, 2025

Leveraging Robust Optimization for LLM Alignment under Distribution Shifts

Apr 08, 2025

HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation

Mar 31, 2025

Mask$^2$DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation

Mar 25, 2025

SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability

Mar 18, 2025

OmniPrism: Learning Disentangled Visual Concept for Image Generation

Dec 16, 2024

LMAgent: A Large-scale Multimodal Agents Society for Multi-user Simulation

Dec 13, 2024

T-SVG: Text-Driven Stereoscopic Video Generation

Dec 12, 2024

A Graph-Based Synthetic Data Pipeline for Scaling High-Quality Reasoning Instructions

Dec 12, 2024