Zhenheng Yang

Mixture of Contexts for Long Video Generation

Aug 28, 2025

UniAPO: Unified Multimodal Automated Prompt Optimization

Aug 25, 2025

Show-o2: Improved Native Unified Multimodal Models

Jun 18, 2025

UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning

May 29, 2025

DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling

May 16, 2025

Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

Apr 11, 2025

Cream of the Crop: Harvesting Rich, Scalable and Transferable Multi-Modal Data for Instruction Fine-Tuning

Mar 17, 2025

Long Context Tuning for Video Generation

Mar 13, 2025

UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths

Feb 10, 2025

STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

Jan 06, 2025