Zehuan Yuan

HumanDiT: Pose-Guided Diffusion Transformer for Long-form Human Motion Video Generation

Feb 10, 2025
Via arXiv

Goku: Flow Based Video Generative Foundation Models

Feb 10, 2025
Via arXiv

FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation

Feb 07, 2025
Via arXiv

Liquid: Language Models are Scalable Multi-modal Generators

Dec 05, 2024
Via arXiv

Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Dec 05, 2024
Via arXiv

TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation

Dec 04, 2024
Via arXiv

ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval

Oct 24, 2024
Via arXiv

OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation

Jun 13, 2024
Via arXiv

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Jun 10, 2024
Via arXiv

Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

Apr 19, 2024
Via arXiv