Zehuan Yuan

Liquid: Language Models are Scalable Multi-modal Generators

Dec 05, 2024

Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Dec 05, 2024

TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation

Dec 04, 2024

ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval

Oct 24, 2024

OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation

Jun 13, 2024

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Jun 10, 2024

Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

Apr 19, 2024

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Apr 03, 2024

Generative Region-Language Pretraining for Open-Ended Object Detection

Mar 15, 2024

UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

Dec 25, 2023