Xiaohui Shen

Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation

Feb 27, 2025

Rethinking Homogeneity of Vision and Text Tokens in Large Vision-and-Language Models

Feb 04, 2025

COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation

Feb 04, 2025

Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens

Jan 13, 2025

1.58-bit FLUX

Dec 24, 2024

FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching

Dec 19, 2024

Randomized Autoregressive Visual Generation

Nov 01, 2024

MaskBit: Embedding-free Image Generation via Bit Tokens

Sep 24, 2024

Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

Jun 13, 2024

An Image is Worth 32 Tokens for Reconstruction and Generation

Jun 11, 2024