Xiaohui Shen

Rethinking Homogeneity of Vision and Text Tokens in Large Vision-and-Language Models

Feb 04, 2025

COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation

Feb 04, 2025

Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens

Jan 13, 2025

1.58-bit FLUX

Dec 24, 2024

FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching

Dec 19, 2024

Randomized Autoregressive Visual Generation

Nov 01, 2024

MaskBit: Embedding-free Image Generation via Bit Tokens

Sep 24, 2024

Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

Jun 13, 2024

An Image is Worth 32 Tokens for Reconstruction and Generation

Jun 11, 2024

Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting

Jun 04, 2024