
Xinlong Wang

Autoregressive Video Generation without Vector Quantization

Dec 18, 2024 (via arXiv)

Falcon-UI: Understanding GUI Before Following User Instructions

Dec 12, 2024 (via arXiv)

You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale

Dec 09, 2024 (via arXiv)

A Simple Image Segmentation Framework via In-Context Examples

Oct 07, 2024 (via arXiv)

Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation

Oct 03, 2024 (via arXiv)

Emu3: Next-Token Prediction is All You Need

Sep 27, 2024 (via arXiv)

Diffusion Feedback Helps CLIP See Better

Jul 29, 2024 (via arXiv)

DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

Jul 11, 2024 (via arXiv)

Unveiling Encoder-Free Vision-Language Models

Jun 17, 2024 (via arXiv)

Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions

Feb 17, 2024 (via arXiv)