Xinlong Wang

A Simple Image Segmentation Framework via In-Context Examples

Oct 07, 2024

Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation

Oct 03, 2024

Emu3: Next-Token Prediction is All You Need

Sep 27, 2024

Diffusion Feedback Helps CLIP See Better

Jul 29, 2024

DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

Jul 11, 2024

Unveiling Encoder-Free Vision-Language Models

Jun 17, 2024

Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions

Feb 17, 2024

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Feb 06, 2024

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Jan 17, 2024

Generative Multimodal Models are In-Context Learners

Dec 20, 2023