Xiyang Dai

LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation

Nov 07, 2024

DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs

Jun 06, 2024

Efficient Modulation for Vision Networks

Mar 29, 2024

Rewrite the Stars

Mar 29, 2024

Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search

Mar 15, 2024

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Nov 10, 2023

On the Hidden Waves of Image

Oct 19, 2023

LACMA: Language-Aligning Contrastive Learning with Meta-Actions for Embodied Instruction Following

Oct 18, 2023

Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection

Oct 18, 2023

Image is First-order Norm+Linear Autoregressive

May 25, 2023