Shilong Liu

Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective

Oct 29, 2024
Via arXiv

TAPTRv2: Attention-based Position Update Improves Tracking Any Point

Jul 23, 2024
Via arXiv

MMedAgent: Learning to Use Medical Tools with Multi-modal Agent

Jul 02, 2024
Via arXiv

CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents

Jul 01, 2024
Via arXiv

Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

May 16, 2024
Via arXiv

Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models

May 07, 2024
Via arXiv

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Mar 21, 2024
Via arXiv

TAPTR: Tracking Any Point with Transformers as Detection

Mar 19, 2024
Via arXiv

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

Jan 25, 2024
Via arXiv

Interfacing Foundation Models' Embeddings

Dec 12, 2023
Via arXiv