Shilong Liu

TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video

Nov 27, 2024

DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding

Nov 21, 2024

Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective

Oct 29, 2024

TAPTRv2: Attention-based Position Update Improves Tracking Any Point

Jul 23, 2024

MMedAgent: Learning to Use Medical Tools with Multi-modal Agent

Jul 02, 2024

CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents

Jul 01, 2024

Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

May 16, 2024

Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models

May 07, 2024

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Mar 21, 2024

TAPTR: Tracking Any Point with Transformers as Detection

Mar 19, 2024