Size Wu

F-LMM: Grounding Frozen Large Multimodal Models

Jun 09, 2024
Via arXiv

OMG-Seg: Is One Model Good Enough For All Segmentation?

Jan 18, 2024
Via arXiv

CLIM: Contrastive Language-Image Mosaic for Region Representation

Dec 19, 2023
Via arXiv

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction

Oct 02, 2023
Via arXiv

DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection

Oct 02, 2023
Via arXiv

Aligning Bag of Regions for Open-Vocabulary Object Detection

Feb 27, 2023
Via arXiv

Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images

Sep 13, 2021
Via arXiv