Size Wu

OpenUni: A Simple Baseline for Unified Multimodal Understanding and Generation

May 29, 2025
Via arXiv

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

Mar 27, 2025
Via arXiv

F-LMM: Grounding Frozen Large Multimodal Models

Jun 09, 2024
Via arXiv

OMG-Seg: Is One Model Good Enough For All Segmentation?

Jan 18, 2024
Via arXiv

CLIM: Contrastive Language-Image Mosaic for Region Representation

Dec 19, 2023
Via arXiv

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction

Oct 02, 2023
Via arXiv

DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection

Oct 02, 2023
Via arXiv

Aligning Bag of Regions for Open-Vocabulary Object Detection

Feb 27, 2023
Via arXiv

Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images

Sep 13, 2021
Via arXiv