
Weicheng Kuo

Learning Visual Grounding from Generative Vision and Language Model
Jul 18, 2024

3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation
Jan 04, 2024

Detection-Oriented Image-Text Pretraining for Open-Vocabulary Detection
Sep 29, 2023

Contrastive Feature Masking Open-Vocabulary Vision Transformer
Sep 02, 2023

DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model
Jun 02, 2023

Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers
May 11, 2023

RECLIP: Resource-efficient CLIP by Training with Small Images
Apr 12, 2023

MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
Mar 30, 2023

Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
Dec 06, 2022

F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models
Sep 30, 2022