Zhaowei Cai

Open-World Dynamic Prompt and Continual Visual Representation Learning

Sep 09, 2024
Via arXiv

NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality

Aug 18, 2024
Via arXiv

Mixed-Query Transformer: A Unified Image Segmentation Architecture

Apr 06, 2024
Via arXiv

Musketeer (All for One, and One for All): A Generalist Vision-Language Model with Task Explanation Prompts

May 11, 2023
Via arXiv

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

Feb 14, 2023
Via arXiv

Semi-supervised Vision Transformers at Scale

Aug 11, 2022
Via arXiv

Masked Vision and Language Modeling for Multi-modal Representation Learning

Aug 03, 2022
Via arXiv

Rethinking Few-Shot Object Detection on a Multi-Domain Benchmark

Jul 22, 2022
Via arXiv

X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks

Apr 12, 2022
Via arXiv

Omni-DETR: Omni-Supervised Object Detection with Transformers

Mar 30, 2022
Via arXiv