Xiaoshan Yang

Towards Visual Grounding: A Survey

Dec 28, 2024

OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling

Oct 10, 2024

A Comprehensive Review of Few-shot Action Recognition

Jul 20, 2024

Libra: Building Decoupled Vision System on Large Language Models

May 16, 2024

HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding

Apr 20, 2024

Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection

Aug 30, 2023

Multi-modal Queried Object Detection in the Wild

May 30, 2023

CLIP-VG: Self-paced Curriculum Adapting of CLIP via Exploiting Pseudo-Language Labels for Visual Grounding

May 15, 2023

SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for Few-shot Image Classification

Nov 28, 2022

Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding

Mar 29, 2022