Sunan He

MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning

Apr 23, 2024
Via arXiv

Examining User-Friendly and Open-Sourced Large GPT Models: A Survey on Language, Multimodal, and Scientific GPT Models

Aug 27, 2023
Via arXiv

D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation

Aug 08, 2023
Via arXiv

Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding

May 18, 2023
Via arXiv

VLMAE: Vision-Language Masked Autoencoder

Aug 19, 2022
Via arXiv

Exploiting Feature Diversity for Make-up Temporal Video Grounding

Aug 12, 2022
Via arXiv

Open-Vocabulary Multi-Label Classification via Multi-modal Knowledge Transfer

Jul 05, 2022
Via arXiv