Xue Yang

DiffCLIP: Few-shot Language-driven Multimodal Classifier

Dec 10, 2024

Marco-LLM: Bridging Languages via Massive Multilingual Training for Cross-Lingual Enhancement

Dec 05, 2024

GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data

Nov 27, 2024

Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation

Nov 04, 2024

PointOBB-v2: Towards Simpler, Faster, and Stronger Single Point Supervised Oriented Object Detection

Oct 10, 2024

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

Oct 10, 2024

5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks

Aug 15, 2024

Scene Graph Generation in Large-Size VHR Satellite Imagery: A Large-Scale Dataset and A Context-Aware Approach

Jun 13, 2024

Towards Vision-Language Geo-Foundation Model: A Survey

Jun 13, 2024

UCDNet: Multi-UAV Collaborative 3D Object Detection Network by Reliable Feature Mapping

Jun 07, 2024