Xun Yang

Repetitive Action Counting with Hybrid Temporal Relation Modeling

Dec 10, 2024
Via arXiv

PEMF-VVTO: Point-Enhanced Video Virtual Try-on via Mask-free Paradigm

Dec 05, 2024
Via arXiv

Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models

Nov 19, 2024
Via arXiv

Grounding is All You Need? Dual Temporal Grounding for Video Dialog

Oct 08, 2024
Via arXiv

Scene-Text Grounding for Text-Based Video Question Answering

Sep 22, 2024
Via arXiv

Dual-stream Feature Augmentation for Domain Generalization

Sep 07, 2024
Via arXiv

GRPose: Learning Graph Relations for Human Image Generation with Pose Priors

Aug 29, 2024
Via arXiv

Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks

Jul 30, 2024
Via arXiv

Advancing Prompt Learning through an External Layer

Jul 29, 2024
Via arXiv

Towards Scale-Aware Full Surround Monodepth with Transformers

Jul 15, 2024
Via arXiv