Xun Yang

AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring

Jan 16, 2025

Tuning-Free Long Video Generation via Global-Local Collaborative Diffusion

Jan 08, 2025

Learning states enhanced knowledge tracing: Simulating the diversity in real-world learning process

Dec 27, 2024

Repetitive Action Counting with Hybrid Temporal Relation Modeling

Dec 10, 2024

PEMF-VVTO: Point-Enhanced Video Virtual Try-on via Mask-free Paradigm

Dec 05, 2024

Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models

Nov 19, 2024

Grounding is All You Need? Dual Temporal Grounding for Video Dialog

Oct 08, 2024

Scene-Text Grounding for Text-Based Video Question Answering

Sep 22, 2024

Dual-stream Feature Augmentation for Domain Generalization

Sep 07, 2024

GRPose: Learning Graph Relations for Human Image Generation with Pose Priors

Aug 29, 2024