Dan Guo

Towards Open-Vocabulary Audio-Visual Event Localization

Nov 18, 2024

Grounding is All You Need? Dual Temporal Grounding for Video Dialog

Oct 08, 2024

Scene-Text Grounding for Text-Based Video Question Answering

Sep 22, 2024

Prototype Learning for Micro-gesture Classification

Aug 06, 2024

Label-anticipated Event Disentanglement for Audio-Visual Video Parsing

Jul 11, 2024

MMAD: Multi-label Micro-Action Detection in Videos

Jul 07, 2024

Micro-gesture Online Recognition using Learnable Query Points

Jul 05, 2024

A Two-Stage Adverse Weather Semantic Segmentation Method for WeatherProof Challenge CVPR 2024 Workshop UG2+

Jun 08, 2024

Joint Spatial-Temporal Modeling and Contrastive Learning for Self-supervised Heart Rate Measurement

Jun 07, 2024

Advancing Weakly-Supervised Audio-Visual Video Parsing via Segment-wise Pseudo Labeling

Jun 03, 2024