Junbin Xiao

Scene-Text Grounding for Text-Based Video Question Answering

Sep 22, 2024
Via arXiv

Question-Answering Dense Video Events

Sep 10, 2024
Via arXiv

Towards Analyzing and Mitigating Sycophancy in Large Vision-Language Models

Aug 21, 2024
Via arXiv

VideoQA in the Era of LLMs: An Empirical Study

Aug 08, 2024
Via arXiv

Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives

Jun 09, 2024
Via arXiv

Abductive Ego-View Accident Video Understanding for Safe Driving Perception

Mar 01, 2024
Via arXiv

Can I Trust Your Answer? Visually Grounded Video Question Answering

Sep 04, 2023
Via arXiv

Discovering Spatio-Temporal Rationales for Video Question Answering

Jul 22, 2023
Via arXiv

Contrastive Video Question Answering via Video Graph Transformer

Feb 27, 2023
Via arXiv

Equivariant and Invariant Grounding for Video Question Answering

Jul 26, 2022
Via arXiv