Shih-Han Chou

MM-R$^3$: On (In-)Consistency of Multi-modal Large Language Models (MLLMs)

Oct 07, 2024

Multi-modal News Understanding with Professionally Labelled Videos (ReutersViLNews)

Jan 23, 2024

Implicit and Explicit Commonsense for Multi-sentence Video Captioning

Mar 14, 2023

An Improved Attention for Visual Question Answering

Nov 07, 2020

Visual Question Answering on 360° Images

Jan 10, 2020

360-Indoor: Towards Learning Real-World Objects in 360° Indoor Equirectangular Images

Oct 03, 2019

Self-view Grounding Given a Narrated 360° Video

Nov 23, 2017

Agent-Centric Risk Assessment: Accident Anticipation and Risky Region Localization

May 18, 2017