
Zhecan Wang

PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction

Jan 24, 2025
Via arXiv

ENTER: Event Based Interpretable Reasoning for VideoQA

Jan 24, 2025
Via arXiv

HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning

Jul 22, 2024
Via arXiv

Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions

May 23, 2024
Via arXiv

Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond

Oct 31, 2023
Via arXiv

UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding

Jul 03, 2023
Via arXiv

IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models

May 24, 2023
Via arXiv

CoBIT: A Contrastive Bi-directional Image-Text Generation Model

Mar 23, 2023
Via arXiv

Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding

Dec 14, 2022
Via arXiv

Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense

Nov 10, 2022
Via arXiv