
Zhuowan Li

Johns Hopkins University

ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning

Aug 05, 2024

Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach

Jul 23, 2024

Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA

Mar 28, 2024

Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models

Dec 09, 2023

3D-Aware Visual Question Answering about Parts, Poses and Occlusions

Oct 27, 2023

Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning

Dec 01, 2022

Localization vs. Semantics: How Can Language Benefit Visual Representation Learning?

Dec 01, 2022

Visual Commonsense in Pretrained Unimodal and Multimodal Models

May 04, 2022

SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering

Apr 05, 2022

Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images

Oct 01, 2021