Jihyung Kil

CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs

Jul 23, 2024

ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning for Enhanced Multi-Modal Chain-of-Thought Reasoning Through Diverse AI Feedback

Jun 25, 2024

II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering

Feb 16, 2024

Dual-View Visual Contextualization for Web Navigation

Feb 06, 2024

GPT-4V is a Generalist Web Agent, if Grounded

Jan 03, 2024

PreSTU: Pre-Training for Scene-Text Understanding

Sep 12, 2022

One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones

Feb 14, 2022

Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering

Sep 13, 2021

Revisiting Document Representations for Large-Scale Zero-Shot Learning

Apr 21, 2021