Shih-Fu Chang

Columbia University

PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction

Jan 24, 2025

ENTER: Event Based Interpretable Reasoning for VideoQA

Jan 24, 2025

WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization

May 28, 2024

Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions

May 23, 2024

MoDE: CLIP Data Experts via Clustering

Apr 24, 2024

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Apr 11, 2024

From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models

Mar 25, 2024

SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos

Mar 03, 2024

Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning

Dec 15, 2023

Video Summarization: Towards Entity-Aware Captions

Dec 01, 2023