Picture for Xudong Lin

Xudong Lin

PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction

Add code
Jan 24, 2025
Viaarxiv icon

ENTER: Event Based Interpretable Reasoning for VideoQA

Add code
Jan 24, 2025
Viaarxiv icon

Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses

Add code
Sep 22, 2024
Figure 1 for Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses
Figure 2 for Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses
Figure 3 for Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses
Figure 4 for Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses
Viaarxiv icon

Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

Add code
Jun 19, 2024
Figure 1 for Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Figure 2 for Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Figure 3 for Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Figure 4 for Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Viaarxiv icon

Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies

Add code
Jun 16, 2024
Figure 1 for Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies
Figure 2 for Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies
Figure 3 for Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies
Figure 4 for Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies
Viaarxiv icon

BLINK: Multimodal Large Language Models Can See but Not Perceive

Add code
Apr 18, 2024
Figure 1 for BLINK: Multimodal Large Language Models Can See but Not Perceive
Figure 2 for BLINK: Multimodal Large Language Models Can See but Not Perceive
Figure 3 for BLINK: Multimodal Large Language Models Can See but Not Perceive
Figure 4 for BLINK: Multimodal Large Language Models Can See but Not Perceive
Viaarxiv icon

SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos

Add code
Mar 03, 2024
Viaarxiv icon

Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

Add code
Jan 18, 2024
Figure 1 for Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Figure 2 for Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Figure 3 for Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Figure 4 for Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Viaarxiv icon

InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models

Add code
Dec 04, 2023
Figure 1 for InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models
Figure 2 for InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models
Figure 3 for InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models
Figure 4 for InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models
Viaarxiv icon

Video Summarization: Towards Entity-Aware Captions

Add code
Dec 01, 2023
Figure 1 for Video Summarization: Towards Entity-Aware Captions
Figure 2 for Video Summarization: Towards Entity-Aware Captions
Figure 3 for Video Summarization: Towards Entity-Aware Captions
Figure 4 for Video Summarization: Towards Entity-Aware Captions
Viaarxiv icon