Xudong Lin

ENTER: Event Based Interpretable Reasoning for VideoQA

Jan 24, 2025
Via arXiv

PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction

Jan 24, 2025
Via arXiv

Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses

Sep 22, 2024
Via arXiv

Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

Jun 19, 2024
Via arXiv

Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies

Jun 16, 2024
Via arXiv

BLINK: Multimodal Large Language Models Can See but Not Perceive

Apr 18, 2024
Via arXiv

SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos

Mar 03, 2024
Via arXiv

Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

Jan 18, 2024
Via arXiv

InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models

Dec 04, 2023
Via arXiv

Video Summarization: Towards Entity-Aware Captions

Dec 01, 2023
Via arXiv