Fuwen Luo

How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game

Mar 13, 2025
Via arXiv

DongbaMIE: A Multimodal Information Extraction Dataset for Evaluating Semantic Understanding of Dongba Pictograms

Mar 05, 2025
Via arXiv

Perspective Transition of Large Language Models for Solving Subjective Tasks

Jan 16, 2025
Via arXiv

StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding

Nov 06, 2024
Via arXiv

ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models

Oct 07, 2024
Via arXiv

Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models

Feb 27, 2024
Via arXiv

CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models

Feb 21, 2024
Via arXiv

Model Composition for Multimodal Large Language Models

Feb 20, 2024
Via arXiv

Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion

Feb 19, 2024
Via arXiv

Towards Unified Alignment Between Agents, Humans, and Environment

Feb 14, 2024
Via arXiv