Shengqiong Wu

Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene

Mar 19, 2025
Via arXiv

Universal Scene Graph Generation

Mar 19, 2025
Via arXiv

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

Mar 16, 2025
Via arXiv

Combating Multimodal LLM Hallucination via Bottom-up Holistic Reasoning

Dec 15, 2024
Via arXiv

PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis

Aug 18, 2024
Via arXiv

Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment

Jun 27, 2024
Via arXiv

OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Jun 27, 2024
Via arXiv

Towards Semantic Equivalence of Tokenization in Multimodal LLM

Jun 07, 2024
Via arXiv

Modeling Unified Semantic Discourse Structure for High-quality Headline Generation

Mar 23, 2024
Via arXiv

NExT-GPT: Any-to-Any Multimodal LLM

Sep 13, 2023
Via arXiv