Shaoxiang Chen

ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model

Nov 04, 2024

EAGLE: Towards Efficient Arbitrary Referring Visual Prompts Comprehension for Multimodal Large Language Models

Sep 26, 2024

EventHallusion: Diagnosing Event Hallucinations in Video LLMs

Sep 25, 2024

Making Large Language Models Better Planners with Reasoning-Decision Alignment

Aug 25, 2024

MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis

Jul 03, 2024

Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models

Jun 12, 2024

Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models

Mar 12, 2024

LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs

Jan 30, 2024

Instance-aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning

Dec 13, 2023

Prompting Large Language Models to Reformulate Queries for Moment Localization

Jun 06, 2023