Lanqing Hong

InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search

Dec 21, 2025
Via arXiv

Developing a Grounded View of AI

Nov 18, 2025
Via arXiv

ECCV 2024 W-CODA: 1st Workshop on Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving

Jul 02, 2025
Via arXiv

Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning

Jun 05, 2025
Via arXiv

Self-Error-Instruct: Generalizing from Errors for LLMs Mathematical Reasoning

May 28, 2025
Via arXiv

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

May 08, 2025
Via arXiv

PaMi-VDPO: Mitigating Video Hallucinations by Prompt-Aware Multi-Instance Video Preference Learning

Apr 08, 2025
Via arXiv

ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement

Apr 03, 2025
Via arXiv

Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?

Mar 08, 2025
Via arXiv

Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs

Mar 07, 2025
Via arXiv