Haoyuan Li

CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation

Mar 07, 2025

DQO-MAP: Dual Quadrics Multi-Object mapping with Gaussian Splatting

Mar 04, 2025

MLINE-VINS: Robust Monocular Visual-Inertial SLAM With Flow Manhattan and Line Features

Mar 03, 2025

MINT: Multi-modal Chain of Thought in Unified Generative Models for Enhanced Image Generation

Mar 03, 2025

UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting

Feb 25, 2025

Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework

Dec 27, 2024

Coverage-based Fairness in Multi-document Summarization

Dec 11, 2024

T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts

Dec 05, 2024

Unsupervised Multi-view UAV Image Geo-localization via Iterative Rendering

Nov 22, 2024

Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation

Sep 27, 2024