Haoyu Lu

Kimi K2.5: Visual Agentic Intelligence

Feb 02, 2026
Via arXiv

WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language Models

Jan 28, 2026
Via arXiv

Towards Pixel-Level VLM Perception via Simple Points Prediction

Jan 27, 2026
Via arXiv

BabyVision: Visual Reasoning Beyond Language

Jan 10, 2026
Via arXiv

HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices

Dec 16, 2025
Via arXiv

Physics-Constrained Diffusion Reconstruction with Posterior Correction for Quantitative and Fast PET Imaging

Aug 20, 2025
Via arXiv

Kimi-VL Technical Report

Apr 10, 2025
Via arXiv

Efficient Motion-Aware Video MLLM

Mar 17, 2025
Via arXiv

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

Mar 13, 2025
Via arXiv

Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining

Oct 21, 2024
Via arXiv