Zijia Zhao

Kimi K2.5: Visual Agentic Intelligence

Feb 02, 2026
Via arXiv

WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language Models

Jan 28, 2026
Via arXiv

Kimi-VL Technical Report

Apr 10, 2025
Via arXiv

Image Difference Grounding with Natural Language

Apr 02, 2025
Via arXiv

Efficient Motion-Aware Video MLLM

Mar 17, 2025
Via arXiv

ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval

Oct 24, 2024
Via arXiv

Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining

Oct 21, 2024
Via arXiv

Exploring the Design Space of Visual Context Representation in Video MLLMs

Oct 17, 2024
Via arXiv

OneDiff: A Generalist Model for Image Difference

Jul 08, 2024
Via arXiv

Towards Event-oriented Long Video Understanding

Jun 20, 2024
Via arXiv