Zichen Wen

Kimi K2.5: Visual Agentic Intelligence

Feb 02, 2026
Via arXiv

Innovator-VL: A Multimodal Large Language Model for Scientific Discovery

Jan 27, 2026
Via arXiv

D2Pruner: Debiased Importance and Structural Diversity for MLLM Token Pruning

Dec 22, 2025
Via arXiv

IPCV: Information-Preserving Compression for MLLM Visual Encoders

Dec 21, 2025
Via arXiv

DOCR-Inspector: Fine-Grained and Automated Evaluation of Document Parsing with VLM

Dec 11, 2025
Via arXiv

OmniLayout: Enabling Coarse-to-Fine Learning with LLMs for Universal Document Layout Generation

Oct 30, 2025
Via arXiv

AI for Service: Proactive Assistance with AI Glasses

Oct 16, 2025
Via arXiv

ViCO: A Training Strategy towards Semantic Aware Dynamic High-Resolution

Oct 14, 2025
Via arXiv

AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs

Oct 08, 2025
Via arXiv

Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods

Oct 08, 2025
Via arXiv