
Xu Sun

PosIR: Position-Aware Heterogeneous Information Retrieval Benchmark

Jan 13, 2026

Conan: Progressive Learning to Reason Like a Detective over Multi-Scale Visual Evidence

Oct 23, 2025

Spatial-Temporal Human-Object Interaction Detection

Aug 24, 2025

VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?

May 29, 2025

RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction

May 28, 2025

TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

Apr 24, 2025

UVE: Are MLLMs Unified Evaluators for AI-Generated Videos?

Mar 13, 2025

Generative Frame Sampler for Long Video Understanding

Mar 12, 2025

Next Block Prediction: Video Generation via Semi-Autoregressive Modeling

Feb 12, 2025

VidTwin: Video VAE with Decoupled Structure and Dynamics

Dec 23, 2024