Weidi Xie

Weaver: End-to-End Agentic System Training for Video Interleaved Reasoning

Feb 05, 2026
Via arXiv

Revisiting Multi-Task Visual Representation Learning

Jan 20, 2026
Via arXiv

SoccerMaster: A Vision Foundation Model for Soccer Understanding

Dec 11, 2025
Via arXiv

Inferring Dynamic Physical Properties from Video Foundation Models

Oct 02, 2025
Via arXiv

Universal Video Temporal Grounding with Generative Multi-modal Large Language Models

Jun 23, 2025
Via arXiv

SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding

May 22, 2025
Via arXiv

Multi-Agent System for Comprehensive Soccer Understanding

May 06, 2025
Via arXiv

ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification

Apr 29, 2025
Via arXiv

Learning Streaming Video Representation via Multitask Training

Apr 28, 2025
Via arXiv

EgoExo-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos

Apr 16, 2025
Via arXiv