Sheng Xia

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning

Jun 10, 2026
Via arXiv

Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction

Jun 04, 2026
Via arXiv

ViCuR: Visual Cues as Recoverable Privilege for Multimodal On-Policy Distillation

Jun 04, 2026
Via arXiv

Customize Your Visual Autoregressive Recipe with Set Autoregressive Modeling

Oct 14, 2024
Via arXiv