Tianxiang Jiang

Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning

Jan 30, 2026

LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning

Jan 15, 2026

Make Your Training Flexible: Towards Deployment-Efficient Video Models

Mar 18, 2025

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning

Oct 25, 2024

Dynamic Resolution Guidance for Facial Expression Recognition

Apr 09, 2024

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

Mar 22, 2024