Picture for Yinan He

Yinan He

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning

Add code
Apr 10, 2025
Viaarxiv icon

VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness

Add code
Mar 27, 2025
Viaarxiv icon

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling

Add code
Jan 21, 2025
Viaarxiv icon

DiffVSR: Enhancing Real-World Video Super-Resolution with Diffusion Models for Advanced Visual Quality and Temporal Consistency

Add code
Jan 17, 2025
Viaarxiv icon

Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models

Add code
Jan 14, 2025
Figure 1 for Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models
Figure 2 for Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models
Figure 3 for Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models
Figure 4 for Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models
Viaarxiv icon

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling

Add code
Dec 31, 2024
Viaarxiv icon

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

Add code
Dec 26, 2024
Viaarxiv icon

VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models

Add code
Nov 20, 2024
Figure 1 for VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models
Figure 2 for VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models
Figure 3 for VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models
Figure 4 for VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models
Viaarxiv icon

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Add code
Jun 13, 2024
Figure 1 for OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Figure 2 for OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Figure 3 for OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Figure 4 for OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Viaarxiv icon

OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Add code
Jun 12, 2024
Figure 1 for OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Figure 2 for OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Figure 3 for OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Figure 4 for OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Viaarxiv icon