Yuqi Huo

Baichuan-Omni-1.5 Technical Report

Jan 26, 2025
Via arXiv

Virgo: A Preliminary Exploration on Reproducing o1-like MLLM

Jan 03, 2025
Via arXiv

Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining

Oct 21, 2024
Via arXiv

Exploring the Design Space of Visual Context Representation in Video MLLMs

Oct 17, 2024
Via arXiv

Baichuan-Omni Technical Report

Oct 11, 2024
Via arXiv

Towards Event-oriented Long Video Understanding

Jun 20, 2024
Via arXiv

Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs

Jun 13, 2024
Via arXiv

VDT: An Empirical Study on Video Diffusion with Transformers

May 22, 2023
Via arXiv

UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling

Feb 13, 2023
Via arXiv

LGDN: Language-Guided Denoising Network for Video-Language Modeling

Oct 03, 2022
Via arXiv