Haoyu Lu

Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining

Oct 21, 2024
Via arXiv

Exploring the Design Space of Visual Context Representation in Video MLLMs

Oct 17, 2024
Via arXiv

Towards Event-oriented Long Video Understanding

Jun 20, 2024
Via arXiv

Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs

Jun 13, 2024
Via arXiv

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Mar 11, 2024
Via arXiv

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Jan 05, 2024
Via arXiv

Speech and Noise Dual-Stream Spectrogram Refine Network with Speech Distortion Loss for Robust Speech Recognition

May 30, 2023
Via arXiv

VDT: An Empirical Study on Video Diffusion with Transformers

May 22, 2023
Via arXiv

UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling

Feb 13, 2023
Via arXiv

Monolingual Recognizers Fusion for Code-switching Speech Recognition

Nov 02, 2022
Via arXiv