Yueze Wang

MomentSeeker: A Comprehensive Benchmark and A Strong Baseline For Moment Retrieval Within Long Videos

Feb 18, 2025
Via arXiv

EVEv2: Improved Baselines for Encoder-Free Vision-Language Models

Feb 10, 2025
Via arXiv

MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval

Dec 19, 2024
Via arXiv

Emu3: Next-Token Prediction is All You Need

Sep 27, 2024
Via arXiv

OmniGen: Unified Image Generation

Sep 17, 2024
Via arXiv

DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

Jul 11, 2024
Via arXiv

Unveiling Encoder-Free Vision-Language Models

Jun 17, 2024
Via arXiv

Seeing Clearly, Answering Incorrectly: A Multimodal Robustness Benchmark for Evaluating MLLMs on Leading Questions

Jun 15, 2024
Via arXiv

Efficient Multimodal Learning from Data-centric Perspective

Feb 18, 2024
Via arXiv

Universal Prompt Optimizer for Safe Text-to-Image Generation

Feb 16, 2024
Via arXiv