Picture for Wenhu Chen

Wenhu Chen

OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision

Add code
Nov 11, 2024
Viaarxiv icon

Harnessing Webpage UIs for Text-Rich Visual Understanding

Add code
Oct 17, 2024
Figure 1 for Harnessing Webpage UIs for Text-Rich Visual Understanding
Figure 2 for Harnessing Webpage UIs for Text-Rich Visual Understanding
Figure 3 for Harnessing Webpage UIs for Text-Rich Visual Understanding
Figure 4 for Harnessing Webpage UIs for Text-Rich Visual Understanding
Viaarxiv icon

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

Add code
Oct 14, 2024
Viaarxiv icon

T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design

Add code
Oct 08, 2024
Figure 1 for T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design
Figure 2 for T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design
Figure 3 for T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design
Figure 4 for T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design
Viaarxiv icon

VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks

Add code
Oct 07, 2024
Figure 1 for VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
Figure 2 for VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
Figure 3 for VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
Figure 4 for VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
Viaarxiv icon

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

Add code
Sep 04, 2024
Figure 1 for MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark
Figure 2 for MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark
Figure 3 for MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark
Figure 4 for MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark
Viaarxiv icon

Foundation Models for Music: A Survey

Add code
Aug 27, 2024
Figure 1 for Foundation Models for Music: A Survey
Figure 2 for Foundation Models for Music: A Survey
Figure 3 for Foundation Models for Music: A Survey
Figure 4 for Foundation Models for Music: A Survey
Viaarxiv icon

LongIns: A Challenging Long-context Instruction-based Exam for LLMs

Add code
Jun 26, 2024
Viaarxiv icon

VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

Add code
Jun 24, 2024
Figure 1 for VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
Figure 2 for VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
Figure 3 for VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
Figure 4 for VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
Viaarxiv icon

LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs

Add code
Jun 21, 2024
Viaarxiv icon