Cong Wei

VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation

Dec 01, 2024
Via arXiv

HyperSeg: Towards Universal Visual Segmentation with Large Language Model

Nov 26, 2024
Via arXiv

OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision

Nov 11, 2024
Via arXiv

MANTIS: Interleaved Multi-Image Instruction Tuning

May 02, 2024
Via arXiv

LaSagnA: Language-based Segmentation Assistant for Complex Queries

Apr 12, 2024
Via arXiv

AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks

Mar 22, 2024
Via arXiv

ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation

Feb 06, 2024
Via arXiv

VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation

Dec 22, 2023
Via arXiv

UniIR: Training and Benchmarking Universal Multimodal Information Retrievers

Nov 28, 2023
Via arXiv

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Nov 27, 2023
Via arXiv