Haiyang Xu

SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing

Sep 16, 2024
Via arXiv

mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding

Sep 05, 2024
Via arXiv

MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model

Aug 26, 2024
Via arXiv

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

Aug 09, 2024
Via arXiv

MIBench: Evaluating Multimodal Large Language Models over Multiple Images

Jul 21, 2024
Via arXiv

OmniControlNet: Dual-stage Integration for Conditional Image Generation

Jun 09, 2024
Via arXiv

Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration

Jun 03, 2024
Via arXiv

TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning

Apr 25, 2024
Via arXiv

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

Mar 19, 2024
Via arXiv

Bayesian Diffusion Models for 3D Shape Reconstruction

Mar 11, 2024
Via arXiv