Haiyang Xu

Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration

Feb 25, 2025
Via arXiv

PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC

Feb 21, 2025
Via arXiv

Qwen2.5-VL Technical Report

Feb 19, 2025
Via arXiv

Megrez-Omni Technical Report

Feb 19, 2025
Via arXiv

Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks

Jan 20, 2025
Via arXiv

SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization

Nov 17, 2024
Via arXiv

SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing

Sep 16, 2024
Via arXiv

mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding

Sep 05, 2024
Via arXiv

MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model

Aug 26, 2024
Via arXiv

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

Aug 09, 2024
Via arXiv