Ming Yan

Real-Time Personalization for LLM-based Recommendation with Customized In-Context Learning

Oct 30, 2024
Via arXiv

Double Banking on Knowledge: Customized Modulation and Prototypes for Multi-Modality Semi-supervised Medical Image Segmentation

Oct 23, 2024
Via arXiv

SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing

Sep 16, 2024
Via arXiv

mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding

Sep 05, 2024
Via arXiv

MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model

Aug 26, 2024
Via arXiv

ProFuser: Progressive Fusion of Large Language Models

Aug 09, 2024
Via arXiv

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

Aug 09, 2024
Via arXiv

MIBench: Evaluating Multimodal Large Language Models over Multiple Images

Jul 21, 2024
Via arXiv

Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models

Jul 19, 2024
Via arXiv

DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation

Jul 18, 2024
Via arXiv