
Yaya Shi

MIBench: Evaluating Multimodal Large Language Models over Multiple Images

Jul 21, 2024

Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training

Mar 01, 2024

Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval

Feb 26, 2024

mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model

Nov 30, 2023

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks

Jun 07, 2023

mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality

Apr 27, 2023

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

Feb 01, 2023

EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching

Nov 17, 2021

A Simple and Strong Baseline for Universal Targeted Attacks on Siamese Visual Tracking

May 06, 2021

Object Relational Graph with Teacher-Recommended Learning for Video Captioning

Feb 26, 2020