
Xiujun Li

Multimodal Autoregressive Pre-training of Large Vision Encoders

Nov 21, 2024

Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset

Nov 05, 2024

Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

Oct 24, 2024

From Text to Pixel: Advancing Long-Context Understanding in MLLMs

May 23, 2024

VIM: Probing Multimodal Large Language Models for Visual Embedded Instruction Following

Nov 29, 2023

LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation

May 18, 2023

Self-supervised Pre-training with Hard Examples Improves Visual Representations

Jan 04, 2021

VinVL: Making Visual Representations Matter in Vision-Language Models

Jan 02, 2021

MiniVLM: A Smaller and Faster Vision-Language Model

Dec 13, 2020

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

May 18, 2020