Manli Shu

BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions

Nov 12, 2024

xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs

Oct 21, 2024

xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Aug 22, 2024

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Aug 16, 2024

MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens

Jun 17, 2024

Coercing LLMs to do and reveal anything

Feb 21, 2024

Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models

Feb 05, 2024

Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks

Nov 20, 2023

On the Reliability of Watermarks for Large Language Models

Jun 30, 2023

Bring Your Own Data! Self-Supervised Evaluation for Large Language Models

Jun 29, 2023