Picture for Ran Xu

Ran Xu

ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models

Add code
Dec 09, 2024
Viaarxiv icon

BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions

Add code
Nov 12, 2024
Viaarxiv icon

SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains

Add code
Oct 23, 2024
Figure 1 for SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains
Figure 2 for SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains
Figure 3 for SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains
Figure 4 for SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains
Viaarxiv icon

xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs

Add code
Oct 21, 2024
Figure 1 for xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs
Figure 2 for xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs
Figure 3 for xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs
Figure 4 for xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs
Viaarxiv icon

Trust but Verify: Programmatic VLM Evaluation in the Wild

Add code
Oct 17, 2024
Viaarxiv icon

xLAM: A Family of Large Action Models to Empower AI Agent Systems

Add code
Sep 05, 2024
Viaarxiv icon

xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Add code
Aug 22, 2024
Figure 1 for xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
Figure 2 for xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
Figure 3 for xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
Figure 4 for xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
Viaarxiv icon

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Add code
Aug 16, 2024
Figure 1 for xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
Figure 2 for xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
Figure 3 for xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
Figure 4 for xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
Viaarxiv icon

Boosting Reward Model with Preference-Conditional Multi-Aspect Synthetic Data Generation

Add code
Jul 22, 2024
Viaarxiv icon

MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens

Add code
Jun 17, 2024
Viaarxiv icon