Can Qin

DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models

Nov 22, 2024
Via arXiv

xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs

Oct 21, 2024
Via arXiv

Triple Point Masking

Sep 26, 2024
Via arXiv

xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Aug 22, 2024
Via arXiv

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Aug 16, 2024
Via arXiv

STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical

Jun 28, 2024
Via arXiv

MuseumMaker: Continual Style Customization without Catastrophic Forgetting

Apr 29, 2024
Via arXiv

SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant

Mar 17, 2024
Via arXiv

M3SOT: Multi-frame, Multi-field, Multi-space 3D Single Object Tracking

Dec 11, 2023
Via arXiv

Camouflaged Image Synthesis Is All You Need to Boost Camouflaged Detection

Aug 13, 2023
Via arXiv