Can Qin

Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models

Mar 20, 2025

Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding

Feb 17, 2025

DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models

Nov 22, 2024

xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs

Oct 21, 2024

Triple Point Masking

Sep 26, 2024

xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Aug 22, 2024

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Aug 16, 2024

STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical Question-Answering

Jun 28, 2024

MuseumMaker: Continual Style Customization without Catastrophic Forgetting

Apr 29, 2024

SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant

Mar 17, 2024