Kaipeng Zhang

ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality

Dec 05, 2024
Via arXiv

GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

Dec 01, 2024
Via arXiv

TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts

Oct 23, 2024
Via arXiv

Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping

Oct 11, 2024
Via arXiv

ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification and KV Cache Compression

Oct 11, 2024
Via arXiv

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

Oct 07, 2024
Via arXiv

HRVMamba: High-Resolution Visual State Space Model for Dense Prediction

Oct 04, 2024
Via arXiv

T3M: Text Guided 3D Human Motion Synthesis from Speech

Aug 23, 2024
Via arXiv

Prioritize Alignment in Dataset Distillation

Aug 06, 2024
Via arXiv

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Aug 05, 2024
Via arXiv