
Yiyi Zhou

Grounded Chain-of-Thought for Multimodal Large Language Models

Mar 17, 2025

AdaFlow: Efficient Long Video Editing via Adaptive Attention Slimming And Keyframe Selection

Feb 08, 2025

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

Feb 07, 2025

What Kind of Visual Tokens Do We Need? Training-free Visual Token Pruning for Multi-modal Large Language Models from the Perspective of Graph

Jan 04, 2025

SVFR: A Unified Framework for Generalized Video Face Restoration

Jan 03, 2025

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

Dec 05, 2024

Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings

Nov 29, 2024

γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models

Oct 17, 2024

Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models

Sep 16, 2024

Image Captioning via Dynamic Path Customization

Jun 01, 2024