Yiyi Zhou

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

Dec 05, 2024
Via arXiv

Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings

Nov 29, 2024
Via arXiv

$\gamma$-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models

Oct 17, 2024
Via arXiv

Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models

Sep 16, 2024
Via arXiv

Image Captioning via Dynamic Path Customization

Jun 01, 2024
Via arXiv

Deep Instruction Tuning for Segment Anything Model

Mar 31, 2024
Via arXiv

Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models

Mar 22, 2024
Via arXiv

Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization

Mar 11, 2024
Via arXiv

Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models

Mar 05, 2024
Via arXiv

Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks

Jan 23, 2024
Via arXiv