Picture for Gen Luo

Gen Luo

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

Add code
Dec 05, 2024
Figure 1 for FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
Figure 2 for FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
Figure 3 for FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
Figure 4 for FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
Viaarxiv icon

RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation

Add code
Dec 03, 2024
Viaarxiv icon

ChatRex: Taming Multimodal LLM for Joint Perception and Understanding

Add code
Dec 02, 2024
Viaarxiv icon

$γ-$MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models

Add code
Oct 17, 2024
Figure 1 for $γ-$MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
Figure 2 for $γ-$MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
Figure 3 for $γ-$MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
Figure 4 for $γ-$MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
Viaarxiv icon

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

Add code
Oct 10, 2024
Figure 1 for Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Figure 2 for Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Figure 3 for Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Figure 4 for Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Viaarxiv icon

ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models

Add code
Jul 31, 2024
Viaarxiv icon

3D-GRES: Generalized 3D Referring Expression Segmentation

Add code
Jul 31, 2024
Figure 1 for 3D-GRES: Generalized 3D Referring Expression Segmentation
Figure 2 for 3D-GRES: Generalized 3D Referring Expression Segmentation
Figure 3 for 3D-GRES: Generalized 3D Referring Expression Segmentation
Figure 4 for 3D-GRES: Generalized 3D Referring Expression Segmentation
Viaarxiv icon

Deep Instruction Tuning for Segment Anything Model

Add code
Mar 31, 2024
Viaarxiv icon

Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization

Add code
Mar 11, 2024
Viaarxiv icon

Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models

Add code
Mar 05, 2024
Figure 1 for Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
Figure 2 for Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
Figure 3 for Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
Figure 4 for Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
Viaarxiv icon