Gen Luo

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

Jan 14, 2025

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

Dec 05, 2024

RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation

Dec 03, 2024

ChatRex: Taming Multimodal LLM for Joint Perception and Understanding

Dec 02, 2024

γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models

Oct 17, 2024

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

Oct 10, 2024

3D-GRES: Generalized 3D Referring Expression Segmentation

Jul 31, 2024

ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models

Jul 31, 2024

Deep Instruction Tuning for Segment Anything Model

Mar 31, 2024

Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization

Mar 11, 2024