Picture for Runhui Huang

Runhui Huang

ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance

Add code
Dec 09, 2024
Viaarxiv icon

Efficient Multi-modal Large Language Models via Visual Token Grouping

Add code
Nov 26, 2024
Figure 1 for Efficient Multi-modal Large Language Models via Visual Token Grouping
Figure 2 for Efficient Multi-modal Large Language Models via Visual Token Grouping
Figure 3 for Efficient Multi-modal Large Language Models via Visual Token Grouping
Figure 4 for Efficient Multi-modal Large Language Models via Visual Token Grouping
Viaarxiv icon

AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning

Add code
Nov 18, 2024
Viaarxiv icon

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Add code
Sep 26, 2024
Figure 1 for EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Figure 2 for EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Figure 3 for EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Figure 4 for EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Viaarxiv icon

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

Add code
Jul 11, 2024
Figure 1 for HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
Figure 2 for HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
Figure 3 for HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
Figure 4 for HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
Viaarxiv icon

LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model

Add code
Mar 18, 2024
Viaarxiv icon

GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training

Add code
Aug 22, 2023
Figure 1 for GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training
Figure 2 for GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training
Figure 3 for GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training
Figure 4 for GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training
Viaarxiv icon

DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability

Add code
Aug 18, 2023
Viaarxiv icon

UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning

Add code
Jun 01, 2023
Viaarxiv icon

Boosting Visual-Language Models by Exploiting Hard Samples

Add code
May 09, 2023
Viaarxiv icon