Yunhang Shen

Dynamic Contrastive Knowledge Distillation for Efficient Image Restoration

Dec 12, 2024
Via arXiv

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

Dec 05, 2024
Via arXiv

Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification

Dec 03, 2024
Via arXiv

T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs

Dec 02, 2024
Via arXiv

Scale Contrastive Learning with Selective Attentions for Blind Image Quality Assessment

Nov 13, 2024
Via arXiv

VITA: Towards Open-Source Interactive Omni Multimodal LLM

Aug 09, 2024
Via arXiv

HUWSOD: Holistic Self-training for Unified Weakly Supervised Object Detection

Jun 27, 2024
Via arXiv

VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models

Jun 14, 2024
Via arXiv

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

May 31, 2024
Via arXiv

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

Apr 24, 2024
Via arXiv