Yunhang Shen

Solving the Catastrophic Forgetting Problem in Generalized Category Discovery

Jan 09, 2025
Via arXiv

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Jan 03, 2025
Via arXiv

Probability-density-aware Semi-supervised Learning

Dec 23, 2024
Via arXiv

Dynamic Contrastive Knowledge Distillation for Efficient Image Restoration

Dec 12, 2024
Via arXiv

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

Dec 05, 2024
Via arXiv

Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification

Dec 03, 2024
Via arXiv

T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs

Dec 02, 2024
Via arXiv

Scale Contrastive Learning with Selective Attentions for Blind Image Quality Assessment

Nov 13, 2024
Via arXiv

VITA: Towards Open-Source Interactive Omni Multimodal LLM

Aug 09, 2024
Via arXiv

HUWSOD: Holistic Self-training for Unified Weakly Supervised Object Detection

Jun 27, 2024
Via arXiv